# Bayesian Bellman Actor-Critic

## Project structure:
- `bbac/algorithms/`: contains the algorithm implementations. Specifically, `bbac/algorithms/bbac.py` contains the `BayesianBellmanActorCritic` algorithm used.
- `policy_evaluation/environments/`: contains the `cartpole-custom_swingup_sparse` environment. For `MountainCar`, we use the gym `MountainCar-Continuous-v0`.
- `policy_evaluation/runners/bbac`: main entry point for the experiments.
- `bbac/value_functions/`: contains value function helper classes.

## To run BBAC:

1. Install the conda environment (if you have GPU access, uncomment the CUDA-related lines in `environment.yml`):
```bash
cd [project-root]

CONDA_ENV_NAME="bbac" && \
  conda activate base && \
  conda env remove -y --name "${CONDA_ENV_NAME}"; \
  conda env create -f ./environment.yml -n "${CONDA_ENV_NAME}" && \
  conda activate "${CONDA_ENV_NAME}"
```

2. Run all algorithm and environment combinations. All the runs take the following form:

```bash
softlearning launch_example_local \
    bbac.runners.bbac \
    --run-eagerly=false \
    --algorithm=BayesianBellmanActorCritic \
    --universe=[universe] \
    --domain=[domain] \
    --task=[task] \
    --exp-name="bbac-1" \
    --num-samples=1 \
    --trial-cpus=1 \
    --trial-gpus=0 \
    --video-save-frequency=0 \
    --checkpoint-frequency=0 \
    --checkpoint-replay-pool=False \
    --checkpoint-at-end=false \
    --max-failures=0 \
    --server-port=''
```

Here `[universe]` is one of `{gym,dm_control}`, `[domain]` is one of `{MountainCar,cartpole}`, and `[task]` is one of `{Continuous-v0,custom_swingup_sparse}`. The environments listed under the project structure correspond to the pairings `gym` / `MountainCar` / `Continuous-v0` and `dm_control` / `cartpole` / `custom_swingup_sparse`. The hyperparameters can be configured in `control/bbac/runners/bbac/variants.py`. For `BAC`, use the same hyperparameters as for `BBAC`, but set the target update weight `tau=1.0`. The results can be visualized with the scripts in `control/results`.
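
To launch the environment combinations in one go, a small wrapper loop can help. The following is a minimal sketch, not part of the repository: the `COMBOS` list, the `run_all` function, the `DRY_RUN` switch, and the per-domain `--exp-name` value are all assumptions made for illustration, and the `dm_control` / `cartpole` pairing is inferred from the environment names above. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumed (universe, domain, task) triples, inferred from the README text.
COMBOS=(
  "gym MountainCar Continuous-v0"
  "dm_control cartpole custom_swingup_sparse"
)

# Hypothetical helper: builds one launch command per combination.
# With DRY_RUN=1 (the default) the command is printed instead of executed.
run_all() {
  local combo universe domain task
  for combo in "${COMBOS[@]}"; do
    # Split the triple on whitespace into its three fields.
    read -r universe domain task <<< "${combo}"
    local cmd=(
      softlearning launch_example_local
      bbac.runners.bbac
      --run-eagerly=false
      --algorithm=BayesianBellmanActorCritic
      "--universe=${universe}"
      "--domain=${domain}"
      "--task=${task}"
      "--exp-name=bbac-${domain}"  # illustrative name, one per domain
      --num-samples=1
      --trial-cpus=1
      --trial-gpus=0
      --video-save-frequency=0
      --checkpoint-frequency=0
      --checkpoint-replay-pool=False
      --checkpoint-at-end=false
      --max-failures=0
      --server-port=
    )
    if [[ "${DRY_RUN:-1}" == "1" ]]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}

run_all
```

Keeping the command in a bash array avoids quoting problems when the arguments are expanded, and the dry-run default makes it easy to inspect the exact commands before starting long-running experiments.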
