# Bayesian Bellman Operators

## Project structure:
- `data/`: stores the result data for the experiments that have been run.
- `policy_evaluation/algorithms/`: contains the algorithm implementations. In particular, `policy_evaluation/algorithms/bbo.py` contains the `BBORandomizedPrior` algorithm used in the experiments.
- `policy_evaluation/environments/`: contains the `PuddleWorld` environment. For `MountainCar`, we use the gym `MountainCarContinuous-v0` environment, and for `NLinkPendulum`, the one provided by Dann et al. These are not needed if the previously generated datasets are available.
- `policy_evaluation/policies/`: contains policy helper classes. These are not needed if the previously generated datasets are available.
- `policy_evaluation/runners/`: main entry points for all the experiments.
- `policy_evaluation/tasks/`: contains a helper class for running tasks.
- `policy_evaluation/value_functions/`: contains value function helper classes.

## To generate datasets:
For Puddle World and Mountain Car, the experiment runners will automatically generate the datasets. For 20-Link Pendulum, the data must be created using `tdlearn/generate_20_link_pendulum_data.py` and the resulting files moved to `policy-evaluation/policy_evaluation/runners/data/[environment-id]/{datasets,value_functions}`.
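The file placement step for the 20-Link Pendulum can be scripted. Below is a minimal Python sketch, assuming you already know which generated files are datasets and which are value functions. The helper name `place_pendulum_data` and its arguments are hypothetical; `data_dir` should point to `policy-evaluation/policy_evaluation/runners/data`, and because the exact filenames written by `tdlearn/generate_20_link_pendulum_data.py` are not specified here, the caller passes them in.

```python
import shutil
from pathlib import Path


def place_pendulum_data(data_dir, env_id, dataset_files, value_function_files):
    """Hypothetical helper: move generated files into
    <data_dir>/<env_id>/{datasets,value_functions}.

    `data_dir` is assumed to be .../policy_evaluation/runners/data and
    `env_id` the environment id the runner expects.
    """
    env_root = Path(data_dir) / env_id
    moved = []
    for subdir, files in (("datasets", dataset_files),
                          ("value_functions", value_function_files)):
        target_dir = env_root / subdir
        # Create the target directory if it does not exist yet.
        target_dir.mkdir(parents=True, exist_ok=True)
        for f in files:
            dest = target_dir / Path(f).name
            shutil.move(str(f), str(dest))
            moved.append(dest)
    return moved
```

The helper only moves files and creates directories; it does not validate the file contents.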

## To replicate results in Figure 9:

1. Install the conda environment (if you have GPU access, uncomment the CUDA-related lines in `environment.yml`):
```bash
cd [project-root]

CONDA_ENV_NAME="bbo-policy-evaluation" && \
  conda activate base && \
  conda env remove -y --name "${CONDA_ENV_NAME}"; \
  conda env create -f ./environment.yml -n "${CONDA_ENV_NAME}" && \
  conda activate "${CONDA_ENV_NAME}"
```

2. Run all algorithm and environment combinations. Each run takes the following form:

```bash
python -m \
    policy_evaluation.runners.[environment-name] \
    --algorithm [algorithm] \
    --num-steps 100000 \
    --epoch-length 100 \
    --num-samples 3 \
    --debug=false
```

Here, `[environment-name]` is one of `{n_link_pendulum,puddle_world,mountain_car}` and `[algorithm]` is one of `{td0,tdc,bbo}`. For TD(0) and TDC, just run the script; the hyperparameters should already be set correctly in the runner file. For the different BBO variants, change the entries under `algorithm_params[algorithm_id]['config']` in the runner file as follows:
- For Gradient BBO w/ prior: use the default hyperparameters.
- For Gradient BBO w/o prior: set `'prior_loss_weight': 0.0`.
- For Direct BBO w/o prior: set `'omega_lr': 1.0`.
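The full experiment grid can be enumerated programmatically. The sketch below builds the argument list for every environment and algorithm pair, mirroring the command template above, and records the BBO variant overrides as dictionaries. The names `ENVIRONMENTS`, `ALGORITHMS`, `BBO_VARIANTS`, `build_command`, and `apply_variant` are hypothetical; the runners are not assumed to accept variant flags on the command line, so the overrides still have to be copied into `algorithm_params[algorithm_id]['config']` in the runner file by hand.

```python
import itertools
import shlex

ENVIRONMENTS = ["n_link_pendulum", "puddle_world", "mountain_car"]
ALGORITHMS = ["td0", "tdc", "bbo"]

# Hypothetical record of the overrides to apply to
# algorithm_params[algorithm_id]['config'] for each BBO variant
# (an empty dict means "use the runner defaults").
BBO_VARIANTS = {
    "gradient_bbo_with_prior": {},
    "gradient_bbo_without_prior": {"prior_loss_weight": 0.0},
    "direct_bbo_without_prior": {"omega_lr": 1.0},
}


def build_command(environment, algorithm):
    """Return the argv list for one run, following the documented template."""
    return [
        "python", "-m", f"policy_evaluation.runners.{environment}",
        "--algorithm", algorithm,
        "--num-steps", "100000",
        "--epoch-length", "100",
        "--num-samples", "3",
        "--debug=false",
    ]


def apply_variant(base_config, variant):
    """Return a copy of a BBO config with one variant's overrides applied."""
    config = dict(base_config)  # shallow copy: base_config is left unchanged
    config.update(BBO_VARIANTS[variant])
    return config


if __name__ == "__main__":
    # Print one shell-quoted command per environment/algorithm pair.
    for env, alg in itertools.product(ENVIRONMENTS, ALGORITHMS):
        print(shlex.join(build_command(env, alg)))
```

Printing the commands rather than launching them keeps the sketch side-effect free; each line can be pasted into a shell or passed to `subprocess.run` once the runner file holds the desired BBO variant.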


Once you have run all the experiments, create the plot with `./visualization/plot_ablation.py`. You need to change `BASE_DIR` and `EXPERIMENT_IDS` so that they match the experiments you just ran.
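Before plotting, it may help to check that the configured experiments actually exist. This is a minimal sketch under the assumption, not confirmed by the script itself, that each entry of `EXPERIMENT_IDS` names a subdirectory of `BASE_DIR`; the helper `missing_experiments` is hypothetical and should be adjusted if `plot_ablation.py` resolves experiment ids differently.

```python
from pathlib import Path


def missing_experiments(base_dir, experiment_ids):
    """Return the ids with no matching directory under base_dir.

    Assumes (hypothetically) that each experiment id is a subdirectory name
    of BASE_DIR.
    """
    base = Path(base_dir)
    return [eid for eid in experiment_ids if not (base / eid).is_dir()]


# Usage, with BASE_DIR and EXPERIMENT_IDS as set in plot_ablation.py:
# missing = missing_experiments(BASE_DIR, EXPERIMENT_IDS)
# if missing:
#     raise SystemExit(f"Missing experiment directories: {missing}")
```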
