This directory contains the code to reproduce the figures in our paper "Minimax Regret for Cascading Bandits". We provide some high-level notes here and thorough comments in the code itself.

TL;DR: The shell script run_and_plot_all.sh recreates all of the figures. It downloads and processes the MovieLens data (saving the output in the data directory), generates the raw results needed for the figures (saving them in the results directory), and loads the results to create the plots (saving them in the plots directory). Executing the script takes on the order of hours, and possibly a day or so, depending on the hardware (see Appendix C in our paper for further information). Together, the generated files (MovieLens data, raw results, and plots) require roughly 2GB of storage, most of which is taken up by the raw results.

In more detail, the Python code consists of run_*.py files that generate the results and plot_*.py files that create the plots.
- run_syn_tab.py: generates results for the tabular synthetic data experiment and saves them as .npz (NumPy) files in the results directory.
- plot_syn_tab.py: loads results generated by run_syn_tab.py (if they exist; otherwise, prints an error message), creates the plots in the top row of Figure 1, and saves them as a .png file in the plots directory.
- run_syn_lin.py: generates results for the linear synthetic data experiment and saves them as .npz (NumPy) files in the results directory.
- plot_syn_lin.py: loads results generated by run_syn_lin.py (if they exist; otherwise, prints an error message), creates the plots in the bottom row of Figure 1, and saves them as a .png file in the plots directory.
- run_real_lin.py: generates results for the real data experiment (if MovieLens data is available as described below; otherwise, prints an error message) and saves them as .npz (NumPy) files in the results directory.
- plot_real_lin_main.py: loads results generated by run_real_lin.py (if they and the file data/movies.npz described below exist; otherwise, prints an error message), creates the plots in Figure 2, and saves them as a .png file in the plots directory.
- plot_real_lin_app.py: loads results generated by run_real_lin.py (if they and the file data/movies.npz described below exist; otherwise, prints an error message), creates the plots in Figure 3, and saves them as a .png file in the plots directory.
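
The ordering constraint among these files (each run_*.py script before the plot_*.py scripts that read its output) can be sketched in Python as follows. This is only an illustration and not the contents of run_and_plot_all.sh: the function run_all and its dry_run flag are hypothetical, and the relative order of the three experiments is an assumption.

```python
import subprocess
import sys

# Script names are taken from the list above. Each run_*.py script is placed
# before the plot_*.py scripts that load its results; the order of the three
# experiments relative to one another is an assumption.
STEPS = [
    "run_syn_tab.py",
    "plot_syn_tab.py",
    "run_syn_lin.py",
    "plot_syn_lin.py",
    "run_real_lin.py",
    "plot_real_lin_main.py",
    "plot_real_lin_app.py",
]


def run_all(steps=STEPS, dry_run=False):
    """Run each script in order and return the list of commands.

    With dry_run=True the commands are returned without being executed.
    check=True raises an exception as soon as one script exits with an error.
    """
    commands = []
    for script in steps:
        command = [sys.executable, script]
        if not dry_run:
            subprocess.run(command, check=True)
        commands.append(command)
    return commands
```

Calling run_all(dry_run=True) returns the seven commands without executing anything, which is a cheap way to inspect the order before committing to the full run.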

There are two additional Python files:
- cascading_alg.py: contains the five cascading bandit algorithms described in our paper.
- other_alg.py: contains other helper functions.

Note that none of the error messages mentioned above should occur if the shell script is executed first. However, if (for example) plot_syn_tab.py is manually executed before run_syn_tab.py (and before the shell script), an error will arise. 
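
The kind of check behind these error messages can be sketched as follows. This is a minimal illustration, not the code in the plot_*.py files: the helpers missing_inputs and check_inputs and the file name results/syn_tab.npz are hypothetical.

```python
from pathlib import Path


def missing_inputs(paths):
    """Return the paths (as strings) that do not exist on disk."""
    return [str(p) for p in map(Path, paths) if not p.exists()]


def check_inputs(paths, producer):
    """Return True if every input exists; otherwise print a hint and return False.

    `producer` is the name of the script that generates the inputs.
    """
    missing = missing_inputs(paths)
    if missing:
        print(f"Missing {', '.join(missing)}; run {producer} first.")
        return False
    return True
```

For example, check_inputs(["results/syn_tab.npz"], "run_syn_tab.py") would print a hint and return False when that (hypothetical) results file has not yet been generated.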

Along these lines, run_real_lin.py will print an error if it is executed before the shell script and before the MovieLens data is available. To make it available without executing the shell script, download ml-1m.zip from https://files.grouplens.org/datasets/movielens, unzip, and place the resulting folder ml-1m in the data directory. Alternatively, execute the following commands from this directory:
- wget -q https://files.grouplens.org/datasets/movielens/ml-1m.zip
- unzip -q ml-1m.zip -d data && rm ml-1m.zip
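
On systems without wget or unzip, the same steps can be approximated with the Python standard library. This is a sketch only: the function names extract_archive and fetch_movielens are hypothetical, and the code is not part of this repository.

```python
import urllib.request
import zipfile
from pathlib import Path

# Same archive as in the wget command above.
ML_1M_URL = "https://files.grouplens.org/datasets/movielens/ml-1m.zip"


def extract_archive(zip_path, data_dir="data"):
    """Unzip zip_path into data_dir (the equivalent of `unzip -q ... -d data`)."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(data_dir)
    return data_dir


def fetch_movielens(data_dir="data", zip_path="ml-1m.zip"):
    """Download ml-1m.zip, extract it into data_dir, and delete the archive."""
    zip_path = Path(zip_path)
    urllib.request.urlretrieve(ML_1M_URL, zip_path)
    extract_archive(zip_path, data_dir)
    zip_path.unlink()
    return Path(data_dir) / "ml-1m"
```

Calling fetch_movielens() from this directory should leave the folder data/ml-1m in place, matching the result of the two commands above.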

Finally, the first time run_real_lin.py is successfully executed (either via the shell script or after making MovieLens data available manually as described in the previous paragraph), it will process the data and save a NumPy file data/movies.npz. As alluded to above, the plot_real_lin*.py files print error messages if data/movies.npz is not present, because these files load the list of genres from that .npz file. As long as run_real_lin.py is executed before plot_real_lin*.py (either via the shell script or manually), this error should not arise either.
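
The process-once-then-reuse pattern described in this paragraph can be sketched with NumPy as follows. This is an illustration under assumptions, not the code in run_real_lin.py: the helper load_or_build_genres, its build_genres argument, and the array name genres are hypothetical, and the actual contents of data/movies.npz may differ.

```python
from pathlib import Path

import numpy as np


def load_or_build_genres(cache_path, build_genres):
    """Return the genre list, computing and caching it on the first call only.

    cache_path should end in .npz; build_genres is a callable that returns
    the list of genre names (a stand-in for the real data processing).
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Later calls: read the cached array instead of reprocessing the data.
        with np.load(cache_path) as cached:
            return cached["genres"].tolist()
    # First call: process the data and save the result for later use.
    genres = list(build_genres())
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    np.savez(cache_path, genres=np.array(genres))
    return genres
```

Unlike this sketch, the plot_real_lin*.py files do not rebuild the file themselves; they print an error message if data/movies.npz is absent.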
