## Mediation Analysis

This repository contains the code to replicate the experiments for the paper "Investigating Gender Bias in Language Models Using Causal Mediation Analysis".

### Neuron Experiments

#### Create Analysis CSVs

You can run all the neuron experiments for a given model with the `run_profession_neuron_experiments.py` script. Set the `-model` flag to the GPT-2 version you want to use and point `-out_dir` to the base directory for your results. The resulting CSVs are saved in `${out_dir}/results/${date}_neuron_intervention`.
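Because the results folder name includes the run date, it can be convenient to locate the CSVs by pattern rather than by exact name. Below is a minimal standard-library sketch that assumes only the directory layout described above; `find_neuron_csvs` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path


def find_neuron_csvs(out_dir):
    """Return every CSV in ${out_dir}/results/*_neuron_intervention, sorted by path.

    Hypothetical helper: it relies only on the directory naming described
    above, not on any code from this repository.
    """
    results_dir = Path(out_dir) / "results"
    # The ${date} prefix changes between runs, so match any *_neuron_intervention folder.
    return sorted(results_dir.glob("*_neuron_intervention/*.csv"))
```

For example, `find_neuron_csvs("my_results")` would list the CSVs from every dated run under `my_results/results/`.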

#### Compute total effect and correlation with professions

We provide two scripts, `compute_neuron_split_total_effect` and `compute_neuron_total_effect`, that report the total effects for a model in several different ways.

`compute_neuron_total_effect` additionally computes the correlation between the effect sizes and the bias values of the professions, and generates a plot at `${out_dir}/neuron_profession_correlation.pdf`.
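The correlation computed in this step relates each profession's effect size to its bias value. For intuition, here is a minimal pure-Python sketch of the Pearson correlation coefficient. Whether the script uses Pearson's r or a different statistic is an assumption here, and `pearson_r` is a hypothetical helper rather than repository code.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences.

    Illustrative only: e.g. xs = per-profession effect sizes, ys = the
    professions' bias values (the pairing is an assumption).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined when either sequence is constant.
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```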

#### Compute aggregate neuron effects

If you want to compute the aggregate effect for each neuron, you can run `compute_and_save_neuron_agg_effect.py`, which will create a new file in `results/${date}_neuron_intervention` called `${model_name}_neuron_effects.csv` with the results.

After running this for each model you want to investigate, you can run `compute_neuron_effect_per_layer.py`, which generates plots of the per-layer effects.
An aggregate plot is saved at `${out_dir}/neuron_layer_effect.pdf`, and a separate plot for each model is saved at `${out_dir}/neuron_layer_effect_${model_name}.pdf`.
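To illustrate what a per-layer summary of neuron effects involves, here is a minimal sketch that averages neuron effects within each layer. The `(layer, neuron, effect)` tuple layout and the use of the mean are assumptions; the actual columns of `${model_name}_neuron_effects.csv` and the aggregate plotted by `compute_neuron_effect_per_layer.py` may differ. `mean_effect_per_layer` is a hypothetical helper.

```python
from collections import defaultdict


def mean_effect_per_layer(rows):
    """Average neuron effects within each layer.

    `rows` is an iterable of (layer, neuron, effect) tuples; this layout is an
    assumption, not the actual format of ${model_name}_neuron_effects.csv.
    Returns a dict mapping layer -> mean effect, with layers in ascending order.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for layer, _neuron, effect in rows:
        totals[layer] += effect
        counts[layer] += 1
    return {layer: totals[layer] / counts[layer] for layer in sorted(totals)}
```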

### Attention Experiments

#### Create Analysis JSON files

Note: the analysis JSON files for Winogender and WinoBias are already available in the `winogender_data` and `winobias_data` directories, respectively, so you can skip the following instructions if you wish. The raw Winogender and WinoBias datasets (the non-JSON files in those same directories) were obtained from https://github.com/rudinger/winogender-schemas and https://github.com/uclanlp/corefBias/tree/master/WinoBias/wino/data, respectively.

If you wish to recreate the analysis files from scratch, you can run the attention intervention experiments for a specific configuration with either `attention_intervention_winobias.py` or `attention_intervention_winogender.py`. The arguments are specified in the `intervene_attention` method of each script; see `attention_intervention_winobias.sh` or `attention_intervention_winogender.sh` for all possible configurations. The results are written to the `winobias_data/` or `winogender_data/` directory.

#### Generate reports

Various reports can be generated from the JSON files by running `attention_figures1.py`,
`attention_figures2.py`, or `attention_figures3.py`.
See each script for a description of the reports it generates. You may want to modify these scripts to generate figures for only a subset of configurations. The results are written as PDF files to subfolders of the `results/` directory.

### Sparsity Experiments

#### Attention head selection

You can run attention head sparsity experiments with `attention_intervention_subset_selection.py`, using either the top-k or the greedy algorithm. Results are stored in `{out_dir}/{algo}_{model_type}_{data}.pickle`.

Additionally, intermediate results are cached in `{out_dir}/{algo}_intermediate_{model_type}_{data}.pickle`, and the mean effect (for the entire model, each layer, and each head) is stored in `{out_dir}/mean_effect_{model_type}_{data}.pickle`.

The script takes the following arguments: `model_type` (GPT-2 version), `algo` (`greedy` or `topk`), `k` (int), `data` (`winobias` or `winogender`), and `out_dir` (base directory for results). For example:

`python attention_intervention_subset_selection.py --model_type gpt2 --algo greedy --k 10 --data winobias --out_dir results`
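The two selection strategies differ in how they account for interactions between heads: top-k ranks each head by its individual effect, while greedy re-evaluates the joint effect of the growing set after each addition. Below is a minimal sketch of that general idea, assuming a hypothetical `effect` function that returns the effect of intervening on a set of heads jointly; the repository's implementation may measure effects and break ties differently.

```python
def topk_select(candidates, k, effect):
    """Pick the k candidates with the largest individual effect.

    `effect` is a hypothetical callable mapping a frozenset of candidates
    (e.g. (layer, head) tuples) to the effect of intervening on them jointly.
    """
    return sorted(candidates, key=lambda c: effect(frozenset([c])), reverse=True)[:k]


def greedy_select(candidates, k, effect):
    """Grow the selection one candidate at a time, each time adding the
    candidate that maximizes the joint effect of the enlarged set."""
    selected = []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda c: effect(frozenset(selected + [c])))
        selected.append(best)
        remaining.remove(best)
    return selected
```

With a toy `effect` in which heads `a` and `b` are largely redundant (individual effects 3.0, 2.9, and 2.0 for `a`, `b`, `c`, but `b` adds nothing once `a` is selected), `topk_select` with `k=2` returns `a` and `b`, while `greedy_select` returns `a` and then `c`.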

#### Neuron selection

You can run neuron sparsity experiments with `neuron_intervention_subset_selection.py`, which writes results to `{out_dir}/{algo}_{model_type}{_layer}.pickle`. If a layer is specified, neurons are selected only from that layer.

Additionally, the average odds ratio for each layer and each neuron is stored in `{out_dir}/marg_contrib.pickle`. If `{out_dir}/marg_contrib.pickle` already exists, the script loads the data from this file instead of recomputing it.

The script takes the following arguments: `model_type` (GPT-2 version), `algo` (`greedy` or `topk`), `k` (int), `layer` (-1 to select neurons from the entire model, or 0-12 for a specific layer), and `out_dir` (base directory for results). Currently, only GPT-2 is supported. For example:

`python neuron_intervention_subset_selection.py --algo greedy --k 10 --layer -1 --out_dir results`
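The reuse of `marg_contrib.pickle` follows a simple load-or-compute caching pattern. Below is a minimal sketch of that pattern; `load_or_compute` is a hypothetical helper, and the script's own code may differ. Since the documented file name does not encode the model type or layer, it is presumably safest to delete the cached file when changing those settings.

```python
import os
import pickle


def load_or_compute(path, compute):
    """Return the object cached at `path`, or compute it, cache it, and return it.

    Hypothetical helper mirroring the documented marg_contrib.pickle behavior:
    an existing file is reused as-is, so delete it to force a fresh computation.
    """
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()
    # Create the parent directory if needed before writing the cache file.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result
```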
