{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Adversarial Training with `mister_ed`\n",
    "This file will contain the basics on how to perform adversarial training under the `mister_ed` framework. It's highly recommended that you have walked through tutorial_1 before going through this."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As usual, we'll start by importing everything we'll need"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# EXTERNAL LIBRARY IMPORTS\n",
    "import numpy as np \n",
    "import scipy \n",
    "import matplotlib.pyplot as plt \n",
    "\n",
    "import torch # Need torch version >=0.3\n",
    "import torch.nn as nn \n",
    "import torch.optim as optim \n",
    "assert float(torch.__version__[:3]) >= 0.3\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# MISTER ED SPECIFIC IMPORT BLOCK\n",
    "# (here we do things so relative imports work )\n",
    "# Universal import block \n",
    "# Block to get the relative imports working \n",
    "import os\n",
    "import sys \n",
    "module_path = os.path.abspath(os.path.join('..'))\n",
    "if module_path not in sys.path:\n",
    "    sys.path.append(module_path)\n",
    "\n",
    "\n",
    "import config\n",
    "import prebuilt_loss_functions as plf\n",
    "import loss_functions as lf \n",
    "import utils.pytorch_utils as utils\n",
    "import utils.image_utils as img_utils\n",
    "import cifar10.cifar_loader as cifar_loader\n",
    "import cifar10.cifar_resnets as cifar_resnets\n",
    "import adversarial_training as advtrain\n",
    "import adversarial_evaluation as adveval\n",
    "import utils.checkpoints as checkpoints\n",
    "import adversarial_perturbations as ap \n",
    "import adversarial_attacks as aa\n",
    "import spatial_transformers as st\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's define what we want to do here:\n",
    "\n",
    "Our goal is to run through a few training epochs of a pretrained classifier where we augment the training data with a set of adversarial examples. For simplicity's sake, let's just try and train a few epochs of a 20-layer ResNet trained on CIFAR-10, defended against an FGSM attack of $\\epsilon=8$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First things first, let's instatiate our pretrained classifier and our training dataset. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cifar_trainset = cifar_loader.load_cifar_data('train')\n",
    "model, normalizer = cifar_loader.load_pretrained_cifar_resnet(flavor=20, return_normalizer=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And now let's build the attack parameters: an object that contains all the information to perform an attack on a minibatch. So first let's build an attack object and then furnish it with the necessary kwargs. \n",
    "\n",
    "Like in tutorial 1, to create an attack object, we'll need to create a threat model and a loss function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "delta_threat = ap.ThreatModel(ap.DeltaAddition, \n",
    "                              {'lp_style': 'inf', \n",
    "                               'lp_bound': 8.0 / 255})\n",
    "attack_loss = plf.VanillaXentropy(model, normalizer)\n",
    "attack_object = aa.FGSM(model, normalizer, delta_threat, attack_loss)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And then we build the `AttackParameters` object, which just wraps the attack object with the kwargs needed to call the `attack(...)` method on attack. For FGSM attacks, we just want to turn the verbosity off, but for more complicated attacks, this will be more involved. Typically in training, we generate a single adversarial example per training point, but to be speedy here, let's only create 1 example per every 5 training points.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "attack_kwargs = {'verbose': False} # kwargs to be called in attack_object.attack(...)\n",
    "attack_params = advtrain.AdversarialAttackParameters(attack_object, proportion_attacked=0.2, \n",
    "                                                     attack_specific_params={'attack_kwargs': attack_kwargs})\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With our attack parameters built, we can build the object that handles training for us: this is instatiated with knowledge of the classifier, normalizer and some identifying features such as the *name* of the experiment and architecture. It's worthwhile to be informative with these so you keep which attacks this model is trained against straight."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "experiment_name = 'tutorial_fgsm'\n",
    "architecture = 'resnet20'\n",
    "training_obj = advtrain.AdversarialTraining(model, normalizer, experiment_name, architecture)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When you start training though, you'll need to furnish the trainer with some extra arguments:\n",
    "    - the data loader \n",
    "    - the number of epochs to train for \n",
    "    - a loss function (not one of the `mister_ed` custom loss functions though!)\n",
    "    - which optimizer to use (defaults to Adam with decent hyperparams)\n",
    "    - the attack parameters object \n",
    "    - whether or not to use the gpu (defaults to not using GPU)\n",
    "    - the verbosity level (ranging from ['low', 'medium', 'high', 'snoop'] (defaults to 'medium')\n",
    "    - whether or not to save the generated adversarial examples as images (defaults to false)\n",
    "    \n",
    "To be cute, we'll just train for two epochs so you get the picture. Also note that unless the verbosity is set to `low`, a checkpoint will be saved after every epoch. By default, these checkpoints are named like `<experiment_name>.<architecture_name>.<epoch>.path.tar`\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_loss = nn.CrossEntropyLoss() # just use standard XEntropy to train\n",
    "training_logger = training_obj.train(cifar_trainset, 2, train_loss, \n",
    "                                     attack_parameters=attack_params, \n",
    "                                     verbosity='snoop', loglevel='snoop',\n",
    "                                     test_percentage=0.2) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The printouts look like:\n",
    "``` \n",
    "[epoch_no, minibatch_no] accuracy: (X, Y) \n",
    "[epoch_no, minibatch_no] loss: Z\n",
    "```\n",
    "\n",
    "- X is the percent of successfully classified *adversarial* examples generated from that minibatch only\n",
    "- Y is the percent of successfully classified *original* examples on that minibatch only\n",
    "- Z is the value of the loss function after that minibatch\n",
    "\n",
    "The output of training is a `TrainingLogger` class that stores essentially what was printed, and is separately controlled by its own `loglevel` argument. This can be accessed (and just to be safe, sorted using `sort_series`) and plotted as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot training loss\n",
    "import matplotlib.pyplot as plt \n",
    "plt.plot(training_logger.sort_series('training_loss', return_keys=False))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot adversarial accuracy as we train \n",
    "# This should be noisy, but generally going UP\n",
    "plt.plot([_[0] for _ in training_logger.sort_series('attack', return_keys=False)])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once training completes, you can verify that the checkpoints are indeed stored in wherever you have set up pretrained models to be stored. By default this is `mister_ed/pretrained_models/`, so you should have a `tutorial_fgsm.resnet20.000002.path.tar` file"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Restarting from Checkpoint\n",
    "When training, sometimes @#\\$& happens and things break. This is why we checkpoint. Here we'll show how to restart from checkpoint in training:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose we want to pick back up from where we left off with the experiment/architecture pair defined above `(tutorial_fgsm, resnet20)`. Then we want to do the following steps:\n",
    "\n",
    "1. Instantiate a model of the same architecture (weights don't matter, since we'll load from the checkpoint) \n",
    "2. Build an `AdversarialTraining` object using this model, its normalizer, and the same experiment name, architecture name \n",
    "3. Build a loss function, attack_parameters object, and all other identical kwargs from the first (aborted) training run \n",
    "4. Run the training using the training object's `train_from_checkpoint` method instead of `train`. All the kwargs are the same "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "naive_model, normalizer = cifar_loader.load_pretrained_cifar_resnet(flavor=20, return_normalizer=True)\n",
    "new_train_obj = advtrain.AdversarialTraining(naive_model, normalizer, experiment_name, architecture)\n",
    "\n",
    "delta_threat = ap.ThreatModel(ap.DeltaAddition, \n",
    "                              {'lp_style': 'inf', \n",
    "                               'lp_bound': 8.0 / 255})\n",
    "attack_loss = plf.VanillaXentropy(naive_model, normalizer)\n",
    "attack_object = aa.FGSM(naive_model, normalizer, delta_threat, attack_loss)\n",
    "attack_kwargs = {'verbose': False} # kwargs to be called in attack_object.attack(...)\n",
    "attack_params = advtrain.AdversarialAttackParameters(attack_object, proportion_attacked=0.2, \n",
    "                                                     attack_specific_params={'attack_kwargs': attack_kwargs})\n",
    "\n",
    "new_train_obj.train_from_checkpoint(cifar_trainset, 4, train_loss, attack_parameters=attack_params, \n",
    "                                    verbosity='high')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once this finishes, notice that you should now have a file \n",
    "`tutorial_fgsm.resnet20.000004.path.tar` in your pretrained_models directory."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using the training script \n",
    "Using an ipython notebook isn't typically ideal for training, since it mandates you keep your browser window open. To this end, we've built a script to perform adversarial training in a tmux/screen background. This is located in `scripts/advtrain.py`. Here's what we've found are best practices for doing this:\n",
    "\n",
    "- Copy `scripts/advtrain.py` into `scripts/advtrain_<DESCRIPTIVE_EXPERIMENT_NAME>.py`\n",
    "- Modify the `build_attack_params` method in `scripts/advtrain_<DESCRIPTIVE_EXPERIMENT_NAME>.py` to use the attack parameters that you want. There's plenty of prebuilt attack parameters in that file to choose from. \n",
    "- In a tmux/screen, from `mister_ed`, run \n",
    "\n",
    "```python -m scripts.advtrain_DESCRIPTIVE_EXPERIMENT_NAME --exp <DESCRIPTIVE_EXPERIMENT_NAME> --arch <ARCHITECTURE_CHOICE> --verbosity [snoop/high/medium]```\n",
    "\n",
    "- To resume, you can optionally add the `-r` or `--resume` flag to the script call\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# A note on GPU Usage\n",
    "If you have access to a GPU on your machine, you'll probably want to leverage its power when doing training and attacks. `mister_ed` has been designed so \"standard\" GPU behavior should be supported without any extra effort. By \"standard\" GPU behavior, I mean that all objects reside on the same device: either all on the GPU or none on the GPU. If there is a GPU on your machine, which one can check from the output of \n",
    "```\n",
    "import torch.cuda as cuda \n",
    "print(cuda.is_available()) \n",
    "```\n",
    "Globally, unless otherwise specified, all objects will be initialized in GPU-mode if this output is `True`. This is done behind-the-scenes by setting the environment variable `MISTER_ED_GPU`. If you have access to a GPU, but wouldn't like to use it, you can manually override this environment variable by calling:\n",
    "```\n",
    "import utils.pytorch_utils as utils \n",
    "utils.set_global_gpu(False)\n",
    "```\n",
    "And then none of your objects will be in GPU-mode by default. \n",
    "\n",
    "For nonstandard GPU behavior, you should initialize any object that differs from the default gpu status (as defined by `MISTER_ED_GPU`) with the kwarg `manual_gpu=<True/False>`\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
