{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction to `mister_ed`\n",
    "Welcome to tutorial #1 for `mister_ed`. This file contains a brief overview of the contents of `mister_ed` and will get you started on creating your first set of adversarial examples. \n",
    "## Contents\n",
    "- Setup and installation \n",
    "- Repository Overview \n",
    "- Building your first adversarial examples"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup and Installation \n",
    "First let's make sure that you can import everything you need:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# EXTERNAL LIBRARY IMPORTS\n",
    "\n",
    "import numpy as np \n",
    "import scipy \n",
    "\n",
    "import torch # Need torch version >= 0.3 or 0.4\n",
    "import torch.nn as nn \n",
    "import torch.optim as optim \n",
    "assert float(torch.__version__[:3]) >= 0.3\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# MISTER ED SPECIFIC IMPORT BLOCK\n",
    "# (here we do things so relative imports work )\n",
    "# Universal import block \n",
    "# Block to get the relative imports working \n",
    "import os\n",
    "import sys \n",
    "module_path = os.path.abspath(os.path.join('..'))\n",
    "if module_path not in sys.path:\n",
    "    sys.path.append(module_path)\n",
    "\n",
    "\n",
    "import config\n",
    "import prebuilt_loss_functions as plf\n",
    "import loss_functions as lf \n",
    "import utils.pytorch_utils as utils\n",
    "import utils.image_utils as img_utils\n",
    "import cifar10.cifar_loader as cifar_loader\n",
    "import cifar10.cifar_resnets as cifar_resnets\n",
    "import adversarial_training as advtrain\n",
    "import adversarial_evaluation as adveval\n",
    "import utils.checkpoints as checkpoints\n",
    "import adversarial_perturbations as ap \n",
    "import adversarial_attacks as aa\n",
    "import spatial_transformers as st"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's make sure that you have CIFAR-10 data loaded and a pretrained classifier running. If the following block fails, then make sure you've run the setup script: from the `mister_ed` directory, run \n",
    "\n",
    "``` python scripts/setup_cifar.py```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Quick check to ensure cifar 10 data and pretrained classifiers are loaded \n",
    "cifar_valset = cifar_loader.load_cifar_data('val')\n",
    "model, normalizer = cifar_loader.load_pretrained_cifar_resnet(flavor=32, return_normalizer=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Repository Overview\n",
    "Now that we've verified that everything is set up appropriately, we can run through a quick tour of what this repository contains.\n",
    "\n",
    "`mister_ed` is designed to be a one-stop-shop for pytorch adversarial examples, so it's quite general and easily extensible. So as to not be intimidating, there's prebuilt functions to try things out of the box, but for now let's just go over the main components\n",
    "- **Classifiers and Normalizers**: Out of the box, `mister_ed` comes equipped to work on CIFAR-10 and Imagenet classifiers. Code to load the datasets and various pretrained classifiers are contained in the `cifar10` and `imagenet` directories, respectively. Important to note that we use objects called **Normalizers** that come with the classifier and are used to transform images from the [0,1] domain into a mean-zero, unit-variance tensor. \n",
    "\n",
    "\n",
    "- **Adversarial Perturbations**: We view adversarial examples as an object that takes in original images (as NxCxHxW tensors) and outputs the adversarial images. These are defined in `adversarial_perturbations.py`. In general these contain parameters that are optimized by the adversarial attack methods to maximize classifier loss. We'll build an AdversarialPerturbation object for each minibatch that we attack. These have several components:\n",
    "    - **Threat Model**: A threat model is a factory class to build adversarial perturbations that has the hard-constraints of our perturbation explicitly defined. For example, if we wish to create adversarial perturbations that can add noise to an image with L_inf bound of 8.0/255, then we can create a ThreatModel that says we can only transform our image by an additive delta with L_inf bound of 8.0/255:        \n",
    "    ``` threat_model = ap.ThreatModel(ap.DeltaAddition, {'lp_style': 'inf', 'lp_bound': 8.0/255}) ```\n",
    "        \n",
    "    - **AdversarialPerturbation**: Each **subclass** of this class serves as an object that stores parameters to convert a single minibatch of real images into adversarial examples. These fall into three main classes: `DeltaAddition`, which allows for additive noise; `ParameterizedXformAdv`, which allows for various classes of 'spatial transformations', and `SequentialPerturbation`, which allows for arbitrary sequential combinations of the former two. \n",
    "- **Adverarial Attacks**: These are classes contained in the `adversarial_attacks_refactor.py` file. Instances are meant to be initialized with knowledge of the classifier (and its associated normalizer) we're attacking, a loss function to be maximized, and a *threat_model* (which is used to instatiate a perturbation per minibatch). Then a single minibatch can be passed as an argument to the `.attack(...)` method which returns a perturbation object which has parameters optimized to make the given minibatch adversarial.\n",
    "- **Loss Functions**: Standard pyTorch loss functions typically don't have access to the classifier itself and typically don't take the input images as arguments. We allow for easy creation of new loss functions as well as the addition of regularization termms. The building blocks for loss functions are contained in `loss_functions.py`, with commonly used loss functions contained in `prebuilt_loss_functions.py`\n",
    "- **Adversarial Training**: Adversarial training is one of the most common defenses against adversarial attacks and our goal is to make performing this as easy as possible under our framework. Code to perform adversarial training is contained in `adversarial_training.py` and works like this: to train a network against adversarial examples, you can instantiate an `AdversarialTraining` object and training it with a provided attack. In order to store the parameters for generating an attack to be used in training, wrap an `AdversarialAttack` object in an `AdversarialAttackParameters` object.\n",
    "- **Evaluating Attacks/Defenses**: To evaluate how good a suite of attacks is against a particular network, it's worthwhile to have tools to do this quickly and easily. `adversarial_evaluation.py` contains classes that allow easy creation of scripts to evaluate attacks and defenses.\n",
    "- **Spatial Transformations**: The major contribution of `mister_ed` is that it allows for easy extension of existing attacks. A growing field of work is occurring in using spatial transformations, such as rotations/translations and flow networks, to generate adversarial attacks. The building blocks for these types of transformations are contained in `spatial_transformers.py`.\n",
    "- **Notebooks/Scripts**: In general, we'll try to keep the notebooks and scripts out of the github repo, but we've left a few in here for convenience. Each of these live in their separate directories; scripts tend to be more useful for training while notebooks are really good at building examples and evaluating various attacks.\n",
    "\n",
    "\n",
    " ** *If all this seems like a lot to understand at first, don't worry! We'll go over how to build adversarial attacks, train a defended network and evaluate various attacks against this network in the tutorials.* **"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#  Building your first adversarial examples!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The rest of this tutorial will demonstrate how to generate a single minibatch worth of adversarial examples on CIFAR-10.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To set up, let's start by collecting a minibatch worth of data and loading up our classifier to attack."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cifar_valset = cifar_loader.load_cifar_data('val', batch_size=16)\n",
    "examples, labels = next(iter(cifar_valset))\n",
    "\n",
    "\n",
    "model, normalizer = cifar_loader.load_pretrained_cifar_resnet(flavor=32, return_normalizer=True)\n",
    "\n",
    "if utils.use_gpu():\n",
    "    examples = examples.cuda()\n",
    "    labels = labels.cuda() \n",
    "    model.cuda()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's take a look at what our original images look like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "img_utils.show_images(examples)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## FGSM L_infinity attack\n",
    "Now let's attack all of these examples with an FGSM attack with a threat model that allows addition of adversarial noise of magnitude up to 8.0 (the standard bound for the [Madry Challenge](https://github.com/MadryLab/cifar10_challenge))."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "'''\n",
    "Everything is in [0,1] range, so lp_bound of 8 pixels is 8.0 / 255!\n",
    "Any perturbation object generated from this threat model will \n",
    "automatically constrain the l_infinity bound to be <=8\n",
    "'''\n",
    "\n",
    "delta_threat = ap.ThreatModel(ap.DeltaAddition, {'lp_style': 'inf', \n",
    "                                                 'lp_bound': 8.0 / 255}) \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And then we can build an attack object that knows about the classifier, normalizer, threat model, and a loss function to maximize. We'll use standard CrossEntropy for now."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "attack_loss = plf.VanillaXentropy(model, normalizer)\n",
    "fgsm_attack_object = aa.FGSM(model, normalizer, delta_threat, attack_loss)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And then we can perform the attack by calling `.attack(...)` on the attack object"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "perturbation_out = fgsm_attack_object.attack(examples, labels, verbose=True) # Verbose prints out accuracy \n",
    "assert isinstance(perturbation_out, ap.DeltaAddition) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And then we can examine what the tensors look like by operating on the returned perturbation object..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Display in three rows: adversarial examples, originals, and differences magnified\n",
    "adv_examples = perturbation_out.adversarial_tensors()\n",
    "print(attack_loss.forward(adv_examples, labels, output_per_example=True))\n",
    "originals = perturbation_out.originals \n",
    "differences = ((adv_examples - originals) * 5 + 0.5).clamp(0, 1)\n",
    "\n",
    "\n",
    "img_utils.show_images([adv_examples, originals, differences])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or suppose we only want to look at just the 'successful' attacks: (in that the perturbation causes the classifier to change output)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "successful = perturbation_out.collect_successful(model, normalizer)\n",
    "successful_advs = successful['adversarials']\n",
    "successful_origs = successful['originals']\n",
    "successful_diffs = ((successful_advs - successful_origs) * 5 + 0.5).clamp(0, 1)\n",
    "\n",
    "img_utils.show_images([successful_advs, successful_origs, successful_diffs])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## PGD StAdv attack\n",
    "Here we give an example of an [StAdv](https://openreview.net/forum?id=HyydRMZC- ) attack that aims to smoothly deform the original images to fool the classifier. This is a slightly more complicated example, but it's okay! We'll do it in parts.\n",
    "\n",
    "Like last time, the first thing to do is define a threat model:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "'''\n",
    "We need to make a 'maximum-allowable flow' in our threat model.\n",
    "L_infinity for flow networks means that pixels can adopt up to some amount of their neighboring pixels' values.\n",
    "For example, on a 32x32 image, 0.3/64 means a pixel can adopt 30% of the adjacent pixel's values.\n",
    "\n",
    "The xform_class means we allow for a fully parameterized spatial transformer to determine the flow, and \n",
    "use_stadv means the 'norm' of perturbations uses the TV-norm as described in the stAdv paper\n",
    "'''\n",
    "flow_threat = ap.ThreatModel(ap.ParameterizedXformAdv, \n",
    "                             {'xform_class': st.FullSpatial, \n",
    "                              'lp_style': 'inf',\n",
    "                              'lp_bound': 0.3 / 64,\n",
    "                              'use_stadv': True})\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And now we need to create a loss function: from the stAdv paper, the loss function used is \n",
    "$$\\mathcal{L}(x, x_{adv}) = \\mathcal{L}_{adv}(x, flow) + \\tau \\mathcal{L}_{flow}(x, flow)$$\n",
    "\n",
    "where $$\\mathcal{L}_{adv}$$ is the standard Carlini Wagner loss\n",
    "$$\\mathcal{L}_{adv}(x, flow) = \\max(\\max_{i\\neq t} g(x_{adv})_i - g(x_{adv})_t, 0)$$ \n",
    "and \n",
    "$\\tau=0.05$ and $\\mathcal{L}_{flow}$ is the total-variation flow-loss.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Make the carlini wagner loss first \n",
    "cw_loss = lf.CWLossF6(model, normalizer)\n",
    "\n",
    "# And then the spatial flow, which is just the 'norm' of the flow perturbation \n",
    "flow_loss = lf.PerturbationNormLoss(lp=2)\n",
    "\n",
    "# And then combine them with a RegularizedLossObject \n",
    "flow_attack_loss = lf.RegularizedLoss({'cw': cw_loss, 'flow': flow_loss}, \n",
    "                                      {'cw': 1.0,     'flow': 0.05},\n",
    "                                      negate=True) # Need this true for PGD type attacks\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can build the PGD attack object and run the attack"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pgd_attack_obj = aa.PGD(model, normalizer, flow_threat, flow_attack_loss)\n",
    "flow_perturbation = pgd_attack_obj.attack(examples, labels, num_iterations=20, signed=False, \n",
    "                                          optimizer=optim.Adam, optimizer_kwargs={'lr': 0.001}, verbose=True)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And then we can examine the outputs as usual:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Display in three rows: adversarial examples, originals, and differences magnified\n",
    "adv_examples = flow_perturbation.adversarial_tensors()\n",
    "originals = flow_perturbation.originals \n",
    "differences = ((adv_examples - originals) * 5 + 0.5).clamp(0, 1)\n",
    "\n",
    "\n",
    "img_utils.show_images([adv_examples, originals, differences])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "successful = flow_perturbation.collect_successful(model, normalizer)\n",
    "successful_advs = successful['adversarials']\n",
    "successful_origs = successful['originals']\n",
    "successful_diffs = ((successful_advs - successful_origs) * 5 + 0.5).clamp(0, 1)\n",
    "\n",
    "img_utils.show_images([successful_advs, successful_origs, successful_diffs])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sequentially Combining Attacks\n",
    "Now that we've seen how to build a DeltaAddition and an StAdv attack, we can combine these two doing one and then the other, and optimizing them at the same time. So the output of this new attack would be \n",
    "$$x_{adv} = flow(x) + \\delta $$\n",
    "\n",
    "Again, let's start by first defining our threat model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "''' \n",
    "Sequential threats are instatiated from a list of already-existing threats, \n",
    "so we'll just reuse our threats that we previously made. \n",
    "\n",
    "Ultimately we'll keep our loss function as Loss_CW + Loss_Flow, so if we only want to penalize flows, \n",
    "let's add a third parameter that says the 'norm' of instances of this threat only considers the flow norm.\n",
    "'''\n",
    "sequential_threat = ap.ThreatModel(ap.SequentialPerturbation, \n",
    "                                   [flow_threat, delta_threat],\n",
    "                                   ap.PerturbationParameters(norm_weights=[1.0, 0.0]))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And then we can make a loss function, which will be the same as the loss used in our flow attack "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Make the carlini wagner loss  \n",
    "cw_loss = lf.CWLossF6(model, normalizer)\n",
    "\n",
    "# And then the perturbation norm, which is just the 'norm' of only the flow perturbation\n",
    "perturbation_loss = lf.PerturbationNormLoss(lp=2)\n",
    "\n",
    "# And then combine them with a RegularizedLossObject \n",
    "seq_attack_loss = lf.RegularizedLoss({'cw': cw_loss, 'pert': flow_loss}, \n",
    "                                     {'cw': 1.0,     'pert': 0.05},\n",
    "                                     negate=True) # Need this true for PGD type attacks\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "seq_pgd_attack_obj = aa.PGD(model, normalizer, sequential_threat, seq_attack_loss)\n",
    "seq_perturbation = seq_pgd_attack_obj.attack(examples, labels, num_iterations=20, signed=False, \n",
    "                                             optimizer=optim.Adam, optimizer_kwargs={'lr': 0.001}, verbose=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And then we examine the outputs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Display in three rows: adversarial examples, originals, and differences magnified\n",
    "adv_examples = seq_perturbation.adversarial_tensors()\n",
    "originals = seq_perturbation.originals \n",
    "differences = ((adv_examples - originals) * 5 + 0.5).clamp(0, 1)\n",
    "\n",
    "\n",
    "img_utils.show_images([adv_examples, originals, differences])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "successful = seq_out.collect_successful(model, normalizer)\n",
    "successful_advs = successful['adversarials']\n",
    "successful_origs = successful['originals']\n",
    "successful_diffs = ((successful_advs - successful_origs) * 5 + 0.5).clamp(0, 1)\n",
    "\n",
    "img_utils.show_images([successful_advs, successful_origs, successful_diffs])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And that's it for making a variety of perturbations! Feel free to play around with various l_infinity bounds, or various spatial transformations (such as rotations or translations). In the next tutorial we'll go over how to perform adversarial training"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
