{"title": "Adversarial training for free!", "book": "Advances in Neural Information Processing Systems", "page_first": 3358, "page_last": 3369, "abstract": "Adversarial training, in which a network is trained on adversarial examples, is one of the few defenses against adversarial attacks that withstands strong attacks. Unfortunately, the high cost of generating strong adversarial examples makes standard adversarial training impractical on large-scale problems like ImageNet. We present an algorithm that eliminates the overhead cost of generating adversarial examples by recycling the gradient information computed when updating model parameters. Our \"free\" adversarial training algorithm achieves comparable robustness to PGD adversarial training on the CIFAR-10 and CIFAR-100 datasets at negligible additional cost compared to natural training, and can be 7 to 30 times faster than other strong adversarial training methods. Using a single workstation with 4 P100 GPUs and 2 days of runtime, we can train a robust model for the large-scale ImageNet classification task that maintains 40% accuracy against PGD attacks.", "full_text": "Adversarial Training for Free!\n\nAli Shafahi\n\nUniversity of Maryland\nashafahi@cs.umd.edu\n\nMahyar Najibi\n\nUniversity of Maryland\nnajibi@cs.umd.edu\n\nAmin Ghiasi\n\nUniversity of Maryland\n\namin@cs.umd.edu\n\nZheng Xu\n\nUniversity of Maryland\n\nxuzh@cs.umd.edu\n\nJohn Dickerson\n\nUniversity of Maryland\n\njohn@cs.umd.edu\n\nChristoph Studer\nCornell University\n\nstuder@cornell.edu\n\nLarry S. Davis\n\nUniversity of Maryland\nlsd@umiacs.umd.edu\n\nGavin Taylor\n\nUnited States Naval Academy\n\ntaylor@usna.edu\n\nTom Goldstein\n\nUniversity of Maryland\n\ntomg@cs.umd.edu\n\nAbstract\n\nAdversarial training, in which a network is trained on adversarial examples, is one\nof the few defenses against adversarial attacks that withstands strong attacks. 
Unfortunately, the high cost of generating strong adversarial examples makes standard adversarial training impractical on large-scale problems like ImageNet. We present an algorithm that eliminates the overhead cost of generating adversarial examples by recycling the gradient information computed when updating model parameters. Our \"free\" adversarial training algorithm achieves comparable robustness to PGD adversarial training on the CIFAR-10 and CIFAR-100 datasets at negligible additional cost compared to natural training, and can be 7 to 30 times faster than other strong adversarial training methods. Using a single workstation with 4 P100 GPUs and 2 days of runtime, we can train a robust model for the large-scale ImageNet classification task that maintains 40% accuracy against PGD attacks.\n\n1 Introduction\n\nDeep learning has been widely applied to various computer vision tasks with excellent performance. Prior to the realization of the adversarial example phenomenon by Biggio et al. [2013], Szegedy et al. [2013], model performance on clean examples was the main evaluation criterion. However, in security-critical applications, robustness to adversarial attacks has emerged as a critical factor.\nA robust classifier is one that correctly labels adversarially perturbed images. Alternatively, robustness may be achieved by detecting and rejecting adversarial examples [Ma et al., 2018, Meng and Chen, 2017, Xu et al., 2017]. Recently, Athalye et al. 
[2018] broke a complete suite of allegedly robust\ndefenses, leaving adversarial training, in which the defender augments each minibatch of training\ndata with adversarial examples [Madry et al., 2017], among the few that remain resistant to attacks.\nAdversarial training is time-consuming\u2014in addition to the gradient computation needed to update\nthe network parameters, each stochastic gradient descent (SGD) iteration requires multiple gradient\ncomputations to produce adversarial images. In fact, it takes 3-30 times longer to form a robust\nnetwork with adversarial training than forming a non-robust equivalent. Put simply, the actual\nslowdown factor depends on the number of gradient steps used for adversarial example generation.\nThe high cost of adversarial training has motivated a number of alternatives. Some recent works\nreplace the perturbation generation in adversarial training with a parameterized generator network\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n\f[Baluja and Fischer, 2018, Poursaeed et al., 2018, Xiao et al., 2018]. This approach is slower than\nstandard training, and problematic on complex datasets, such as ImageNet, for which it is hard to\nproduce highly expressive GANs that cover the entire image space. Another popular defense strategy\nis to regularize the training loss using label smoothing, logit squeezing, or a Jacobian regularization\n[Shafahi et al., 2019a, Mosbach et al., 2018, Ross and Doshi-Velez, 2018, Hein and Andriushchenko,\n2017, Jakubovitz and Giryes, 2018, Yu et al., 2018]. These methods have not been applied to\nlarge-scale problems, such as ImageNet, and can be applied in parallel to adversarial training.\nRecently, there has been a surge of certi\ufb01ed defenses [Wong and Kolter, 2017, Wong et al., 2018,\nRaghunathan et al., 2018a,b, Wang et al., 2018]. 
These methods were mostly demonstrated for small networks, low-res datasets, and relatively small perturbation budgets (ε). Lecuyer et al. [2018] propose randomized smoothing as a certified defense, which was later improved by Li et al. [2018a]. Cohen et al. [2019] prove a tight robustness guarantee under the ℓ2 norm for smoothing with Gaussian noise. Their study was the first certifiable defense for the ImageNet dataset [Deng et al., 2009]. They claim to achieve 12% robustness against non-targeted attacks that are within an ℓ2 radius of 3 (for images with pixels in [0, 1]). This is roughly equivalent to an ℓ∞ radius of ε = 2 when pixels lie in [0, 255].\nAdversarial training remains among the most trusted defenses, but it is nearly intractable on large-scale problems. Adversarial training on high-resolution datasets, including ImageNet, has only been within reach for research labs having hundreds of GPUs.1 Even on reasonably-sized datasets, such as CIFAR-10 and CIFAR-100, adversarial training is time consuming and can take multiple days.\n\nContributions\n\nWe propose a fast adversarial training algorithm that produces robust models with almost no extra cost relative to natural training. The key idea is to update both the model parameters and image perturbations using one simultaneous backward pass, rather than using separate gradient computations for each update step. Our proposed method has the same computational cost as conventional natural training, and can be 3-30 times faster than previous adversarial training methods [Madry et al., 2017, Xie et al., 2019]. 
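Returning to the ℓ2-vs-ℓ∞ budget comparison in the certified-defense discussion above, the stated equivalence (ℓ2 radius 3 on [0, 1] pixels vs. ℓ∞ radius ε = 2 on [0, 255] pixels) can be sanity-checked in a few lines. The 224 × 224 × 3 input size below is an assumption for illustration:

```python
import math

# An l_inf ball of radius eps contains points whose l2 distance from the
# center can be as large as eps * sqrt(d), where d is the input dimension.
d = 224 * 224 * 3            # assumed ImageNet-style input size
eps_inf = 2.0 / 255.0        # l_inf radius of 2 on [0, 255], rescaled to [0, 1]
l2_radius = eps_inf * math.sqrt(d)
print(round(l2_radius, 2))   # ~3.04, close to the l2 radius of 3 quoted above
```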
Our robust models trained on CIFAR-10 and CIFAR-100 achieve accuracies comparable to, and in some cases slightly exceeding, those of models trained with conventional adversarial training when defending against strong PGD attacks.\nWe can apply our algorithm to the large-scale ImageNet classification task on a single workstation with four P100 GPUs in about two days, achieving 40% accuracy against non-targeted PGD attacks. To the best of our knowledge, our method is the first to successfully train a robust model for ImageNet based on the non-targeted formulation, and it achieves results competitive with previous (significantly more complex) methods [Kannan et al., 2018, Xie et al., 2019].\n\n2 Non-targeted adversarial examples\n\nAdversarial examples come in two flavors: non-targeted and targeted. Given a fixed classifier with parameters θ, an image x with true label y, and classification proxy loss l, a bounded non-targeted attack sneaks an example out of its natural class and into another. This is done by solving\n\nmax_δ l(x + δ, y, θ), subject to ||δ||_p ≤ ε,   (1)\n\nwhere δ is the adversarial perturbation, ||.||_p is some ℓp-norm distance metric, and ε is the adversarial manipulation budget. In contrast to non-targeted attacks, a targeted attack scooches an image into a specific class of the attacker's choice.\nIn what follows, we will use non-targeted adversarial examples both for evaluating the robustness of our models and also for adversarial training. We briefly review some of the closely related methods for generating adversarial examples. In the context of ℓ∞-bounded attacks, the Fast Gradient Sign Method (FGSM) by Goodfellow et al. [2015] is one of the most popular non-targeted methods; it uses the sign of the gradient to construct an adversarial example in one iteration:\n\nx_adv = x + ε · sign(∇_x l(x, y, θ)).   (2)\n\n1Xie et al. 
[2019] use 128 V100s and Kannan et al. [2018] use 53 P100s for targeted adv training ImageNet.\n\n2\n\n\fThe Basic Iterative Method (BIM) by Kurakin et al. [2016a] is an iterative version of FGSM. The\nPGD attack is a variant of BIM with uniform random noise as initialization, which is recognized by\nAthalye et al. [2018] to be one of the most powerful \ufb01rst-order attacks. The initial random noise was\n\ufb01rst studied by Tram\u00e8r et al. [2017] to enable FGSM to attack models that rely on \u201cgradient masking.\u201d\nIn the PGD attack algorithm, the number of iterations K plays an important role in the strength\nof attacks, and also the computation time for generating adversarial examples. In each iteration, a\ncomplete forward and backward pass is needed to compute the gradient of the loss with respect to the\nimage. Throughout this paper we will refer to a K-step PGD attack as PGD-K.\n\n3 Adversarial training\n\nAdversarial training can be traced back to [Goodfellow et al., 2015], in which models were hardened\nby producing adversarial examples and injecting them into training data. The robustness achieved\nby adversarial training depends on the strength of the adversarial examples used. Training on fast\nnon-iterative attacks such as FGSM and Rand+FGSM only results in robustness against non-iterative\nattacks, and not against PGD attacks [Kurakin et al., 2016b, Madry et al., 2017]. Consequently, Madry\net al. [2017] propose training on multi-step PGD adversaries, achieving state-of-the-art robustness\nlevels against (cid:96)\u221e attacks on MNIST and CIFAR-10 datasets.\nWhile many defenses were broken by Athalye et al. [2018], PGD-based adversarial training was\namong the few that withstood strong attacks. Many other defenses build on PGD adversarial training\nor leverage PGD adversarial generation during training. 
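As a concrete illustration of the PGD-K attack defined above (BIM plus a uniform random start, with projection back onto the ε-ball), here is a minimal sketch against a toy logistic classifier whose input gradient has a closed form. The model, loss, and all numbers are illustrative assumptions, not the paper's setup; a real attack would obtain the input gradient from a backward pass through the network:

```python
import numpy as np

def pgd_attack(x, y, w, b, eps, step, K, rng):
    """K-step l_inf PGD (PGD-K) against a toy logistic classifier.

    Toy loss l(x) = log(1 + exp(-y * (w @ x + b))), so the input gradient has
    the closed form grad_x = -y * w / (1 + exp(y * (w @ x + b))).
    """
    # PGD = BIM initialized with uniform random noise inside the eps-ball
    delta = rng.uniform(-eps, eps, size=x.shape)
    for _ in range(K):
        margin = y * (np.dot(w, x + delta) + b)
        grad_x = -y * (1.0 / (1.0 + np.exp(margin))) * w   # d loss / d input
        delta = delta + step * np.sign(grad_x)             # ascent on the loss
        delta = np.clip(delta, -eps, eps)                  # project back to the eps-ball
    return x + delta

rng = np.random.default_rng(0)
w, b = np.array([1.0, -2.0, 0.5]), 0.1
x, y = np.array([0.2, 0.1, 0.4]), 1.0
x_adv = pgd_attack(x, y, w, b, eps=0.1, step=0.02, K=10, rng=rng)

# the attack should raise the loss relative to the clean input
loss = lambda z: np.log1p(np.exp(-y * (np.dot(w, z) + b)))
print(loss(x_adv) > loss(x))  # True
```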
Examples include Adversarial Logit Pairing\n(ALP) [Kannan et al., 2018], Feature Denoising [Xie et al., 2019], Defensive Quantization [Lin et al.,\n2019], Thermometer Encoding [Buckman et al., 2018], PixelDefend [Song et al., 2017], Robust\nManifold Defense [Ilyas et al., 2017], L2-nonexpansive nets [Qian and Wegman, 2018], Jacobian\nRegularization [Jakubovitz and Giryes, 2018], Universal Perturbation [Shafahi et al., 2018], and\nStochastic Activation Pruning [Dhillon et al., 2018].\nWe focus on the min-max formulation of adversarial training [Madry et al., 2017], which has been\ntheoretically and empirically justi\ufb01ed. This widely used K-PGD adversarial training algorithm has\nan inner loop that constructs adversarial examples by PGD-K, while the outer loop updates the model\nusing minibatch SGD on the generated examples. In the inner loop, the gradient \u2207xl(xadv, y, \u03b8) for\nupdating adversarial examples requires a forward-backward pass of the entire network, which has\nsimilar computation cost as calculating the gradient \u2207\u03b8 l(xadv, y, \u03b8) for updating network parameters.\nCompared to natural training, which only requires \u2207\u03b8 l(x, y, \u03b8) and does not have an inner loop,\nK-PGD adversarial training needs roughly K + 1 times more computation.\n\n4 \u201cFree\u201d adversarial training\n\nK-PGD adversarial training [Madry et al., 2017] is generally slow. For example, the 7-PGD training\nof a WideResNet [Zagoruyko and Komodakis, 2016] on CIFAR-10 in Madry et al. [2017] takes about\nfour days on a Titan X GPU. To scale the algorithm to ImageNet, Xie et al. [2019] and Kannan et al.\n[2018] had to deploy large GPU clusters at a scale far beyond the reach of most organizations.\nHere, we propose free adversarial training, which has a negligible complexity overhead compared to\nnatural training. Our free adversarial training algorithm (alg. 1) computes the ascent step by re-using\nthe backward pass needed for the descent step. 
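This gradient re-use can be made concrete on a toy logistic model, where the parameter gradient (for the descent step) and the input gradient (for the ascent step) are both cheap by-products of the same backpropagated error signal. The sketch below is illustrative, not the paper's implementation:

```python
import numpy as np

def shared_backward(x, y, w, b):
    """One forward/backward pass on l(x) = log(1 + exp(-y * (w @ x + b))).

    'err' is the backpropagated error d(loss)/d(logit); both the parameter
    gradient (used to update theta) and the input gradient (used to update
    the perturbation delta) fall out of it at no extra cost.
    """
    z = np.dot(w, x) + b              # forward pass
    err = -y / (1.0 + np.exp(y * z))  # shared error signal, d loss / d z
    grad_w = err * x                  # gradient for the parameter update
    grad_b = err
    grad_x = err * w                  # gradient for the perturbation update
    return grad_w, grad_b, grad_x

w, b = np.array([1.0, -2.0, 0.5]), 0.1
x, y = np.array([0.2, 0.1, 0.4]), 1.0
grad_w, grad_b, grad_x = shared_backward(x, y, w, b)
```

In a deep network the same structure holds: the error signal propagated back through the layers yields the parameter gradients, and propagating it one step further (through the input) yields the perturbation gradient on the same pass.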
To update the network parameters, the current training minibatch is passed forward through the network. Then, the gradient with respect to the network parameters is computed on the backward pass. When the \"free\" method is used, the gradient of the loss with respect to the input image is also computed on this same backward pass.\nUnfortunately, this approach does not allow for multiple adversarial updates to be made to the same image without performing multiple backward passes. To overcome this restriction, we propose a minor yet nontrivial modification to training: train on the same minibatch m times in a row. Note that we divide the number of epochs by m such that the overall number of training iterations remains constant. This strategy provides multiple adversarial updates to each training image, thus providing strong/iterative adversarial examples. Finally, when a new minibatch is formed, the perturbation generated on the previous minibatch is used to warm-start the perturbation for the new minibatch.\n\nAlgorithm 1 \"Free\" Adversarial Training (Free-m)\nRequire: training samples X, perturbation bound ε, learning rate τ, hop steps m\n1: Initialize θ\n2: δ ← 0\n3: for epoch = 1 . . . Nep/m do\n4:   for minibatch B ⊂ X do\n5:     for i = 1 . . . m do\n6:       Update θ with stochastic gradient descent:\n7:       gθ ← E(x,y)∈B [∇θ l(x + δ, y, θ)]\n8:       gadv ← ∇x l(x + δ, y, θ)\n9:       θ ← θ − τ gθ\n10:      Use the gradients calculated for the minimization step to update δ:\n11:      δ ← δ + ε · sign(gadv)\n12:      δ ← clip(δ, −ε, ε)\n13:    end for\n14:  end for\n15: end for\n\n(a) CIFAR-10 sensitivity to m    (b) CIFAR-100 sensitivity to m\n\nFigure 1: Natural validation accuracy of Wide ResNet 32-10 models using varied mini-batch replay parameters m. 
Here m = 1 corresponds to natural training. For large m's, validation accuracy drops drastically, while small m's have little effect. For reference, CIFAR-10 and CIFAR-100 models that are 7-PGD adversarially trained have natural accuracies of 87.25% and 59.87%, respectively.\n\nThe effect of mini-batch replay on natural training\n\nWhile the hope for alg. 1 is to build robust models, we still want models to perform well on natural examples. As we increase m in alg. 1, there is risk of increasing generalization error. Furthermore, it may be possible that catastrophic forgetting happens. Consider the worst case where all the \"informative\" images of one class are in the first few mini-batches. In this extreme case, we do not see useful examples for most of the epoch, and forgetting may occur. Consequently, a natural question is: how much does mini-batch replay hurt generalization?\nTo answer this question, we naturally train Wide ResNet 32-10 models on CIFAR-10 and CIFAR-100 using different levels of replay. Fig. 1 plots clean validation accuracy as a function of the replay parameter m. We see only a small dropoff in accuracy for small values of m. Note that a small compromise in accuracy is acceptable given a large increase in robustness due to the fundamental tradeoffs between robustness and generalization [Tsipras et al., 2018, Zhang et al., 2019a, Shafahi et al., 2019b]. As a reference, CIFAR-10 and CIFAR-100 models that are 7-PGD adversarially trained have natural accuracies of 87.25% and 59.87%, respectively. These same accuracies are exceeded by natural training with m = 16. We see in section 5 that good robustness can be achieved using \"free\" adversarial training with just m ≤ 10.\n\nTable 1: Validation accuracy and robustness of CIFAR-10 models trained with various methods.\n\nTraining | Nat. Images | PGD-20 | PGD-100 | CW-100 | 10-restart PGD-20 | Train time (min)\nNatural | 95.01% | 0.00% | 0.00% | 0.00% | 0.00% | 780\nFree m = 2 | 91.45% | 33.92% | 33.20% | 34.57% | 33.41% | 816\nFree m = 4 | 87.83% | 41.15% | 40.35% | 41.96% | 40.73% | 800\nFree m = 8 | 85.96% | 46.82% | 46.19% | 46.60% | 46.33% | 785\nFree m = 10 | 83.94% | 46.31% | 45.79% | 45.86% | 45.94% | 785\n7-PGD trained | 87.25% | 45.84% | 45.29% | 46.52% | 45.53% | 5418\n\nTable 2: Validation accuracy and robustness of CIFAR-100 models trained with various methods.\n\nTraining | Natural Images | PGD-20 | PGD-100 | Training time (min)\nNatural | 78.84% | 0.00% | 0.00% | 811\nFree m = 2 | 69.20% | 15.37% | 14.86% | 816\nFree m = 4 | 65.28% | 20.64% | 20.15% | 767\nFree m = 6 | 64.87% | 23.68% | 23.18% | 791\nFree m = 8 | 62.13% | 25.88% | 25.58% | 780\nFree m = 10 | 59.27% | 25.15% | 24.88% | 776\nMadry et al. (2-PGD trained) | 67.94% | 17.08% | 16.50% | 2053\nMadry et al. (7-PGD trained) | 59.87% | 22.76% | 22.52% | 5157\n\n5 Robust models on CIFAR-10 and 100\n\nIn this section, we train robust models on CIFAR-10 and CIFAR-100 using our \"free\" adversarial training (alg. 1) and compare them to K-PGD adversarial training.2,3 We find that free training is able to achieve state-of-the-art robustness on the CIFARs without the overhead of standard PGD training.\n\nCIFAR-10\n\nWe train various CIFAR-10 models using the Wide ResNet 32-10 model and standard hyper-parameters used by Madry et al. [2017]. In the proposed method (alg. 1), we repeat (i.e., replay) each minibatch m times before switching to the next minibatch. We present the experimental results for various choices of m in table 1. Training each of these models costs roughly the same as natural training since we preserve the same number of iterations. We compare with the 7-PGD adversarially trained model from Madry et al. [2017],4 whose training requires roughly 7× more time than all of our free training variations. 
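For concreteness, the replay loop of alg. 1 can be sketched end to end on a toy logistic model. This is an illustrative NumPy sketch, not the paper's code: it keeps a single perturbation vector shared across the minibatch for simplicity, whereas the paper stores per-image perturbations and trains real networks:

```python
import numpy as np

def free_train(X, Y, eps, lr, m, epochs, batch_size=8):
    """Minimal sketch of "free" adversarial training (alg. 1) on a toy model.

    Each minibatch is replayed m times; every replay does one shared gradient
    computation that updates both the weights (descent) and the perturbation
    delta (ascent), and delta warm-starts the next minibatch.
    """
    d = X.shape[1]
    w, b = np.zeros(d), 0.0
    delta = np.zeros(d)                      # perturbation, warm-started across minibatches
    for _ in range(max(1, epochs // m)):     # divide epochs by m: same total iteration count
        for start in range(0, len(X), batch_size):
            xb, yb = X[start:start + batch_size], Y[start:start + batch_size]
            for _ in range(m):               # minibatch replay
                z = (xb + delta) @ w + b
                err = -yb / (1.0 + np.exp(yb * z))           # shared backward signal
                grad_w = (err[:, None] * (xb + delta)).mean(axis=0)
                grad_b = err.mean()
                grad_x = err.mean() * w                      # input gradient (shared delta)
                w, b = w - lr * grad_w, b - lr * grad_b      # descent on parameters
                delta = np.clip(delta + eps * np.sign(grad_x), -eps, eps)  # ascent on delta
    return w, b, delta

# illustrative linearly separable data, not a real benchmark
rng = np.random.default_rng(1)
X = rng.normal(size=(64, 3))
Y = np.sign(X @ np.array([1.0, -1.0, 0.5]))
w, b, delta = free_train(X, Y, eps=0.1, lr=0.5, m=4, epochs=8)
```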
We attack all models using PGD attacks with K iterations on both the\ncross-entropy loss (PGD-K) and the Carlini-Wagner loss (CW-K) [Carlini and Wagner, 2017]. We\ntest using the PGD-20 attack following Madry et al. [2017], and also increase the number of attack\niterations and employ random restarts to verify robustness under stronger attacks. To measure the\nsensitivity of our method to initialization, we perform \ufb01ve trials for the Free-m = 8 case and \ufb01nd that\nour results are insensitive. The natural accuracy is 85.95\u00b10.14 and robustness against a 20-random\nrestart PGD-20 attack is 46.49\u00b10.19. Note that gradient free-attacks such as SPSA will result in\ninferior results for adversarially trained models in comparison to optimization based attacks such\nas PGD as noted by Uesato et al. [2018]. Gradient-free attacks are superior in settings where the\ndefense works by masking or obfuscating the gradients.\n\n2Adversarial Training for Free code for CIFAR-10 in Tensor\ufb02ow can be found here: https://github.\n\ncom/ashafahi/free_adv_train/\n\n3ImageNet Adversarial Training for Free code in Pytorch can be found here: https://github.com/\n\nmahyarnajibi/FreeAdversarialTraining\n\n4Results based on the \u201cadv_trained\u201d model in Madry\u2019s CIFAR-10 challenge repo.\n\n5\n\n\fplane\n\ncat\n\ndog\n\ncat\n\nship\n\ncat\n\ndog\n\ncar\n\nhorse\n\ncar\n\ndog\n\ncat\n\ndeer\n\ncat\n\nfrog\n\nbird\n\nfrog\n\ndog\n\ncar\n\ndog\n\ncat\n\ndeer\n\ncat\n\nhorse\n\ncat\n\nplane\n\ncar\n\nFigure 2: Attack images built for adversarially trained models look like the class into which they get\nmisclassi\ufb01ed. We display the last 9 CIFAR-10 clean validation images (top row) and their adversarial\nexamples built for a 7-PGD adversarially trained (middle) and our \u201cfree\u201d trained (bottom) models.\n\nOur \u201cfree training\u201d algorithm successfully reaches robustness levels comparable to a 7-PGD ad-\nversarially trained model. 
As we increase m, the robustness is increased at the cost of validation accuracy on natural images. Additionally, note that we achieve reasonable robustness over a wide range of choices of the main hyper-parameter of our method, 10 ≥ m > 2, and the proposed method is significantly faster than 7-PGD adversarial training. Recently, a new method called YOPO [Zhang et al., 2019b] has been proposed for speeding up adversarial training; in their CIFAR-10 experiments they use a wider network (WRN-34-10) with a larger batch size (256). As shown in our supplementary, both of these factors increase robustness. To do a direct comparison, we train a WRN-34-10 using m = 10 and batch size 256. We match their best reported result (48.03% against PGD-20 attacks for \"free\" training vs. 47.98% for YOPO 5-3).\n\nCIFAR-100\n\nWe also study the robustness results of \"free training\" on CIFAR-100, which is a more difficult dataset with more classes. As discussed in sec. 4, training with large m values on this dataset hurts the natural validation accuracy more in comparison to CIFAR-10. This dataset is less studied in the adversarial machine learning community, and therefore for comparison purposes we adversarially train our own Wide ResNet 32-10 models for CIFAR-100. We train two robust models by varying K in the K-PGD adversarial training algorithm. One is trained on PGD-2 with a computational cost almost 3× that of free training, and the other is trained on PGD-7 with a computation time roughly 7× that of free training. We adopt the code for adversarial training from Madry et al. [2017], which produces state-of-the-art robust models on CIFAR-10. We summarize the results in table 2.\nWe see that \"free training\" exceeds the accuracy on both natural and adversarial images when compared to traditional adversarial training. 
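As a quick arithmetic cross-check, the wall-clock times reported in tables 1 and 2 line up with the roughly (K + 1)× cost model of K-PGD training discussed in section 3 (the ratios come out a bit below K + 1 because the timings include fixed overheads):

```python
# Training-time ratios from tables 1 and 2 (minutes); free-training times
# vary slightly with m, so the ratios are approximate.
ratio_cifar10 = 5418 / 785      # 7-PGD vs. Free m = 8 on CIFAR-10
ratio_cifar100_7 = 5157 / 776   # 7-PGD vs. Free m = 10 on CIFAR-100
ratio_cifar100_2 = 2053 / 776   # 2-PGD vs. Free m = 10 on CIFAR-100

print(round(ratio_cifar10, 1),
      round(ratio_cifar100_7, 1),
      round(ratio_cifar100_2, 1))  # 6.9 6.6 2.6, vs. ideal K + 1 = 8, 8, 3
```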
Similar to the effect of increasing m, increasing K in K-PGD adversarial training results in increased robustness at the cost of clean validation accuracy. However, unlike the proposed \"free training\", where increasing m has no extra cost, increasing K for standard K-PGD substantially increases training time.\n\n6 Does \"free\" training behave like standard adversarial training?\n\nHere, we analyze two properties that are associated with PGD adversarially trained models: the interpretability of their gradients and the flatness of their loss surface. We find that \"free\" training enjoys these benefits as well.\n\nGenerative behavior for largely perturbed examples\n\nTsipras et al. [2018] observed that hardened classifiers have interpretable gradients; adversarial examples built for PGD trained models often look like the class into which they get misclassified. Fig. 2 plots \"weakly bounded\" adversarial examples for the CIFAR-10 7-PGD adversarially trained model [Madry et al., 2017] and our free m = 8 trained model. Both models were trained to resist ℓ∞ attacks with ε = 8. The examples are made using a 50-iteration BIM attack with ε = 30 and step size εs = 2.\n\n(a) Free m = 8    (b) 7-PGD adv trained    (c) Free m = 8, both random    (d) 7-PGD adv trained, both random\n\nFigure 3: The loss surface of a 7-PGD adversarially trained model and our \"free\" trained model for CIFAR-10 on the first 2 validation images. In (a) and (b) we display the cross-entropy loss projected on one random (Rademacher) and one adversarial direction. In (c) and (d) we display the cross-entropy loss projected along two random directions. 
Both training methods behave similarly and do not operate by masking the gradients, as the adversarial direction is indeed the direction where the cross-entropy loss changes the most.\n\n\"Free training\" maintains generative properties, as our model's adversarial examples resemble the target class.\n\nSmooth and flattened loss surface\n\nAnother property of PGD adversarial training is that it flattens and smooths the loss landscape. In contrast, some defenses work by \"masking\" the gradients, i.e., making it difficult to identify adversarial examples using gradient methods, even though adversarial examples remain present. Engstrom et al. [2018] argue that gradient masking adds little security. We show in fig. 3a that free training does not operate by masking gradients with a rough loss surface. In fig. 3 we plot the cross-entropy loss projected along two directions in image space for the first few validation examples of CIFAR-10 [Li et al., 2018b]. In addition to the loss of the free m = 8 model, we plot the loss of the 7-PGD adversarially trained model for comparison.\n\n7 Robust ImageNet classifiers\n\nImageNet is a large image classification dataset of over 1 million high-res images and 1000 classes [Russakovsky et al., 2015]. Due to the high computational cost of ImageNet training, only a few research teams have been able to afford building robust models for this problem. Kurakin et al. [2016b] first hardened ImageNet classifiers by adversarial training with non-iterative attacks.5 Adversarial training was done using a targeted FGSM attack. They found that while their model became robust against targeted non-iterative attacks, the targeted BIM attack completely broke it. Later, Kannan et al. 
[2018] attempted to train a robust model that withstands targeted PGD attacks. They trained against 10-step PGD targeted attacks (a process that costs 11 times more than natural training) to build a benchmark model. They also generated PGD targeted attacks to train their adversarial logit paired (ALP) ImageNet model. Their baseline achieves a top-1 accuracy of 3.1% against PGD-20 targeted attacks with ε = 16. Very recently, Xie et al. [2019] trained a robust ImageNet model against targeted PGD-30 attacks, with a cost 31× that of natural training. Training this model required a distributed implementation on 128 GPUs with batch size 4096. Their robust ResNet-101 model achieves a top-1 accuracy of 35.8% on targeted PGD attacks with many iterations.\n\n5Training using a non-iterative attack such as FGSM only doubles the training cost.\n\nTable 3: ImageNet validation accuracy and robustness of ResNet-50 models trained with various replay parameters and ε = 2.\n\nTraining | Natural Images | PGD-10 | PGD-50 | PGD-100\nNatural | 76.038% | 0.166% | 0.052% | 0.036%\nFree m = 2 | 71.210% | 37.012% | 36.340% | 36.250%\nFree m = 4 | 64.446% | 43.522% | 43.392% | 43.404%\nFree m = 6 | 60.642% | 41.996% | 41.900% | 41.892%\nFree m = 8 | 58.116% | 40.044% | 40.008% | 39.996%\n\n(a) Clean    (b) PGD-100\n\nFigure 4: The effect of the perturbation bound ε and the mini-batch replay hyper-parameter m on the robustness achieved by free training.\n\nFree training results\n\nOur alg. 1 is designed for non-targeted adversarial training. As Athalye et al. [2018] state, defending against non-targeted attacks is important and more challenging than defending against targeted attacks, and for this reason smaller ε values are typically used. Even for ε = 2 (the smallest ε we consider defending against), a PGD-50 non-targeted attack on a natural model achieves roughly 0.05% top-1 accuracy. To put things further in perspective, Uesato et al. 
[2018] broke three defenses for ε = 2 non-targeted attacks on ImageNet [Guo et al., 2017, Liao et al., 2018, Xie et al., 2017], degrading their performance below 1%. Our free training algorithm is able to achieve 43% robustness against PGD attacks bounded by ε = 2. Furthermore, we ran each experiment on a single workstation with four P100 GPUs. Even with this modest setup, training time for each ResNet-50 experiment is below 50 hours.\nWe summarize our results for various ε's and m's in table 3 and fig. 4. To craft attacks, we used a step size of 1 and the corresponding ε used during training. In all experiments, the training batch size was 256. Table 3 shows the robustness of ResNet-50 on ImageNet with ε = 2. The validation accuracy for natural images decreases when we increase the minibatch replay m, just like it did for CIFAR in section 5.\nThe naturally trained model is vulnerable to PGD attacks (first row of table 3), while free training produces robust models that achieve over 40% accuracy against PGD attacks (m = 4, 6, 8 in table 3). Attacking the models using PGD-100 does not result in a meaningful drop in accuracy compared to PGD-50. Therefore, we did not experiment with increasing the number of PGD iterations further.\nFig. 4 summarizes experimental results for robust models trained and tested under different perturbation bounds ε. Each curve represents one training method (natural training or free training) with hyperparameter choice m. Each point on the curve represents the validation accuracy for an ε-bounded robust model. These results are also provided as tables in the appendix. The proposed method consistently improves the robust accuracy under PGD attacks for ε = 2-7, and m = 4 performs the best. 
It is difficult to train robust models when ε is large, which is consistent with previous studies showing that PGD-based adversarial training has limited robustness for ImageNet [Kannan et al., 2018].\n\nTable 4: Validation accuracy and robustness of \"free\" and 2-PGD trained ResNet-50 models, both trained to resist ℓ∞ ε = 4 attacks. Note that 2-PGD training time is 3.46× that of \"free\" training.\n\nModel & Training | Natural Images | PGD-10 | PGD-50 | PGD-100 | Train time (min)\nRN50 - Free m = 4 | 60.206% | 32.768% | 31.878% | 31.816% | 3016\nRN50 - 2-PGD trained | 64.134% | 37.172% | 36.352% | 36.316% | 10,435\n\nTable 5: Validation accuracy and robustness of free m = 4 trained ResNets with various capacities.\n\nArchitecture | Natural Images | PGD-10 | PGD-50 | PGD-100\nResNet-50 | 60.206% | 32.768% | 31.878% | 31.816%\nResNet-101 | 63.340% | 35.388% | 34.402% | 34.328%\nResNet-152 | 64.446% | 36.992% | 36.044% | 35.994%\n\nComparison with PGD-trained models\n\nWe compare \"free\" training to a more costly method that uses 2-PGD adversarial examples with ε = 4. We run the conventional adversarial training algorithm and set εs = 2, ε = 4, and K = 2. All other hyper-parameters were identical to those used for training our \"free\" models. Note that in our experiments, we do not use any label smoothing or other common tricks for improving robustness since we want a fair comparison between PGD training and our \"free\" training. These extra regularizations can likely improve results for both approaches.\nWe compare our \"free trained\" m = 4 ResNet-50 model and the 2-PGD trained ResNet-50 model in table 4. 2-PGD adversarial training takes roughly 3.4× longer than \"free training\" and only achieves slightly better results (≈4.5%). 
This gap is less than 0.5% if we free train a higher capacity model\n(i.e. ResNet-152, see below).\n\nFree training on models with more capacity\n\nIt is believed that increased network capacity leads to greater robustness from adversarial training\n[Madry et al., 2017, Kurakin et al., 2016b]. We verify that this is the case by \u201cfree training\u201d ResNet-\n101 and ResNet-152 with \u0001 = 4. The comparison between ResNet-152, ResNet-101, and ResNet-50\nis summarized in table 5. Free training on ResNet-101 and ResNet-152 each take roughly 1.7\u00d7\nand 2.4\u00d7 more time than ResNet-50 on the same machine, respectively. The higher capacity model\nenjoys a roughly 4% boost to accuracy and robustness.\n\n8 Conclusions\n\nAdversarial training is a well-studied method that boosts the robustness and interpretability of neural\nnetworks. While it remains one of the few effective ways to harden a network to attacks, few can\nafford to adopt it because of its high computation cost. We present a \u201cfree\u201d version of adversarial\ntraining with cost nearly equal to natural training. Free training can be further combined with\nother defenses to produce robust models without a slowdown. We hope that this approach can put\nadversarial training within reach for organizations with modest compute resources.\nAcknowledgements: Goldstein and his students were supported by DARPA GARD, DARPA QED\nfor RML, DARPA L2M, and the YFA program. Additional support was provided by the AFOSR\nMURI program. Davis and his students were supported by the Of\ufb01ce of the Director of National\nIntelligence (ODNI), and IARPA (2014-14071600012). Studer was supported by Xilinx, Inc. and\nthe US NSF under grants ECCS-1408006, CCF-1535897, CCF-1652065, CNS-1717559, and ECCS-\n1824379. Taylor was supported by the Of\ufb01ce of Naval Research (N0001418WX01582) and the\nDepartment of Defense High Performance Computing Modernization Program. 
The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.

References

Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. In ECML-PKDD, pages 387–402. Springer, 2013.

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. ICLR, 2013.

Xingjun Ma, Bo Li, Yisen Wang, Sarah M Erfani, Sudanthi Wijewickrema, Grant Schoenebeck, Dawn Song, Michael E Houle, and James Bailey. Characterizing adversarial subspaces using local intrinsic dimensionality. arXiv preprint arXiv:1801.02613, 2018.

Dongyu Meng and Hao Chen. MagNet: a two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 135–147. ACM, 2017.

Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv preprint arXiv:1704.01155, 2017.

Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. ICML, 2018.

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. ICLR, 2017.

Shumeet Baluja and Ian Fischer. Adversarial transformation networks: Learning to generate adversarial examples. AAAI, 2018.

Omid Poursaeed, Isay Katsman, Bicheng Gao, and Serge Belongie.
Generative adversarial perturbations. CVPR, 2018.

Chaowei Xiao, Bo Li, Jun-Yan Zhu, Warren He, Mingyan Liu, and Dawn Song. Generating adversarial examples with adversarial networks. IJCAI, 2018.

Ali Shafahi, Amin Ghiasi, Furong Huang, and Tom Goldstein. Label smoothing and logit squeezing: A replacement for adversarial training?, 2019a.

Marius Mosbach, Maksym Andriushchenko, Thomas Trost, Matthias Hein, and Dietrich Klakow. Logit pairing methods can fool gradient-based attacks. arXiv preprint arXiv:1810.12042, 2018.

Andrew Slavin Ross and Finale Doshi-Velez. Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. In AAAI, 2018.

Matthias Hein and Maksym Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation. In NeurIPS, pages 2266–2276, 2017.

Daniel Jakubovitz and Raja Giryes. Improving DNN robustness to adversarial attacks using Jacobian regularization. In ECCV, pages 514–529, 2018.

Fuxun Yu, Chenchen Liu, Yanzhi Wang, and Xiang Chen. Interpreting adversarial robustness: A view from decision surface in input space. arXiv preprint arXiv:1810.00144, 2018.

Eric Wong and J Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. ICML, 2017.

Eric Wong, Frank Schmidt, Jan Hendrik Metzen, and J Zico Kolter. Scaling provable adversarial defenses. In NeurIPS, pages 8400–8409, 2018.

Aditi Raghunathan, Jacob Steinhardt, and Percy S Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In NeurIPS, pages 10877–10887, 2018a.

Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. arXiv preprint arXiv:1801.09344, 2018b.

Shiqi Wang, Yizheng Chen, Ahmed Abdou, and Suman Jana. MixTrain: Scalable training of formally robust neural networks.
arXiv preprint arXiv:1811.02625, 2018.

Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana. Certified robustness to adversarial examples with differential privacy. arXiv preprint arXiv:1802.03471, 2018.

Bai Li, Changyou Chen, Wenlin Wang, and Lawrence Carin. Second-order adversarial attack and certifiable robustness. arXiv preprint arXiv:1809.03113, 2018a.

Jeremy M Cohen, Elan Rosenfeld, and J Zico Kolter. Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918, 2019.

J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.

Cihang Xie, Yuxin Wu, Laurens van der Maaten, Alan Yuille, and Kaiming He. Feature denoising for improving adversarial robustness. CVPR, 2019.

Harini Kannan, Alexey Kurakin, and Ian Goodfellow. Adversarial logit pairing. arXiv preprint arXiv:1803.06373, 2018.

Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. ICLR, 2015.

Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016a.

Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.

Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. ICLR, 2016b.

Ji Lin, Chuang Gan, and Song Han. Defensive quantization: When efficiency meets robustness. ICLR, 2019.

Jacob Buckman, Aurko Roy, Colin Raffel, and Ian Goodfellow. Thermometer encoding: One hot way to resist adversarial examples. ICLR, 2018.

Yang Song, Taesup Kim, Sebastian Nowozin, Stefano Ermon, and Nate Kushman.
PixelDefend: Leveraging generative models to understand and defend against adversarial examples. arXiv preprint arXiv:1710.10766, 2017.

Andrew Ilyas, Ajil Jalal, Eirini Asteri, Constantinos Daskalakis, and Alexandros G Dimakis. The robust manifold defense: Adversarial training using generative models. arXiv preprint arXiv:1712.09196, 2017.

Haifeng Qian and Mark N Wegman. L2-nonexpansive neural networks. arXiv preprint arXiv:1802.07896, 2018.

Ali Shafahi, Mahyar Najibi, Zheng Xu, John Dickerson, Larry S Davis, and Tom Goldstein. Universal adversarial training. arXiv preprint arXiv:1811.11304, 2018.

Guneet S Dhillon, Kamyar Azizzadenesheli, Zachary C Lipton, Jeremy Bernstein, Jean Kossaifi, Aran Khanna, and Anima Anandkumar. Stochastic activation pruning for robust adversarial defense. arXiv preprint arXiv:1803.01442, 2018.

Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.

Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. Robustness may be at odds with accuracy. ICLR, 2018.

Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P Xing, Laurent El Ghaoui, and Michael I Jordan. Theoretically principled trade-off between robustness and accuracy. ICML, 2019a.

Ali Shafahi, W Ronny Huang, Christoph Studer, Soheil Feizi, and Tom Goldstein. Are adversarial examples inevitable? ICLR, 2019b.

Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.

Jonathan Uesato, Brendan O'Donoghue, Aaron van den Oord, and Pushmeet Kohli. Adversarial risk and the dangers of evaluating against weak attacks. arXiv preprint arXiv:1802.05666, 2018.

Dinghuai Zhang, Tianyuan Zhang, Yiping Lu, Zhanxing Zhu, and Bin Dong. You only propagate once: Painless adversarial training using maximal principle.
arXiv preprint arXiv:1905.00877, 2019b.

Logan Engstrom, Andrew Ilyas, and Anish Athalye. Evaluating and understanding the robustness of adversarial logit pairing. arXiv preprint arXiv:1807.10272, 2018.

Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, and Tom Goldstein. Visualizing the loss landscape of neural nets. In NeurIPS, pages 6389–6399, 2018b.

Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. ImageNet large scale visual recognition challenge. IJCV, 115(3):211–252, 2015.

Chuan Guo, Mayank Rana, Moustapha Cisse, and Laurens van der Maaten. Countering adversarial images using input transformations. arXiv preprint arXiv:1711.00117, 2017.

Fangzhou Liao, Ming Liang, Yinpeng Dong, Tianyu Pang, Xiaolin Hu, and Jun Zhu. Defense against adversarial attacks using high-level representation guided denoiser. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1778–1787, 2018.

Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan Yuille. Mitigating adversarial effects through randomization.
arXiv preprint arXiv:1711.01991, 2017.