Unlabeled Data Improves Adversarial Robustness

Yair Carmon* (Stanford University, yairc@stanford.edu)
Aditi Raghunathan* (Stanford University, aditir@stanford.edu)
Ludwig Schmidt (UC Berkeley, ludwig@berkeley.edu)
Percy Liang (Stanford University, pliang@cs.stanford.edu)
John C. Duchi (Stanford University, jduchi@stanford.edu)

Advances in Neural Information Processing Systems, pages 11192–11203

Abstract

We demonstrate, theoretically and empirically, that adversarial robustness can significantly benefit from semisupervised learning. Theoretically, we revisit the simple Gaussian model of Schmidt et al. [41] that shows a sample complexity gap between standard and robust classification.
We prove that unlabeled data bridges this gap: a simple semisupervised learning procedure (self-training) achieves high robust accuracy using the same number of labels required for achieving high standard accuracy. Empirically, we augment CIFAR-10 with 500K unlabeled images sourced from 80 Million Tiny Images and use robust self-training to outperform state-of-the-art robust accuracies by over 5 points in (i) $\ell_\infty$ robustness against several strong attacks via adversarial training and (ii) certified $\ell_2$ and $\ell_\infty$ robustness via randomized smoothing. On SVHN, adding the dataset's own extra training set with the labels removed provides gains of 4 to 10 points, within 1 point of the gain from using the extra labels.

1 Introduction

The past few years have seen an intense research interest in making models robust to adversarial examples [44, 4, 3]. Yet despite a wide range of proposed defenses, the state of the art in adversarial robustness is far from satisfactory. Recent work points towards sample complexity as a possible reason for the small gains in robustness: Schmidt et al. [41] show that in a simple model, learning a classifier with nontrivial adversarially robust accuracy requires substantially more samples than achieving good "standard" accuracy. Furthermore, recent empirical work obtains promising gains in robustness via transfer learning of a robust classifier from a larger labeled dataset [18]. While both theory and experiments suggest that more training data leads to greater robustness, following this suggestion can be difficult due to the cost of gathering additional data and especially obtaining high-quality labels.

To alleviate the need for carefully labeled data, in this paper we study adversarial robustness through the lens of semisupervised learning. Our approach is motivated by two basic observations.
First, adversarial robustness essentially asks that predictors be stable around naturally occurring inputs. Learning to satisfy such a stability constraint should not inherently require labels. Second, the added requirement of robustness fundamentally alters the regime where semi-supervision is useful. Prior work on semisupervised learning mostly focuses on improving the standard accuracy by leveraging unlabeled data. However, in our adversarial setting the labeled data alone already produce accurate (but not robust) classifiers. We can use such classifiers on the unlabeled data and obtain useful pseudo-labels, which directly suggests the use of self-training, one of the oldest frameworks for semisupervised learning [42, 8], which applies a supervised training method to the pseudo-labeled data. We provide theoretical and experimental evidence that self-training is effective for adversarial robustness.

The first part of our paper is theoretical and considers the simple $d$-dimensional Gaussian model [41] with $\ell_\infty$-perturbations of magnitude $\varepsilon$. We scale the model so that $n_0$ labeled examples allow for learning a classifier with nontrivial standard accuracy, while roughly $n_0 \cdot \varepsilon^2\sqrt{d/n_0}$ examples are necessary for attaining any nontrivial robust accuracy. This implies a sample complexity gap in the high-dimensional regime $d \gg n_0/\varepsilon^4$. In this regime, we prove that self-training with $O(n_0 \cdot \varepsilon^2\sqrt{d/n_0})$ unlabeled examples and just $n_0$ labels achieves high robust accuracy.

* Equal contribution.

Code and data are available on GitHub at https://github.com/yaircarmon/semisup-adv and on CodaLab at https://bit.ly/349WsAC.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.
Our analysis provides a refined perspective on the sample complexity barrier in this model: the increased sample requirement is exclusively on unlabeled data.

Our theoretical findings motivate the second, empirical part of our paper, where we test the effect of unlabeled data and self-training on standard adversarial robustness benchmarks. We propose and experiment with robust self-training (RST), a natural extension of self-training that uses standard supervised training to obtain pseudo-labels and then feeds the pseudo-labeled data into a supervised training algorithm that targets adversarial robustness. We use TRADES [56] for heuristic $\ell_\infty$-robustness, and stability training [57] combined with randomized smoothing [9] for certified $\ell_2$-robustness.

For CIFAR-10 [22], we obtain 500K unlabeled images by mining the 80 Million Tiny Images dataset [46] with an image classifier. Using RST on the CIFAR-10 training set augmented with the additional unlabeled data, we outperform state-of-the-art heuristic $\ell_\infty$-robustness against strong iterative attacks by 7%. In terms of certified $\ell_2$-robustness, RST outperforms our fully supervised baseline by 5% and beats previous state-of-the-art numbers by 10%. Finally, we also match the state-of-the-art certified $\ell_\infty$-robustness, while improving on the corresponding standard accuracy by over 16%. We show that some natural alternatives, such as virtual adversarial training [30] and aggressive data augmentation, do not perform as well as RST. We also study the sensitivity of RST to varying data volume and relevance.

Experiments with SVHN show similar gains in robustness with RST on semisupervised data. Here, we apply RST by removing the labels from the 531K extra training images and see 4–10% increases in robust accuracies compared to the baseline that uses only the labeled 73K training set. Swapping the pseudo-labels for the true SVHN extra labels increases these accuracies by at most 1%.
This confirms that the majority of the benefit from extra data comes from the inputs and not the labels.

In independent and concurrent work, Uesato et al. [48], Najafi et al. [32] and Zhai et al. [55] also explore semisupervised learning for adversarial robustness. See Section 6 for a comparison.

Before proceeding to the details of our theoretical results in Section 3, we briefly introduce relevant background in Section 2. Sections 4 and 5 then describe our adversarial self-training approach and provide comprehensive experiments on CIFAR-10 and SVHN. We survey related work in Section 6 and conclude in Section 7.

2 Setup

Semi-supervised classification task. We consider the task of mapping input $x \in \mathcal{X} \subseteq \mathbb{R}^d$ to label $y \in \mathcal{Y}$. Let $P_{x,y}$ denote the underlying distribution of $(x, y)$ pairs, and let $P_x$ denote its marginal on $\mathcal{X}$. Given training data consisting of (i) labeled examples $(X, Y) = (x_1, y_1), \ldots, (x_n, y_n) \sim P_{x,y}$ and (ii) unlabeled examples $\tilde{X} = \tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_{\tilde{n}} \sim P_x$, the goal is to learn a classifier $f_\theta : \mathcal{X} \to \mathcal{Y}$ in a model family parameterized by $\theta \in \Theta$.

Error metrics. The standard quality metric for classifier $f_\theta$ is its error probability,

  $\mathrm{err}_{\text{standard}}(f_\theta) := \mathbb{P}_{(x,y)\sim P_{x,y}}\left(f_\theta(x) \ne y\right)$.   (1)

We also evaluate classifiers on their performance on adversarially perturbed inputs. In this work, we consider perturbations in an $\ell_p$ norm ball of radius $\varepsilon$ around the input, and define the corresponding robust error probability,

  $\mathrm{err}^{p,\varepsilon}_{\text{robust}}(f_\theta) := \mathbb{P}_{(x,y)\sim P_{x,y}}\left(\exists\, x' \in B^p_\varepsilon(x) : f_\theta(x') \ne y\right)$ for $B^p_\varepsilon(x) := \{x' \in \mathcal{X} \mid \|x' - x\|_p \le \varepsilon\}$.   (2)

In this paper we study $p = 2$ and $p = \infty$. We say that a classifier $f_\theta$ has certified $\ell_p$ accuracy $\rho$ when we can prove that $\mathrm{err}^{p,\varepsilon}_{\text{robust}}(f_\theta) \le 1 - \rho$.

Self-training.
Consider a supervised learning algorithm $A$ that maps a dataset $(X, Y)$ to parameters $\theta$. Self-training is the straightforward extension of $A$ to a semisupervised setting, and consists of the following two steps. First, obtain an intermediate model $\hat\theta_{\text{intermediate}} = A(X, Y)$, and use it to generate pseudo-labels $\tilde{y}_i = f_{\hat\theta_{\text{intermediate}}}(\tilde{x}_i)$ for $i \in [\tilde{n}]$. Second, combine the data and pseudo-labels to obtain a final model $\hat\theta_{\text{final}} = A([X, \tilde{X}], [Y, \tilde{Y}])$.

3 Theoretical results

In this section, we consider a simple high-dimensional model studied in [41], which is the only known formal example of an information-theoretic sample complexity gap between standard and robust classification. For this model, we demonstrate the value of unlabeled data: a simple self-training procedure achieves high robust accuracy when achieving nontrivial robust accuracy using the labeled data alone is impossible.

Gaussian model. We consider a binary classification task where $\mathcal{X} = \mathbb{R}^d$, $\mathcal{Y} = \{-1, 1\}$, $y$ is uniform on $\mathcal{Y}$, and $x \mid y \sim \mathcal{N}(y\mu, \sigma^2 I)$ for a vector $\mu \in \mathbb{R}^d$ and coordinate noise variance $\sigma^2 > 0$. We are interested in the standard error (1) and the robust error $\mathrm{err}^{\infty,\varepsilon}_{\text{robust}}$ (2) for $\ell_\infty$ perturbations of size $\varepsilon$.

Parameter setting. We choose the model parameters to meet the following desiderata: (i) there exists a classifier that achieves very high robust and standard accuracies, (ii) using $n_0$ examples we can learn a classifier with nontrivial standard accuracy, and (iii) we require much more than $n_0$ examples to learn a classifier with nontrivial robust accuracy. As shown in [41], the following parameter setting meets the desiderata:

  $\varepsilon \in \left(0, \tfrac{1}{2}\right)$, $\|\mu\|^2 = d$, and $\dfrac{\|\mu\|^2}{\sigma^2} = \sqrt{\dfrac{d}{n_0}} \cdot \dfrac{1}{\varepsilon^2}$.   (3)

When interpreting this setting it is useful to think of $\varepsilon$ as fixed and of $d/n_0$ as a large number, i.e.
a highly overparameterized regime.

3.1 Supervised learning in the Gaussian model

We briefly recapitulate the sample complexity gap described in [41] for the fully supervised setting.

Learning a simple linear classifier. We consider linear classifiers of the form $f_\theta = \mathrm{sign}(\theta^\top x)$. Given $n$ labeled examples $(x_1, y_1), \ldots, (x_n, y_n) \overset{\text{iid}}{\sim} P_{x,y}$, we form the following simple classifier:

  $\hat\theta_n := \dfrac{1}{n} \sum_{i=1}^{n} y_i x_i$.   (4)

We achieve nontrivial standard accuracy using $n_0$ examples; see Appendix A.2 for a proof of the following (as well as detailed rates of convergence).

Proposition 1. There exists a universal constant $r$ such that for all $\varepsilon^2\sqrt{d/n_0} \ge r$,

  $n \ge n_0 \;\Rightarrow\; \mathbb{E}\,\mathrm{err}_{\text{standard}}\left(f_{\hat\theta_n}\right) \le \tfrac{1}{3}$  and  $n \ge n_0 \cdot 4\varepsilon^2\sqrt{d/n_0} \;\Rightarrow\; \mathbb{E}\,\mathrm{err}^{\infty,\varepsilon}_{\text{robust}}\left(f_{\hat\theta_n}\right) \le 10^{-3}$.

Moreover, as the following theorem states, no learning algorithm can produce a classifier with nontrivial robust error without observing $\widetilde\Omega\left(n_0 \cdot \varepsilon^2\sqrt{d/n_0}\right)$ examples. Thus, a sample complexity gap forms as $d$ grows.

Theorem 1 ([41]). Let $A_n$ be any learning rule mapping a dataset $S \in (\mathcal{X} \times \mathcal{Y})^n$ to a classifier $A_n[S]$. Then,

  $n \le n_0 \cdot \dfrac{\varepsilon^2\sqrt{d/n_0}}{8\log d} \;\Rightarrow\; \mathbb{E}\,\mathrm{err}^{\infty,\varepsilon}_{\text{robust}}(A_n[S]) \ge \tfrac{1}{2}\left(1 - d^{-1}\right)$,   (5)

where the expectation is with respect to the random draw of $S \sim P^n_{x,y}$ as well as possible randomization in $A_n$.

3.2 Semi-supervised learning in the Gaussian model

We now consider the semisupervised setting with $n$ labeled examples and $\tilde{n}$ additional unlabeled examples. We apply the self-training methodology described in Section 2 to the simple learning rule (4): our intermediate classifier is $\hat\theta_{\text{intermediate}} := \hat\theta_n = \frac{1}{n}\sum_{i=1}^n y_i x_i$, and we generate pseudo-labels $\tilde{y}_i := f_{\hat\theta_{\text{intermediate}}}(\tilde{x}_i) = \mathrm{sign}\left(\tilde{x}_i^\top \hat\theta_{\text{intermediate}}\right)$ for $i = 1, \ldots, \tilde{n}$. We then apply learning rule (4) to obtain our final semisupervised classifier $\hat\theta_{\text{final}} := \frac{1}{\tilde{n}} \sum_{i=1}^{\tilde{n}} \tilde{y}_i \tilde{x}_i$. The following theorem guarantees that $\hat\theta_{\text{final}}$ achieves high robust accuracy.

Theorem 2. There exists a universal constant $\tilde{r}$ such that for $\varepsilon^2\sqrt{d/n_0} \ge \tilde{r}$, $n \ge n_0$ labeled examples, and $\tilde{n}$ additional unlabeled examples,

  $\tilde{n} \ge n_0 \cdot 288\,\varepsilon^2\sqrt{d/n_0} \;\Rightarrow\; \mathbb{E}\,\mathrm{err}^{\infty,\varepsilon}_{\text{robust}}\left(f_{\hat\theta_{\text{final}}}\right) \le 10^{-3}$.

Therefore, compared to the fully supervised case, the self-training classifier requires only a constant factor more input examples, and roughly a factor $\varepsilon^2\sqrt{d/n_0}$ fewer labels. We prove Theorem 2 in Appendix A.4, where we also precisely characterize the rates of convergence of the robust error; the outline of our argument is as follows. We have $\hat\theta_{\text{final}} = \left(\frac{1}{\tilde{n}}\sum_{i=1}^{\tilde{n}} \tilde{y}_i y_i\right)\mu + \frac{1}{\tilde{n}}\sum_{i=1}^{\tilde{n}} \tilde{y}_i \xi_i$, where $\xi_i := \tilde{x}_i - y_i\mu \sim \mathcal{N}(0, \sigma^2 I)$ is the noise in example $i$. We show (in Appendix A.4) that with high probability $\frac{1}{\tilde{n}}\sum_{i=1}^{\tilde{n}} \tilde{y}_i y_i \ge \frac{1}{6}$ while the variance of $\frac{1}{\tilde{n}}\sum_{i=1}^{\tilde{n}} \tilde{y}_i \xi_i$ goes to zero as $\tilde{n}$ grows, and therefore the angle between $\hat\theta_{\text{final}}$ and $\mu$ goes to zero. Substituting into a closed-form expression for $\mathrm{err}^{\infty,\varepsilon}_{\text{robust}}\left(f_{\hat\theta_{\text{final}}}\right)$ (Eq. (11) in Appendix A.1) gives the desired upper bound. We remark that other learning techniques, such as EM and PCA, can also leverage unlabeled data in this model.
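The two-stage estimator analyzed above is straightforward to simulate. The sketch below is ours, not the paper's code; it uses illustrative dimensions and the equivalent rescaling $\sigma = 1$, $\|\mu\|^2 = \sqrt{d/n_0}/\varepsilon^2$, forms $\hat\theta_{\text{intermediate}}$ from a few labels, pseudo-labels a larger unlabeled pool, and checks that $\hat\theta_{\text{final}}$ aligns better with $\mu$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, n_unlabeled, eps = 1000, 100, 5000, 0.5

# Gaussian model with sigma = 1 and ||mu||^2 = sqrt(d/n)/eps^2, a rescaling of (3).
mu = np.full(d, np.sqrt(np.sqrt(d / n) / eps**2 / d))

def sample(m):
    """Draw m pairs: y uniform on {-1, +1}, x | y ~ N(y * mu, I)."""
    y = rng.choice([-1.0, 1.0], size=m)
    x = y[:, None] * mu + rng.standard_normal((m, d))
    return x, y

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Stage 1: the simple classifier (4) from n labeled examples.
X, Y = sample(n)
theta_int = (Y[:, None] * X).mean(axis=0)

# Stage 2: pseudo-label a larger unlabeled pool, re-fit the same rule (4).
X_unlabeled, _ = sample(n_unlabeled)            # true labels discarded
pseudo = np.sign(X_unlabeled @ theta_int)
theta_final = (pseudo[:, None] * X_unlabeled).mean(axis=0)

print(round(cosine(theta_int, mu), 3), round(cosine(theta_final, mu), 3))
```

In this model the angle between the estimator and $\mu$ controls both standard and robust error, so the improved cosine of $\hat\theta_{\text{final}}$ mirrors Theorem 2: the extra examples that buy robustness are all unlabeled.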
The self-training procedure we describe is similar to two steps of EM [11].

3.3 Semisupervised learning with irrelevant unlabeled data

In Appendix A.5 we study a setting where only $\alpha\tilde{n}$ of the unlabeled examples are relevant to the task; we model the relevant data as before, and the irrelevant data as having no signal component, i.e., with $y$ uniform on $\{-1, 1\}$ and $x \sim \mathcal{N}(0, \sigma^2 I)$ independent of $y$. We show that for any fixed $\alpha$, high robust accuracy is still possible, but the required number of relevant examples grows by a factor of $1/\alpha$ compared to the number of unlabeled examples required to achieve the same robust accuracy when all the data is relevant. This demonstrates that irrelevant data can significantly hinder self-training, but does not stop it completely.

4 Semi-supervised learning of robust neural networks

Existing adversarially robust training methods are designed for the supervised setting. In this section, we use these methods to leverage additional unlabeled data by adapting the self-training framework described in Section 2.

Meta-Algorithm 1: Robust self-training
  Input: labeled data $(x_1, y_1), \ldots, (x_n, y_n)$ and unlabeled data $\tilde{x}_1, \ldots, \tilde{x}_{\tilde{n}}$
  Parameters: standard loss $L_{\text{standard}}$, robust loss $L_{\text{robust}}$, and unlabeled weight $w$
  1: Learn $\hat\theta_{\text{intermediate}}$ by minimizing $\sum_{i=1}^{n} L_{\text{standard}}(\theta, x_i, y_i)$
  2: Generate pseudo-labels $\tilde{y}_i = f_{\hat\theta_{\text{intermediate}}}(\tilde{x}_i)$ for $i = 1, 2, \ldots, \tilde{n}$
  3: Learn $\hat\theta_{\text{final}}$ by minimizing $\sum_{i=1}^{n} L_{\text{robust}}(\theta, x_i, y_i) + w \sum_{i=1}^{\tilde{n}} L_{\text{robust}}(\theta, \tilde{x}_i, \tilde{y}_i)$

Meta-Algorithm 1 summarizes robust self-training. In contrast to standard self-training, we use a different supervised learning method in each stage, since the intermediate and the final classifiers have different goals.
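In code, Meta-Algorithm 1 is a thin wrapper around two supervised learners. The sketch below is illustrative and not the paper's implementation; `fit_standard` and `fit_robust` are hypothetical callables standing in for standard training and for robust (e.g., TRADES-style) training with per-example weights.

```python
import numpy as np

def robust_self_training(X, Y, X_unlabeled, fit_standard, fit_robust, w=1.0):
    """Meta-Algorithm 1: standard training -> pseudo-labels -> robust training.

    fit_standard(X, Y) returns a predict callable; fit_robust(X, Y, weights)
    returns the final model. The weight w scales the pseudo-labeled loss term,
    so w < 1 effectively upweights the (more accurately labeled) labeled data.
    """
    # Stage 1: intermediate model trained with the standard loss only.
    predict = fit_standard(X, Y)
    # Stage 2: pseudo-label the unlabeled inputs.
    Y_pseudo = predict(X_unlabeled)
    # Stage 3: robust training on the union, weighting pseudo-labels by w.
    X_all = np.concatenate([X, X_unlabeled])
    Y_all = np.concatenate([Y, Y_pseudo])
    weights = np.concatenate([np.ones(len(X)), w * np.ones(len(X_unlabeled))])
    return fit_robust(X_all, Y_all, weights)
```

The same skeleton covers both instantiations in Section 4.1: only the robust loss inside `fit_robust` changes between adversarial training and stability training.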
In particular, the only goal of $\hat\theta_{\text{intermediate}}$ is to generate high-quality pseudo-labels for the (non-adversarial) unlabeled data. Therefore, we perform standard training in the first stage and robust training in the second. The hyperparameter $w$ allows us to upweight the labeled data, which in some cases may be more relevant to the task (e.g., when the unlabeled data comes from a different distribution), and will usually have more accurate labels.

4.1 Instantiating robust self-training

Both stages of robust self-training perform supervised learning, allowing us to borrow ideas from the literature on supervised standard and robust training. We consider neural networks of the form $f_\theta(x) = \arg\max_{y\in\mathcal{Y}} p_\theta(y \mid x)$, where $p_\theta(\cdot \mid x)$ is a probability distribution over the class labels.

Standard loss. As is common, we use the multi-class logarithmic loss for standard supervised learning,

  $L_{\text{standard}}(\theta, x, y) = -\log p_\theta(y \mid x)$.

Robust loss. For the supervised robust loss, we use a robustness-promoting regularization term proposed in [56] and closely related to earlier proposals in [57, 30, 20]. The robust loss is

  $L_{\text{robust}}(\theta, x, y) = L_{\text{standard}}(\theta, x, y) + \beta L_{\text{reg}}(\theta, x)$, where $L_{\text{reg}}(\theta, x) := \max_{x' \in B^p_\varepsilon(x)} D_{\mathrm{KL}}\left(p_\theta(\cdot \mid x) \,\|\, p_\theta(\cdot \mid x')\right)$.   (6)

The regularization term² $L_{\text{reg}}$ forces predictions to remain stable within $B^p_\varepsilon(x)$, and the hyperparameter $\beta$ balances the robustness and accuracy objectives. We consider two approximations for the maximization in $L_{\text{reg}}$.

1.
Adversarial training: a heuristic defense via approximate maximization.

We focus on $\ell_\infty$ perturbations and use the projected gradient method to approximate the regularization term of (6),

  $L^{\text{adv}}_{\text{reg}}(\theta, x) := D_{\mathrm{KL}}\left(p_\theta(\cdot \mid x) \,\|\, p_\theta(\cdot \mid x'_{\text{PG}}[x])\right)$,   (7)

where $x'_{\text{PG}}[x]$ is obtained via projected gradient ascent on $r(x') = D_{\mathrm{KL}}\left(p_\theta(\cdot \mid x) \,\|\, p_\theta(\cdot \mid x')\right)$. Empirically, performing approximate maximization during training is effective in finding classifiers that are robust to a wide range of attacks [29].

2. Stability training: a certified $\ell_2$ defense via randomized smoothing.

Alternatively, we consider stability training [57, 26], where we replace the maximization over small perturbations with much larger additive random noise drawn from $\mathcal{N}(0, \sigma^2 I)$,

  $L^{\text{stab}}_{\text{reg}}(\theta, x) := \mathbb{E}_{x' \sim \mathcal{N}(x, \sigma^2 I)}\, D_{\mathrm{KL}}\left(p_\theta(\cdot \mid x) \,\|\, p_\theta(\cdot \mid x')\right)$.   (8)

Let $f_\theta$ be the classifier obtained by minimizing $L_{\text{standard}} + \beta L^{\text{stab}}_{\text{reg}}$. At test time, we use the following smoothed classifier:

  $g_\theta(x) := \arg\max_{y \in \mathcal{Y}} q_\theta(y \mid x)$, where $q_\theta(y \mid x) := \mathbb{P}_{x' \sim \mathcal{N}(x, \sigma^2 I)}\left(f_\theta(x') = y\right)$.   (9)

Improving on previous work [24, 26], Cohen et al. [9] prove that robustness of $f_\theta$ to large random perturbations (the goal of stability training) implies certified $\ell_2$ adversarial robustness of the smoothed classifier $g_\theta$.

5 Experiments

In this section, we empirically evaluate robust self-training (RST) and show that it leads to consistent and substantial improvements in robust accuracy, on both CIFAR-10 [22] and SVHN [53], and with both adversarial (RSTadv) and stability training (RSTstab). For CIFAR-10, we mine unlabeled data from 80 Million Tiny Images and study in depth the strengths and limitations of RST. For SVHN, we simulate unlabeled data by removing labels and show that with RST the harm of removing the labels is small.
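The smoothed classifier $g_\theta$ of Eq. (9) is defined by an expectation, which in practice is estimated by Monte Carlo sampling. The sketch below is ours and deliberately simplified: it plugs the empirical top-class frequency into the Cohen et al. [9] radius $\sigma\,\Phi^{-1}(p)$, whereas the actual certification protocol replaces the empirical frequency with a statistical lower confidence bound.

```python
import numpy as np
from statistics import NormalDist

def smoothed_predict(f, x, sigma, n_samples=1000, rng=None):
    """Monte Carlo estimate of g(x) = argmax_y P(f(x + N(0, sigma^2 I)) = y).

    f maps a single input vector to a class label. Returns the majority-vote
    label and an (uncalibrated) l2 radius sigma * Phi^{-1}(p_hat).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    noisy = x + sigma * rng.standard_normal((n_samples, x.shape[0]))
    labels, counts = np.unique([f(z) for z in noisy], return_counts=True)
    top = counts.argmax()
    # Clamp away from 1 so the inverse normal CDF below stays finite.
    p_hat = min(counts[top] / n_samples, 1.0 - 1.0 / n_samples)
    # Cohen et al.: if the top class has probability p > 1/2, g is certifiably
    # constant within l2 radius sigma * Phi^{-1}(p). We use the plug-in p_hat
    # here purely for illustration.
    radius = sigma * NormalDist().inv_cdf(p_hat) if p_hat > 0.5 else 0.0
    return labels[top], radius
```

Note the trade-off that $\sigma$ controls: larger noise certifies larger radii but degrades the base classifier's accuracy on noisy inputs, which is exactly what stability training (8) is meant to counteract.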
This indicates that most of the gain comes from additional inputs rather than additional labels. Our experiments build on open source code from [56, 9]; we release our data and code at https://github.com/yaircarmon/semisup-adv and on CodaLab at https://bit.ly/349WsAC.

Evaluating heuristic defenses. We evaluate RSTadv and other heuristic defenses on their performance against the strongest known $\ell_\infty$ attacks, namely the projected gradient method [29], denoted PG, and the Carlini-Wagner attack [7], denoted CW.

² Zhang et al. [56] write the regularization term $D_{\mathrm{KL}}\left(p_\theta(\cdot \mid x') \,\|\, p_\theta(\cdot \mid x)\right)$, i.e. with $p_\theta(\cdot \mid x')$ rather than $p_\theta(\cdot \mid x)$ taking the role of the label, but their open source implementation follows (6).

Table 1: Heuristic defense. CIFAR-10 test accuracy under different optimization-based $\ell_\infty$ attacks of magnitude $\varepsilon = 8/255$. Robust self-training (RST) with 500K unlabeled Tiny Images outperforms the state-of-the-art robust models in terms of robustness as well as standard accuracy (no attack). Standard self-training with the same data does not provide robustness. †: A projected gradient attack with 1K restarts reduces the accuracy of this model to 52.9%, evaluated on 10% of the test set [18].

  Model                  | PGMadry | PGTRADES | PGOurs | CW [7] | Best attack | No attack
  RSTadv(50K+500K)       | 63.1    | 63.1     | 62.5   | 64.9   | 62.5 ±0.1   | 89.7 ±0.1
  TRADES [56]            | 55.8    | 56.6     | 55.4   | 65.0   | 55.4        | 84.9
  Adv. pre-training [18] | 57.4    | 58.2     | 57.7   | -      | 57.4†       | 87.1
  Madry et al. [29]      | 45.8    | -        | -      | 47.8   | 45.8        | 87.3
  Standard self-training | -       | 0.3      | 0      | -      | 0           | 96.4

  (b)
  Model                       | $\ell_\infty$ acc. at $\varepsilon = 2/255$ | Standard acc.
  RSTstab(50K+500K)           | 63.8 ± 0.5 | 80.7 ± 0.3
  Baselinestab(50K)           | 58.6 ± 0.4 | 77.9 ± 0.1
  Wong et al. (single) [50]   | 53.9       | 68.3
  Wong et al. (ensemble) [50] | 63.6       | 64.1
  IBP [17]                    | 50.0       | 70.2

Figure 1: Certified defense.
Guaranteed CIFAR-10 test accuracy under all $\ell_2$ and $\ell_\infty$ attacks. Stability-based robust self-training with 500K unlabeled Tiny Images (RSTstab(50K+500K)) outperforms stability training with only labeled data (Baselinestab(50K)). (a) Accuracy vs. $\ell_2$ radius, certified via randomized smoothing [9]. Shaded regions indicate variation across 3 runs. Accuracy at $\ell_2$ radius 0.435 implies accuracy at $\ell_\infty$ radius 2/255. (b) The implied $\ell_\infty$ certified accuracy is comparable to the state of the art in methods that directly target $\ell_\infty$ robustness.

Evaluating certified defenses. For RSTstab and other models trained against random noise, we evaluate the certified robust accuracy of the smoothed classifier against $\ell_2$ attacks. We perform the certification using the randomized smoothing protocol described in [9], with parameters $N_0 = 100$, $N = 10^4$, $\alpha = 10^{-3}$, and noise level $\sigma = 0.25$.

Evaluating variability. We repeat training 3 times and report accuracy as X ± Y, with X the median across runs and Y half the difference between the minimum and maximum.

5.1 CIFAR-10

5.1.1 Sourcing unlabeled data

To obtain unlabeled data distributed similarly to the CIFAR-10 images, we use the 80 Million Tiny Images (80M-TI) dataset [46], of which CIFAR-10 is a manually labeled subset. However, most images in 80M-TI do not correspond to CIFAR-10 image categories. To select relevant images, we train an 11-way classifier to distinguish the CIFAR-10 classes and an 11th "non-CIFAR-10" class, using a Wide ResNet 28-10 model [54] (the same as in our experiments below). For each class, we select an additional 50K images from 80M-TI using the trained model's predicted scores³; these are the 500K unlabeled images that we add to the 50K CIFAR-10 training set when performing RST.
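Concretely, the selection step amounts to ranking each candidate image by the classifier's predicted score for a CIFAR-10 class and keeping the top candidates per class. The sketch below is schematic only: the `scores` matrix and `select_per_class` helper are hypothetical, and the paper's actual filtering, including near-duplicate removal against the test set, is described in Appendix B.6.

```python
import numpy as np

def select_per_class(scores, k):
    """Pick top-k candidate images per CIFAR-10 class by predicted score.

    scores: (num_images, 11) array of predicted class scores, where
    columns 0..9 are the CIFAR-10 classes and column 10 is "non-CIFAR-10".
    Returns a dict mapping class -> indices of its k highest-scoring images.
    """
    selected = {}
    # Assign each image to its best CIFAR-10 class (the reject column is
    # only used implicitly: low CIFAR-10 scores lose the per-class ranking).
    best_class = scores[:, :10].argmax(axis=1)
    for c in range(10):
        candidates = np.flatnonzero(best_class == c)
        order = np.argsort(-scores[candidates, c])  # highest score first
        selected[c] = candidates[order[:k]]
    return selected
```

Selecting a fixed budget per class keeps the pseudo-labeled pool roughly class-balanced, mirroring the 50K-per-class selection described above.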
We provide a detailed description of the data sourcing process in Appendix B.6.

5.1.2 Benefit of unlabeled data

We perform robust self-training using the unlabeled data described above. We use a Wide ResNet 28-10 architecture for both the intermediate pseudo-label generator and the final robust model. For adversarial training, we compute $x'_{\text{PG}}$ exactly as in [56] with $\varepsilon = 8/255$, and denote the resulting model RSTadv(50K+500K). For stability training, we set the additive noise level to $\sigma = 0.25$ and denote the result RSTstab(50K+500K). We provide training details in Appendix B.1.

³ We exclude any image close to the CIFAR-10 test set; see Appendix B.6 for details.

Robustness of RSTadv(50K+500K) against strong attacks. In Table 1, we report the accuracy of RSTadv(50K+500K) and the best models in the literature against various strong attacks at $\varepsilon = 8/255$ (see Appendix B.3 for details). PGTRADES and PGMadry correspond to the attacks used in [56] and [29] respectively, and we apply the Carlini-Wagner attack CW [7] to 1,000 random test examples, using the implementation [34] that performs a search over attack hyperparameters. We also tune a PG attack against RSTadv(50K+500K) (to maximally reduce its accuracy), which we denote PGOurs (see Appendix B.3 for details).

RSTadv(50K+500K) gains 7% over TRADES [56], which we can directly attribute to the unlabeled data (see Appendix B.4). In Appendix C.7 we also show that this gain holds over different attack radii. The model of Hendrycks et al. [18] is based on ImageNet adversarial pretraining and is less directly comparable to ours due to the difference in external data and training method.
Finally, we perform standard self-training using the unlabeled data, which offers a moderate 0.4% improvement in standard accuracy over the intermediate model but is not adversarially robust (see Appendix C.6).

Certified robustness of RSTstab(50K+500K). Figure 1a shows the certified robust accuracy as a function of the $\ell_2$ perturbation radius for different models. We compare RSTstab(50K+500K) with [9], which has the highest reported certified accuracy, and with Baselinestab(50K), a model that we trained using only the CIFAR-10 training set and the same training configuration as RSTstab(50K+500K). RSTstab(50K+500K) improves on our Baselinestab(50K) by 3–5%. The gains of Baselinestab(50K) over the previous state of the art are due to a combination of better architecture, hyperparameters, and training objective (see Appendix B.5). The certified $\ell_2$ accuracy is strong enough to imply state-of-the-art certified $\ell_\infty$ robustness via elementary norm bounds. In Figure 1b we compare RSTstab(50K+500K) to the state of the art in certified $\ell_\infty$ robustness, showing a 10% improvement over single models and performance on par with the cascade approach of [50]. We also outperform the cascade model's standard accuracy by 16%.

5.1.3 Comparison to alternatives and ablation studies

Consistency-based semisupervised learning (Appendix C.1). Virtual adversarial training (VAT), a state-of-the-art method for (standard) semisupervised training of neural networks [30, 33], is easily adapted to the adversarially robust setting. We train models using adversarial- and stability-flavored adaptations of VAT, and compare them to their robust self-training counterparts.
We find that the VAT approach offers only limited benefit over fully supervised robust training, and that robust self-training offers 3–6% higher accuracy.

Data augmentation (Appendix C.2). In the low-data/standard-accuracy regime, strong data augmentation is competitive with and complementary to semisupervised learning [10, 51], as it effectively increases the sample size by generating different plausible inputs. It is therefore natural to compare state-of-the-art data augmentation (on the labeled data only) to robust self-training. We consider two popular schemes: Cutout [13] and AutoAugment [10]. While they provide significant benefit to standard accuracy, both augmentation schemes provide essentially no improvement when we add them to our fully supervised baselines.

Relevance of unlabeled data (Appendix C.3). The theoretical analysis in Section 3 suggests that self-training performance may degrade significantly in the presence of irrelevant unlabeled data; other semisupervised learning methods share this sensitivity [33]. In order to measure the effect on robust self-training, we mix our unlabeled data with different amounts of random images from 80M-TI and compare the performance of the resulting models. We find that stability training is more sensitive than adversarial training, and that both methods still yield noticeable robustness gains, even with only 50% relevant data.

Amount of unlabeled data (Appendix C.4). We perform robust self-training with varying amounts of unlabeled data and observe that 100K unlabeled examples provide roughly half the gain provided by 500K unlabeled examples, indicating diminishing returns as the data amount grows.
However, as we report in Appendix C.4, hyperparameter tuning issues make it difficult to assess how performance trends with the data amount.

Figure 3: SVHN test accuracy for robust training without the extra data, with the unlabeled extra data (self-training), and with the labeled extra data. Left: adversarial training and accuracies under $\ell_\infty$ attack with $\varepsilon = 4/255$. Right: stability training and certified $\ell_2$ accuracies as a function of perturbation radius, comparing RSTstab(73K+531K), Baselinestab(604K), and Baselinestab(73K). Most of the gain from extra data comes from the unlabeled inputs.

  Model             | PGOurs     | No attack
  Baselineadv(73K)  | 75.3 ± 0.4 | 94.7 ± 0.2
  RSTadv(73K+531K)  | 86.0 ± 0.1 | 97.1 ± 0.1
  Baselineadv(604K) | 86.4 ± 0.2 | 97.5 ± 0.1

Amount of labeled data (Appendix C.5). Finally, to explore the complementary question of the effect of varying the amount of labels available for pseudo-label generation, we strip the labels of all but $n_0$ CIFAR-10 images and combine the remainder with our 500K unlabeled images. We observe that $n_0 = 8$K labels suffice to exceed the robust accuracy of the fully supervised (50K labels) baselines, both for adversarial training under the PGOurs attack and for certified robustness via stability training.

5.2 Street View House Numbers (SVHN)

The SVHN dataset [53] is naturally split into a core training set of about 73K images and an "extra" training set with about 531K easier images. In our experiments, we compare three settings: (i) robust training on the core training set only, denoted Baseline*(73K); (ii) robust self-training with the core training set and the extra training images, denoted RST*(73K+531K); and (iii) robust training on all the SVHN training data, denoted Baseline*(604K).
As in CIFAR-10, we experiment with both adversarial and stability training, so * stands for either adv or stab.

Beyond validating the benefit of additional data, our SVHN experiments measure the loss inherent in using pseudo-labels in lieu of true labels. Figure 3 summarizes the results: the unlabeled data provides significant gains in robust accuracy, and the accuracy drop due to using pseudo-labels is below 1%. This reaffirms our intuition that in regimes of interest, perfect labels are not crucial for improving robustness. We give a detailed account of our SVHN experiments in Appendix D, where we also compare our results to the literature.

6 Related work

Semisupervised learning. The literature on semisupervised learning dates back to the beginnings of machine learning [42, 8]. A recent family of approaches operates by enforcing consistency in the model's predictions under various perturbations of the unlabeled data [30, 51], or over the course of training [45, 40, 23]. While self-training has shown some gains in standard accuracy [25], the consistency-based approaches perform significantly better on popular semisupervised learning benchmarks [33]. In contrast, our paper considers the very different regime of adversarial robustness, and we observe that robust self-training offers significant gains in robustness over fully supervised methods. Moreover, it seems to outperform consistency-based regularization (VAT; see Section C.1). We note that there are many additional approaches to semisupervised learning, including transductive SVMs, graph-based methods, and generative modeling [8, 58].

Self-training for domain adaptation. Self-training is gaining prominence in the related setting of unsupervised domain adaptation (UDA). There, the unlabeled data is from a "target" distribution, which is different from the "source" distribution that generates labeled data. Several recent approaches [cf.
27, 19] are based on approximating class-conditional distributions of the target domain via self-training, and then learning feature transformations that match these conditional distributions across the source and target domains. Another line of work [59, 60] is based on iterative self-training coupled with refinements such as class balance or confidence regularization. Adversarial robustness and UDA share a similar goal: learning models that perform well under some kind of distribution shift. In UDA we access the target distribution through unlabeled data, while in adversarial robustness we characterize target distributions via perturbations. The fact that self-training is effective in both cases suggests it may apply to distribution shift robustness more broadly.

Training robust classifiers. The discovery of adversarial examples [44, 4, 3] prompted a flurry of "defenses" and "attacks." While several defenses were broken by subsequent attacks [7, 1, 6], the general approach of adversarial training [29, 43, 56] empirically seems to offer gains in robustness. Other lines of work attain certified robustness, though often at a cost to empirical robustness compared to heuristics [36, 49, 37, 50, 17]. Recent work by Hendrycks et al. [18] shows that even when pre-training has limited value for standard accuracy on benchmarks, adversarial pre-training is effective. We complement this work by showing that a similar conclusion holds for semisupervised learning (both practically and theoretically in a stylized model), and extends to certified robustness as well.

Sample complexity upper bounds. 
Recent works [52, 21, 2] study adversarial robustness from a\nlearning-theoretic perspective, and in a number of simpli\ufb01ed settings develop generalization bounds\nusing extensions of Rademacher complexity. In some cases these upper bounds are demonstrably\nlarger than their standard counterparts, suggesting there may be statistical barriers to robust learning.\n\nBarriers to robustness. Schmidt et al. [41] show a sample complexity barrier to robustness in a\nstylized setting. We observed that in this model, unlabeled data is as useful for robustness as labeled\ndata. This observation led us to experiment with robust semisupervised learning. Recent work also\nsuggests other barriers to robustness: Montasser et al. [31] show settings where improper learning\nand surrogate losses are crucial in addition to more samples; Bubeck et al. [5] and Degwekar and\nVaikuntanathan [12] show possible computational barriers; Gilmer et al. [16] show a high-dimensional\nmodel where robustness is a consequence of any non-zero standard error, while Raghunathan et al.\n[38], Tsipras et al. [47], Fawzi et al. [15] show settings where robust and standard errors are at odds.\nStudying ways to overcome these additional theoretical barriers may translate to more progress in\npractice.\n\nSemisupervised learning for adversarial robustness.\nIndependently and concurrently with our\nwork, Zhai et al. [55], Naja\ufb01 et al. [32] and Uesato et al. [48] also study the use of unlabeled data in\nthe adversarial setting. We brie\ufb02y describe each work in turn, and then contrast all three with ours.\nZhai et al. [55] study the Gaussian model of [41] and show a PCA-based procedure that successfully\nleverages unlabeled data to obtain adversarial robustness. They propose a training procedure that at\nevery step treats the current model\u2019s predictions as true labels, and experiment on CIFAR-10. 
Their\nexperiments include the standard semisupervised setting where some labels are removed, as well as\nthe transductive setting where the test set is added to the training set without labels.\nNaja\ufb01 et al. [32] extend the distributionally robust optimization perspective of [43] to a semisupervised\nsetting. They propose a training objective that replaces pseudo-labels with soft labels weighted\naccording to an adversarial loss, and report results on MNIST, CIFAR-10, and SVHN with some\ntraining labels removed. The experiments in [55, 32] do not augment CIFAR-10 with new unlabeled\ndata and do not improve the state-of-the-art in adversarial robustness.\nThe work of Uesato et al. [48] is the closest to ours\u2014they also study self-training in the Gaussian\nmodel and propose a version of robust self-training which they apply on CIFAR-10 augmented with\nTiny Images. Using the additional data they obtain new state-of-the-art results in heuristic defenses,\ncomparable to ours. As our papers are very similar, we provide a detailed comparison in Appendix E.\nOur paper offers a number of perspectives that complement [48, 55, 32]. First, in addition to\nheuristic defenses, we show gains in certi\ufb01ed robustness where we have a guarantee on robustness\nagainst all possible attacks. Second, we study the impact of irrelevant unlabeled data theoretically\n(Section 3.3) and empirically (Appendix C.3). Finally, we provide additional experimental studies of\ndata augmentation and of the impact of unlabeled data amount when using all labels from CIFAR-10.\n\n7 Conclusion\nWe show that unlabeled data closes a sample complexity gap in a stylized model and that robust self-\ntraining (RST) is consistently bene\ufb01cial on two image classi\ufb01cation benchmarks. Our \ufb01ndings open\nup a number of avenues for further research. Theoretically, is suf\ufb01cient unlabeled data a universal\ncure for sample complexity gaps between standard and adversarially robust learning? 
Practically, what is the best way to leverage unlabeled data for robustness, and can semisupervised learning similarly benefit alternative (non-adversarial) notions of robustness? As the scale of data grows, computational capacities increase, and machine learning moves beyond minimizing average error, we expect unlabeled data to provide continued benefit.

Reproducibility. Code, data, and experiments are available on GitHub at https://github.com/yaircarmon/semisup-adv and on CodaLab at https://bit.ly/349WsAC.

Acknowledgments

The authors would like to thank an anonymous reviewer for proposing the label amount experiment in Appendix C.5. YC was supported by the Stanford Graduate Fellowship. AR was supported by the Google Fellowship and Open Philanthropy AI Fellowship. PL was supported by the Open Philanthropy Project Award. JCD was supported by the NSF CAREER award 1553086, the Sloan Foundation and ONR-YIP N00014-19-1-2288.

References

[1] A. Athalye, N. Carlini, and D. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, 2018.

[2] I. Attias, A. Kontorovich, and Y. Mansour. Improved generalization bounds for robust learning. In Algorithmic Learning Theory, pages 162–183, 2019.

[3] B. Biggio and F. Roli. Wild patterns: Ten years after the rise of adversarial machine learning. Pattern Recognition, 84:317–331, 2018.

[4] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, G. Giacinto, and F. Roli. Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 387–402, 2013.

[5] S. Bubeck, E. Price, and I. Razenshteyn. Adversarial examples from computational constraints. In International Conference on Machine Learning (ICML), 2019.

[6] N. Carlini and D. Wagner. 
Adversarial examples are not easily detected: Bypassing ten detection\n\nmethods. arXiv, 2017.\n\n[7] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. In IEEE\n\nSymposium on Security and Privacy, pages 39\u201357, 2017.\n\n[8] O. Chapelle, A. Zien, and B. Scholkopf. Semi-Supervised Learning. MIT Press, 2006.\n[9] J. M. Cohen, E. Rosenfeld, and J. Z. Kolter. Certi\ufb01ed adversarial robustness via randomized\n\nsmoothing. In International Conference on Machine Learning (ICML), 2019.\n\n[10] E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan, and Q. V. Le. Autoaugment: Learning\naugmentation policies from data. In Computer Vision and Pattern Recognition (CVPR), 2019.\n[11] S. Dasgupta and L. Schulman. A probabilistic analysis of EM for mixtures of separated,\n\nspherical Gaussians. Journal of Machine Learning Research (JMLR), 8, 2007.\n\n[12] A. Degwekar and V. Vaikuntanathan. Computational limitations in robust classi\ufb01cation and\n\nwin-win results. arXiv preprint arXiv:1902.01086, 2019.\n\n[13] T. DeVries and G. W. Taylor. Improved regularization of convolutional neural networks with\n\ncutout. arXiv preprint arXiv:1708.04552, 2017.\n\n[14] L. Engstrom, A. Ilyas, and A. Athalye. Evaluating and understanding the robustness of\n\nadversarial logit pairing. arXiv preprint arXiv:1807.10272, 2018.\n\n[15] A. Fawzi, O. Fawzi, and P. Frossard. Analysis of classi\ufb01ers\u2019 robustness to adversarial perturba-\n\ntions. Machine Learning, 107(3):481\u2013508, 2018.\n\n[16] J. Gilmer, L. Metz, F. Faghri, S. S. Schoenholz, M. Raghu, M. Wattenberg, and I. Goodfellow.\n\nAdversarial spheres. arXiv preprint arXiv:1801.02774, 2018.\n\n[17] S. Gowal, K. Dvijotham, R. Stanforth, R. Bunel, C. Qin, J. Uesato, T. Mann, and P. Kohli. On\nthe effectiveness of interval bound propagation for training veri\ufb01ably robust models. arXiv\npreprint arXiv:1810.12715, 2018.\n\n[18] D. Hendrycks, K. Lee, and M. Mazeika. 
Using pre-training can improve model robustness and uncertainty. In International Conference on Machine Learning (ICML), 2019.

[19] N. Inoue, R. Furuta, T. Yamasaki, and K. Aizawa. Cross-domain weakly-supervised object detection through progressive domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5001–5009, 2018.

[20] H. Kannan, A. Kurakin, and I. Goodfellow. Adversarial logit pairing. arXiv preprint arXiv:1803.06373, 2018.

[21] J. Khim and P. Loh. Adversarial risk bounds for binary classification via function transformation. arXiv preprint arXiv:1810.09519, 2018.

[22] A. Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.

[23] S. Laine and T. Aila. Temporal ensembling for semi-supervised learning. In International Conference on Learning Representations (ICLR), 2017.

[24] M. Lecuyer, V. Atlidakis, R. Geambasu, D. Hsu, and S. Jana. Certified robustness to adversarial examples with differential privacy. In IEEE Symposium on Security and Privacy (SP), 2019.

[25] D. Lee. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In International Conference on Machine Learning (ICML), 2013.

[26] B. Li, C. Chen, W. Wang, and L. Carin. Second-order adversarial attack and certifiable robustness. arXiv preprint arXiv:1809.03113, 2018.

[27] M. Long, J. Wang, G. Ding, J. Sun, and P. S. Yu. Transfer feature learning with joint distribution adaptation. In Proceedings of the IEEE International Conference on Computer Vision, pages 2200–2207, 2013.

[28] I. Loshchilov and F. Hutter. SGDR: Stochastic gradient descent with warm restarts. In International Conference on Learning Representations (ICLR), 2017.

[29] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. 
Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations (ICLR), 2018.

[30] T. Miyato, S. Maeda, S. Ishii, and M. Koyama. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.

[31] O. Montasser, S. Hanneke, and N. Srebro. VC classes are adversarially robustly learnable, but only improperly. arXiv preprint arXiv:1902.04217, 2019.

[32] A. Najafi, S. Maeda, M. Koyama, and T. Miyato. Robustness to adversarial perturbations in learning from incomplete data. arXiv preprint arXiv:1905.13021, 2019.

[33] A. Oliver, A. Odena, C. A. Raffel, E. D. Cubuk, and I. Goodfellow. Realistic evaluation of deep semi-supervised learning algorithms. In Advances in Neural Information Processing Systems (NeurIPS), pages 3235–3246, 2018.

[34] N. Papernot, F. Faghri, N. C., I. Goodfellow, R. Feinman, A. Kurakin, C. X., Y. Sharma, T. Brown, A. Roy, A. M., V. Behzadan, K. Hambardzumyan, Z. Z., Y. Juang, Z. Li, R. Sheatsley, A. G., J. Uesato, W. Gierke, Y. Dong, D. B., P. Hendricks, J. Rauber, and R. Long. Technical report on the cleverhans v2.1.0 adversarial examples library. arXiv preprint arXiv:1610.00768, 2018.

[35] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer. Automatic differentiation in PyTorch, 2017.

[36] A. Raghunathan, J. Steinhardt, and P. Liang. Certified defenses against adversarial examples. In International Conference on Learning Representations (ICLR), 2018.

[37] A. Raghunathan, J. Steinhardt, and P. Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In Advances in Neural Information Processing Systems (NeurIPS), 2018.

[38] A. Raghunathan, S. M. Xie, F. Yang, J. C. Duchi, and P. Liang. Adversarial training can hurt generalization. 
arXiv preprint arXiv:1906.06032, 2019.

[39] B. Recht, R. Roelofs, L. Schmidt, and V. Shankar. Do CIFAR-10 classifiers generalize to CIFAR-10? arXiv, 2018.

[40] M. Sajjadi, M. Javanmardi, and T. Tasdizen. Regularization with stochastic transformations and perturbations for deep semi-supervised learning. In Advances in Neural Information Processing Systems (NeurIPS), pages 1163–1171, 2016.

[41] L. Schmidt, S. Santurkar, D. Tsipras, K. Talwar, and A. Madry. Adversarially robust generalization requires more data. In Advances in Neural Information Processing Systems (NeurIPS), pages 5014–5026, 2018.

[42] H. Scudder. Probability of error of some adaptive pattern-recognition machines. IEEE Transactions on Information Theory, 11(3):363–371, 1965.

[43] A. Sinha, H. Namkoong, and J. Duchi. Certifiable distributional robustness with principled adversarial training. In International Conference on Learning Representations (ICLR), 2018.

[44] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014.

[45] A. Tarvainen and H. Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In Advances in Neural Information Processing Systems (NeurIPS), pages 1195–1204, 2017.

[46] A. Torralba, R. Fergus, and W. T. Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(11):1958–1970, 2008.

[47] D. Tsipras, S. Santurkar, L. Engstrom, A. Turner, and A. Madry. Robustness may be at odds with accuracy. In International Conference on Learning Representations (ICLR), 2019.

[48] J. Uesato, J. Alayrac, P. Huang, R. Stanforth, A. Fawzi, and P. Kohli. 
Are labels required for improving adversarial robustness? arXiv preprint arXiv:1905.13725, 2019.

[49] E. Wong and J. Z. Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning (ICML), 2018.

[50] E. Wong, F. Schmidt, J. H. Metzen, and J. Z. Kolter. Scaling provable adversarial defenses. In Advances in Neural Information Processing Systems (NeurIPS), 2018.

[51] Q. Xie, Z. Dai, E. Hovy, M. Luong, and Q. V. Le. Unsupervised data augmentation. arXiv preprint arXiv:1904.12848, 2019.

[52] D. Yin, R. Kannan, and P. Bartlett. Rademacher complexity for adversarially robust generalization. In International Conference on Machine Learning (ICML), pages 7085–7094, 2019.

[53] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.

[54] S. Zagoruyko and N. Komodakis. Wide residual networks. In British Machine Vision Conference, 2016.

[55] R. Zhai, T. Cai, D. He, C. Dan, K. He, J. Hopcroft, and L. Wang. Adversarially robust generalization just requires more unlabeled data. arXiv preprint arXiv:1906.00555, 2019.

[56] H. Zhang, Y. Yu, J. Jiao, E. P. Xing, L. E. Ghaoui, and M. I. Jordan. Theoretically principled trade-off between robustness and accuracy. In International Conference on Machine Learning (ICML), 2019.

[57] S. Zheng, Y. Song, T. Leung, and I. Goodfellow. Improving the robustness of deep neural networks via stability training. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4480–4488, 2016.

[58] X. Zhu, Z. Ghahramani, and J. D. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In International Conference on Machine Learning (ICML), pages 912–919, 2003.

[59] Y. Zou, Z. Yu, B. V. 
Kumar, and J. Wang. Unsupervised domain adaptation for semantic segmentation via class-balanced self-training. In European Conference on Computer Vision (ECCV), pages 289–305, 2018.

[60] Y. Zou, Z. Yu, X. Liu, B. Kumar, and J. Wang. Confidence regularized self-training. arXiv preprint arXiv:1908.09822, 2019.