{"title": "Scaling provable adversarial defenses", "book": "Advances in Neural Information Processing Systems", "page_first": 8400, "page_last": 8409, "abstract": "Recent work has developed methods for learning deep network classifiers that are \\emph{provably} robust to norm-bounded adversarial perturbation; however, these methods are currently only possible for relatively small feedforward networks. In this paper, in an effort to scale these approaches to substantially larger models, we extend previous work in three main directly. First, we present a technique for extending these training procedures to much more general networks, with skip connections (such as ResNets) and general nonlinearities; the approach is fully modular, and can be implemented automatically analogously to automatic differentiation. Second, in the specific case of $\\ell_\\infty$ adversarial perturbations and networks with ReLU nonlinearities, we adopt a nonlinear random projection for training, which scales \\emph{linearly} in the number of hidden units (previous approached scaled quadratically). Third, we show how to further improve robust error through cascade models. On both MNIST and CIFAR data sets, we train classifiers that improve substantially on the state of the art in provable robust adversarial error bounds: from 5.8% to 3.1% on MNIST (with $\\ell_\\infty$ perturbations of $\\epsilon=0.1$), and from 80% to 36.4% on CIFAR (with $\\ell_\\infty$ perturbations of $\\epsilon=2/255$).", "full_text": "Scaling provable adversarial defenses\n\nEric Wong\n\nMachine Learning Department\nCarnegie Mellon University\n\nPittsburgh, PA 15213\n\nericwong@cs.cmu.edu\n\nJan Hendrik Metzen\n\nBosch Center for Arti\ufb01cial Intelligence\n\nRenningen, Germany\n\njanhendrik.metzen@de.bosch.com\n\nFrank R. Schmidt\n\nBosch Center for Arti\ufb01cial Intelligence\n\nRenningen, Germany\n\nfrank.r.schmidt@de.bosch.com\n\nJ. 
Zico Kolter

Computer Science Department
Carnegie Mellon University and
Bosch Center for Artificial Intelligence

Pittsburgh, PA 15213
zkolter@cs.cmu.edu

Abstract

Recent work has developed methods for learning deep network classifiers that are provably robust to norm-bounded adversarial perturbation; however, these methods are currently only possible for relatively small feedforward networks. In this paper, in an effort to scale these approaches to substantially larger models, we extend previous work in three main directions. First, we present a technique for extending these training procedures to much more general networks, with skip connections (such as ResNets) and general nonlinearities; the approach is fully modular, and can be implemented automatically (analogous to automatic differentiation). Second, in the specific case of $\ell_\infty$ adversarial perturbations and networks with ReLU nonlinearities, we adopt a nonlinear random projection for training, which scales linearly in the number of hidden units (previous approaches scaled quadratically). Third, we show how to further improve robust error through cascade models. On both MNIST and CIFAR data sets, we train classifiers that improve substantially on the state of the art in provable robust adversarial error bounds: from 5.8% to 3.1% on MNIST (with $\ell_\infty$ perturbations of $\epsilon = 0.1$), and from 80% to 36.4% on CIFAR (with $\ell_\infty$ perturbations of $\epsilon = 2/255$). 
Code for all experiments in the paper is available at https://github.com/locuslab/convex_adversarial/.

1 Introduction

A body of recent work in adversarial machine learning has shown that it is possible to learn provably robust deep classifiers [Wong and Kolter, 2017, Raghunathan et al., 2018, Dvijotham et al., 2018]. These are deep networks that are verifiably guaranteed to be robust to adversarial perturbations under some specified attack model; for example, a certain robustness certificate may guarantee that for a given example $x$, no perturbation $\Delta$ with $\ell_\infty$ norm less than some specified $\epsilon$ could change the class label that the network predicts for the perturbed example $x + \Delta$. However, up until this point, such provable guarantees have only been possible for reasonably small-sized networks. It has remained unclear whether these methods could extend to larger, more representationally complex networks.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

In this paper, we make substantial progress towards the goal of scaling these provably robust networks to realistic sizes. Specifically, we extend the techniques of Wong and Kolter [2017] in three key ways. First, while past work has only applied to pure feedforward networks, we extend the framework to deal with arbitrary residual/skip connections (a hallmark of modern deep network architectures) and arbitrary activation functions (Dvijotham et al. [2018] also worked with arbitrary activation functions, but only for feedforward networks, and only discussed network verification rather than robust training). Second, and possibly most importantly, computing the upper bound on the robust loss in [Wong and Kolter, 2017] in the worst case scales quadratically in the number of hidden units in the network, making the approach impractical for larger networks. 
In this work, we use a nonlinear random projection technique to estimate the bound in a manner that scales only linearly in the number of hidden units (i.e., only a constant multiple times the cost of traditional training), and which empirically can be used to train the networks with no degradation in performance from the previous work. Third, we show how to further improve robust performance of these methods, though at the expense of worse non-robust error, using multi-stage cascade models. Through these extensions, we are able to improve substantially upon the verified robust errors obtained by past work.

2 Background and related work

Work in adversarial defenses typically falls in one of three primary categories. First, there is ongoing work in developing heuristic defenses against adversarial examples: [Goodfellow et al., 2015, Papernot et al., 2016, Kurakin et al., 2017, Metzen et al., 2017], to name a few. While this work is largely empirical at this point, substantial progress has been made towards developing networks that seem much more robust than previous approaches. Although a distressingly large number of these defenses are quickly "broken" by more advanced attacks [Athalye et al., 2018], there have also been some methods that have proven empirically resistant to the current suite of attacks; the recent NIPS 2017 adversarial example challenge [Kurakin et al., 2018], for example, highlights some of the progress made on developing classifiers that appear much stronger in practice than many of the ad-hoc techniques developed in previous years. Many of the approaches, though not formally verified in the strict sense during training, nonetheless have substantial theoretical justification for why they may perform well: Sinha et al. 
[2018] use properties of statistical robustness to develop an approach that is not much more difficult to train and which empirically does achieve some measure of resistance to attacks; Madry et al. [2017] consider robustness to a first-order adversary, and show that a randomized projected gradient descent procedure is optimal in this setting. Indeed, in some cases the classifiers trained via these methods can be verified to be adversarially robust using the verification techniques discussed below (though only for very small networks). Despite this progress, we believe it is also crucially important to consider defenses that are provably robust, to avoid any possible attack.

Second, our work in this paper relates closely to techniques for the formal verification of neural network systems (indeed, our approach can be viewed as a convex procedure for verification, coupled with a method for training networks via the verified bounds). In this area, most past work focuses on using exact (combinatorial) solvers to verify the robustness properties of networks, either via Satisfiability Modulo Theories (SMT) solvers [Huang et al., 2017, Ehlers, 2017, Carlini and Wagner, 2017] or integer programming approaches [Lomuscio and Maganti, 2017, Tjeng and Tedrake, 2017, Cheng et al., 2017]. These methods have the benefit of being able to reason exactly about robustness, but at the cost of being combinatorial in complexity. This drawback has so far prevented these methods from effectively scaling to large models or being used within a training setting. There have also been a number of recent attempts to verify networks using non-combinatorial methods (and this current work fits broadly in this general area). For example, Gehr et al. 
[2018] develop a suite of verification methods based upon abstract interpretations (these can be broadly construed as relaxations of combinations of activations that are maintained as they pass through the network). Dvijotham et al. [2018] use an approach based upon analytically solving an optimization problem resulting from dual functions of the activations (which extends to activations beyond the ReLU). However, these methods apply to simple feedforward architectures without skip connections, and focus only on verification of existing networks.

Third, and most relevant to our current work, there are several approaches that go beyond provable verification, and also integrate the verification procedure into the training of the network itself. For example, Hein and Andriushchenko [2017] develop a formal bound for robustness to $\ell_2$ perturbations in two-layer networks, and train a surrogate of their bounds. Raghunathan et al. [2018] develop a semidefinite programming (SDP) relaxation of exact verification methods, and train a network by minimizing this bound via the dual SDP. And Wong and Kolter [2017] present a linear-programming (LP) based upper bound on the robust error or loss that can be suffered under norm-bounded perturbation, then minimize this upper bound during training; the method is particularly efficient since they do not solve the LP directly, but instead show that it is possible to bound the LP optimal value and compute elementwise bounds on the activation functions based on a backward pass through the network. However, it is still the case that none of these approaches scale to realistically-sized networks; even the approach of [Wong and Kolter, 2017], which empirically has been scaled to the largest settings of all the above approaches, in the worst case scales quadratically in the number of hidden units in the network and dimensions in the input. 
Thus, all the approaches so far have been limited to relatively small networks and problems such as MNIST.

Contributions This paper fits into this third category of integrating verification into training, and makes substantial progress towards scaling these methods to realistic settings. While we cannot yet reach, e.g., ImageNet scale in this current work, we show that it is possible to overcome the main hurdles to scalability of past approaches. Specifically, we develop a provably robust training procedure, based upon the approach in [Wong and Kolter, 2017], but extending it in three key ways. The resulting method: 1) extends to general networks with skip connections, residual layers, and activations besides the ReLU; we do so by using a general formulation based on the Fenchel conjugate function of activations; 2) scales linearly in the dimensionality of the input and number of hidden units in the network, using techniques from nonlinear random projections, all while suffering minimal degradation in accuracy; and 3) further improves the quality of the bound with model cascades. We describe each of these contributions in the next section.

3 Scaling provably robust networks

3.1 Robust bounds for general networks via modular dual functions

This section presents an architecture for constructing provably robust bounds for general deep network architectures, using Fenchel duality. Importantly, we derive the dual of each network operation in a fully modular fashion, simplifying the problem of deriving robust bounds of a network to bounding the dual of individual functions. 
By building up a toolkit of dual operations, we can automatically construct the dual of any network architecture by iterating through the layers of the original network.

The adversarial problem for general networks We consider a generalized $k$ "layer" neural network $f_\theta : \mathbb{R}^{|x|} \to \mathbb{R}^{|y|}$ given by the equations

$$z_i = \sum_{j=1}^{i-1} f_{ij}(z_j), \quad \text{for } i = 2, \ldots, k \qquad (1)$$

where $z_1 = x$, $f_\theta(x) \equiv z_k$ (i.e., the output of the network), and $f_{ij} : \mathbb{R}^{|z_j|} \to \mathbb{R}^{|z_i|}$ is some function from layer $j$ to layer $i$. Importantly, this differs from prior work in two key ways. First, unlike the conjugate forms found in Wong and Kolter [2017], Dvijotham et al. [2018], we no longer assume that the network consists of linear operations followed by activation functions, and instead opt to work with an arbitrary sequence of $k$ functions. This simplifies the analysis of sequential non-linear activations commonly found in modern architectures, e.g. max pooling or a normalization strategy followed by a ReLU,¹ by analyzing each activation independently, whereas previous work would need to analyze the entire sequence as a single, joint activation. Second, we allow layers to depend not just on the previous layer, but also on all layers before it. This generalization applies to networks with any kind of skip connections, e.g. residual networks and dense networks, and greatly expands the set of possible architectures.

Let $B(x) \subset \mathbb{R}^{|x|}$ represent some input constraint for the adversary. For this section we will focus on an arbitrary norm ball $B(x) = \{x + \Delta : \|\Delta\| \le \epsilon\}$. 
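To make the recursion in Equation 1 concrete, here is a minimal numpy sketch of such a generalized forward pass. The dictionary keying, the toy linear maps standing in for the $f_{ij}$, and all sizes are illustrative choices of ours, not the paper's actual implementation:

```python
import numpy as np

def forward(x, fs, k):
    """Evaluate z_i = sum_{j<i} f_ij(z_j) for i = 2..k (Equation 1).

    fs[(i, j)] is the function from layer j to layer i; missing pairs
    contribute nothing, which is how plain feedforward nets (only
    (i, i-1) present) and skip connections (extra pairs) both fit.
    """
    z = {1: x}
    for i in range(2, k + 1):
        z[i] = sum(f(z[j]) for (ti, j), f in fs.items() if ti == i)
    return z[k]

# Toy instance: a 4-"layer" net with one skip connection from z1 to z4.
rng = np.random.default_rng(0)
W21, W32, W43, W41 = (rng.standard_normal((3, 3)) for _ in range(4))
fs = {
    (2, 1): lambda z: W21 @ z,
    (3, 2): lambda z: np.maximum(W32 @ z, 0),  # ReLU as its own "layer" function
    (4, 3): lambda z: W43 @ z,
    (4, 1): lambda z: W41 @ z,                 # skip connection
}
out = forward(np.ones(3), fs, k=4)
```

Note how the ReLU appears as an independent function rather than being fused with the preceding linear map, which is exactly the decoupling the paragraph above describes.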
This is the constraint set considered for norm-bounded adversarial perturbations; however, other constraint sets can certainly be considered. Then, given an input example $x$, a known label $y^\star$, and a target label $y_{\text{targ}}$, the problem of finding the most adversarial example within $B$ (i.e., a so-called targeted adversarial attack) can be written as

$$\min_{z_k} \; c^T z_k, \quad \text{subject to } z_i = \sum_{j=1}^{i-1} f_{ij}(z_j), \;\; i = 2, \ldots, k, \quad z_1 \in B(x) \qquad (2)$$

where $c = e_{y^\star} - e_{y_{\text{targ}}}$.

¹Batch normalization, since it depends on entire minibatches, is formally not covered by the approach, but it can be approximated by considering the scaling and shifting to be generic parameters, as is done at test time.

Dual networks via compositions of modular dual functions To bound the adversarial problem, we look to its dual optimization problem using the machinery of Fenchel conjugate functions [Fenchel, 1949], described in Definition 1.

Definition 1. The conjugate of a function $f$ is another function $f^*$ defined by
$$f^*(y) = \max_x \; x^T y - f(x) \qquad (3)$$

Specifically, we can lift the constraint $z_{i+1} = \sum_{j=1}^{i} f_{ij}(z_j)$ from Equation 2 into the objective with an indicator function, and use conjugate functions to obtain a lower bound. For brevity, we will use the subscript notation $(\cdot)_{1:i} = ((\cdot)_1, \ldots, (\cdot)_i)$, e.g. $z_{1:i} = (z_1, \ldots, z_i)$. Due to the skip connections, the indicator functions are not independent, so we cannot directly conjugate each individual indicator function. We can, however, still form its dual using the conjugate of a different indicator function corresponding to the backwards direction, as shown in Lemma 1.

Lemma 1. Let the indicator function for the $i$th constraint be
$$\chi_i(z_{1:i}) = \begin{cases} 0 & \text{if } z_i = \sum_{j=1}^{i-1} f_{ij}(z_j) \\ \infty & \text{otherwise} \end{cases} \qquad (4)$$
for $i = 2, \ldots, k$, and consider the joint indicator function $\sum_{i=2}^{k} \chi_i(z_{1:i})$. Then, the joint indicator is lower bounded by $\max_{\nu_{1:k}} \nu_k^T z_k - \nu_1^T z_1 - \sum_{i=1}^{k-1} \chi_i^*(-\nu_i, \nu_{i+1:k})$, where
$$\chi_i^*(\nu_{i:k}) = \max_{z_i} \; \nu_i^T z_i + \sum_{j=i+1}^{k} \nu_j^T f_{ji}(z_i) \qquad (5)$$
for $i = 1, \ldots, k-1$. Note that $\chi_i^*(\nu_{i:k})$ is the exact conjugate of the indicator for the set $\{x_{i:k} : x_j = f_{ji}(x_i) \; \forall j > i\}$, which is different from the set indicated by $\chi_i$. However, when there are no skip connections (i.e. $z_i$ only depends on $z_{i-1}$), $\chi_i^*$ is exactly the conjugate of $\chi_i$.

We defer the proof of Lemma 1 to Appendix A.1. With structured upper bounds on these conjugate functions, we can bound the original adversarial problem using the dual network described in Theorem 1. We can then optimize the bound using any standard deep learning toolkit using the same robust optimization procedure as in Wong and Kolter [2017] but using our bound instead. This amounts to minimizing the loss evaluated on our bound of possible network outputs under perturbations, as a drop-in replacement for the traditional network output. For the adversarial setting, note that the $\ell_\infty$ perturbation results in a dual norm of $\ell_1$.

Theorem 1. Let $g_{ij}$ and $h_i$ be any functions such that
$$\chi_i^*(-\nu_i, \nu_{i+1:k}) \le h_i(\nu_{i:k}) \quad \text{subject to } \nu_i = \sum_{j=i+1}^{k} g_{ij}(\nu_j) \qquad (6)$$
for $i = 1, \ldots, k-1$. Then, the adversarial problem from Equation 2 is lower bounded by
$$J(x, \nu_{1:k}) = -\nu_1^T x - \epsilon \|\nu_1\|_* - \sum_{i=1}^{k-1} h_i(\nu_{i:k}) \qquad (7)$$
where $\|\cdot\|_*$ is the dual norm, and $\nu_{1:k} = g(c)$ is the output of a $k$ layer neural network $g$ on input $c$, given by the equations
$$\nu_k = -c, \quad \nu_i = \sum_{j=i}^{k-1} g_{ij}(\nu_{j+1}), \quad \text{for } i = 1, \ldots, k-1. \qquad (8)$$

We denote the upper bound on the conjugate function from Equation 6 a dual layer, and defer the proof to Appendix A.2. To give a concrete example, we present two possible dual layers for linear operators and ReLU activations in Corollaries 1 and 2 (their derivations are in Appendix B), and we also depict an example dual residual block in Figure 1.

[Figure 1: An example of the layers forming a typical residual block (left) and its dual (right), using the dual layers described in Corollaries 1 and 2. Note that the bias terms of the residual network go into the dual objective and are not part of the structure of the dual network, and the skip connections remain in the dual network but go in the opposite direction.]

Corollary 1. The dual layer for a linear operator $\hat{z}_{i+1} = W_i z_i + b_i$ is
$$\chi_i^*(\nu_{i:k}) = \nu_{i+1}^T b_i \quad \text{subject to } \nu_i = W_i^T \nu_{i+1}. \qquad (9)$$

Corollary 2. Suppose we have lower and upper bounds $\ell_{i,j}, u_{i,j}$ on the pre-activations. The dual layer for a ReLU activation $\hat{z}_{i+1} = \max(z_i, 0)$ is
$$\chi_i^*(\nu_{i:k}) \le -\sum_{j \in \mathcal{I}_i} \ell_{i,j} [\nu_{i,j}]_+ \quad \text{subject to } \nu_i = D_i \nu_{i+1}, \qquad (10)$$
where $\mathcal{I}_i^-, \mathcal{I}_i^+, \mathcal{I}_i$ denote the index sets where the bounds are negative, positive, or spanning the origin respectively, and where $D_i$ is a diagonal matrix with entries
$$(D_i)_{jj} = \begin{cases} 0 & j \in \mathcal{I}_i^- \\ 1 & j \in \mathcal{I}_i^+ \\ \frac{u_{i,j}}{u_{i,j} - \ell_{i,j}} & j \in \mathcal{I}_i \end{cases} \qquad (11)$$

We briefly note that these dual layers recover the original dual network described in Wong and Kolter [2017]. Furthermore, the dual linear operation is the exact conjugate and introduces no looseness to the bound, while the dual ReLU uses the same relaxation used in Ehlers [2017], Wong and Kolter [2017]. More generally, the strength of the bound from Theorem 1 relies entirely on the tightness of the individual dual layers to their respective conjugate functions in Equation 6. While any $g_{ij}, h_i$ can be chosen to upper bound the conjugate function, a tighter bound on the conjugate results in a tighter bound on the adversarial problem.

If the dual layers for all operations are linear, the bounds for all layers can be computed with a single forward pass through the dual network using a direct generalization of the form used in Wong and Kolter [2017] (due to their similarity, we defer the exact algorithm to Appendix F). By trading off tightness of the bound with computational efficiency by using linear dual layers, we can efficiently compute all bounds and construct the dual network one layer at a time. 
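As a concrete illustration of Corollary 2, the diagonal of $D_i$ and the objective term $-\sum_{j \in \mathcal{I}_i} \ell_{i,j}[\nu_{i,j}]_+$ can be computed elementwise from the pre-activation bounds. A minimal numpy sketch (function and variable names are ours, not the paper's code):

```python
import numpy as np

def relu_dual_layer(l, u, nu_next):
    """Dual ReLU layer from Corollary 2.

    l, u: elementwise lower/upper bounds on the pre-activation.
    nu_next: the incoming dual variable nu_{i+1}.
    Returns nu_i = D_i nu_{i+1} and the bound term
    -sum_{j in I_i} l_j [nu_{i,j}]_+ over the "spanning" set I_i.
    """
    # Index sets: I^- (u <= 0, ReLU always off), I^+ (l >= 0, always on),
    # I (l < 0 < u, relaxed with slope u / (u - l)).
    d = np.zeros_like(l)
    d[l >= 0] = 1.0
    span = (l < 0) & (u > 0)
    d[span] = u[span] / (u[span] - l[span])
    nu = d * nu_next                         # nu_i = D_i nu_{i+1}
    term = -np.sum(l[span] * np.maximum(nu[span], 0.0))
    return nu, term

l = np.array([-2.0, 0.5, -1.0])
u = np.array([-0.5, 2.0, 3.0])
nu_next = np.array([1.0, -1.0, 2.0])
nu, term = relu_dual_layer(l, u, nu_next)
```

Since $\ell_{i,j} < 0$ on $\mathcal{I}_i$, the returned term is nonnegative; it is exactly the looseness contributed by the ReLU relaxation at this layer.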
The end result is that we can automatically construct dual networks from dual layers in a fully modular fashion, completely independent of the overall network architecture (similar to how auto-differentiation tools proceed one function at a time to compute all parameter gradients using only the local gradient of each function). With a sufficiently comprehensive toolkit of dual layers, we can compute provable bounds on the adversarial problem for any network architecture.

For other dual layers, we point the reader to two resources. For the explicit form of dual layers for hardtanh, batch normalization, and residual connections, we direct the reader to Appendix B. For analytical forms of conjugate functions of other activation functions such as tanh, sigmoid, and max pooling, we refer the reader to Dvijotham et al. [2018].

3.2 Efficient bound computation for $\ell_\infty$ perturbations via random projections

A limiting factor of the proposed algorithm and the work of Wong and Kolter [2017] is its computational complexity: for instance, to compute the bounds exactly for $\ell_\infty$ norm bounded perturbations in ReLU networks, it is computationally expensive to calculate $\|\nu_1\|_1$ and $\sum_{j \in \mathcal{I}_i} \ell_{ij}[\nu_{ij}]_+$. In contrast to other terms like $\nu_{i+1}^T b_i$, which require only sending a single bias vector through the dual network, the matrices $\nu_1$ and $\nu_{i,\mathcal{I}_i}$ must be explicitly formed by sending an example through the dual network for each input dimension and for each $j \in \mathcal{I}_i$, which renders the entire computation quadratic in the number of hidden units. To scale the method for larger ReLU networks with $\ell_\infty$ perturbations, we look to random Cauchy projections. Note that for an $\ell_2$ norm bounded adversarial perturbation, the dual norm is also an $\ell_2$ norm, so we can use traditional random projections [Vempala, 2005]. Experiments for the $\ell_2$ norm are explored further in Appendix H. However, for the remainder of this section we focus on the $\ell_1$ case arising from $\ell_\infty$ perturbations.

Estimating with Cauchy random projections From the work of Li et al. [2007], we can use the sample median estimator with Cauchy random projections to directly estimate $\|\nu_1\|_1$ for linear dual networks, and use a variation to estimate $\sum_{j \in \mathcal{I}} \ell_{ij}[\nu_{ij}]_+$, as shown in Theorem 2 (the proof is in Appendix D.1).

Theorem 2. Let $\nu_{1:k}$ be the dual network from Equation 1 with linear dual layers and let $r > 0$ be the projection dimension. Then, we can estimate
$$\|\nu_1\|_1 \approx \text{median}(|\nu_1^T R|) \qquad (12)$$
where $R$ is a $|z_1| \times r$ standard Cauchy random matrix and the median is taken over the second axis. Furthermore, we can estimate
$$\sum_{j \in \mathcal{I}} \ell_{ij}[\nu_{ij}]_+ \approx \frac{1}{2}\left(-\text{median}(|\nu_i^T \text{diag}(d_i) R|) + \nu_i^T d_i\right), \quad d_{i,j} = \begin{cases} \frac{u_{i,j}}{u_{i,j} - \ell_{i,j}} & j \in \mathcal{I}_i \\ 0 & j \notin \mathcal{I}_i \end{cases} \qquad (13)$$
where $R$ is a $|z_i| \times r$ standard Cauchy random matrix, and the median is taken over the second axis.

This estimate has two main advantages: first, it is simple to compute, as evaluating $\nu_1^T R$ involves passing the random matrix forward through the dual network (similarly, the other term requires passing a modified random matrix through the dual network; the exact algorithm is detailed in Algorithm 1). Second, it is memory efficient in the backward pass, as the gradient need only propagate through the median entries.

Algorithm 1 Estimating $\|\nu_1\|_1$ and $\sum_{j \in \mathcal{I}_i} \ell_{ij}[\nu_{ij}]_+$
  input: linear dual network operations $g_{ij}$, projection dimension $r$, lower bounds $\ell_{ij}$, $d_{ij}$ from Equation 13, layer-wise sizes $|z_i|$
  $R_1^{(1)} := \text{Cauchy}(r, |z_1|)$ // initialize random matrix for the $\ell_1$ term
  for $i = 2, \ldots, k$ do
    // pass each term forward through the network
    for $j = 1, \ldots, i-1$ do
      $R_j^{(i)} := \sum_{k=1}^{i-1} g_{ki}^T(R_j^{(k)})$, $\quad S_j^{(i)} := \sum_{k=1}^{i-1} g_{ki}^T(S_j^{(k)})$
    end for
    $R_i^{(i)} := \text{diag}(d_i)\,\text{Cauchy}(|z_i|, r)$, $\quad S_i^{(i)} := d_i$ // initialize terms for layer $i$
  end for
  output: $\text{median}(|R_1^{(k)}|)$, $\;0.5\left(-\text{median}(|R_2^{(k)}|) + S_2^{(k)}\right)$, $\ldots$, $\;0.5\left(-\text{median}(|R_k^{(k)}|) + S_k^{(k)}\right)$

These random projections reduce the computational complexity of computing these terms to piping $r$ random Cauchy vectors (and an additional vector) through the network. Crucially, the complexity is no longer a quadratic function of the network size: if we fix the projection dimension to some constant $r$, then the computational complexity is now linear in the input dimension and $|\mathcal{I}_i|$. Since previous work was either quadratic or combinatorially expensive to compute, estimating the bound with random projections is the fastest and most scalable approach towards training robust networks that we are aware of. At test time, the bound can be computed exactly, as the gradients no longer need to be stored. 
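The $\ell_1$ estimate in Equation 12 is easy to verify in isolation: by the stability of the Cauchy distribution, each entry of $\nu_1^T R$ is Cauchy-distributed with scale $\|\nu_1\|_1$, and the median of the absolute value of a centered Cauchy variable equals its scale. A minimal numpy sketch (the vector and projection sizes here are illustrative choices, not the paper's settings):

```python
import numpy as np

def l1_estimate(nu, r, rng):
    """Estimate ||nu||_1 as the sample median of |nu^T R|, where R is a
    |nu| x r standard Cauchy matrix (Equation 12)."""
    R = rng.standard_cauchy(size=(nu.shape[0], r))
    return np.median(np.abs(nu @ R))

rng = np.random.default_rng(0)
nu = rng.standard_normal(500)
est = l1_estimate(nu, r=10001, rng=rng)
exact = np.abs(nu).sum()
# The sample median concentrates around the exact norm as r grows.
```

Because gradients of a median flow only through the median entries, a differentiable version of this estimator is also memory-efficient during training, as noted above.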
However, if desired, it is possible to use a different estimator (specifically, the geometric estimator) for the $\ell_\infty$ norm to calculate high probability bounds on the adversarial problem, which is discussed in Appendix E.1.

3.3 Bias reduction with cascading ensembles

A final major challenge of training models to minimize a robust bound on the adversarial loss is that the robustness penalty acts as a regularization. For example, in a two-layer ReLU network, the robust loss penalizes $\epsilon \|\nu_1\|_1 = \epsilon \|W_1 D_1 W_2\|_1$, which effectively acts as a regularizer on the network with weight $\epsilon$. 

Table 1: Number of hidden units, parameters, and time per epoch for various architectures.

  Model    Dataset   # hidden units   # parameters   Time (s) / epoch
  Small    MNIST     4804             166406         74
  Small    CIFAR     6244             214918         48
  Large    MNIST     28064            1974762        667
  Large    CIFAR     62464            2466858        466
  Resnet   MNIST     82536            3254562        2174
  Resnet   CIFAR     107496           4214850        1685

Table 2: Results on MNIST and CIFAR10 with small networks, large networks, residual networks, and cascaded variants.

                                      Single model error    Cascade error
  Dataset   Model          Epsilon    Robust    Standard    Robust    Standard
  MNIST     Small, Exact   0.1        4.48%     1.26%       -         -
  MNIST     Small          0.1        4.99%     1.37%       3.13%     3.13%
  MNIST     Large          0.1        3.67%     1.08%       3.42%     3.18%
  MNIST     Small          0.3        43.10%    14.87%      33.64%    33.64%
  MNIST     Large          0.3        45.66%    12.61%      41.62%    35.24%
  CIFAR10   Small          2/255      52.75%    38.91%      39.35%    39.35%
  CIFAR10   Large          2/255      46.59%    31.28%      38.84%    36.08%
  CIFAR10   Resnet         2/255      46.11%    31.72%      36.41%    35.93%
  CIFAR10   Small          8/255      79.25%    72.24%      71.71%    71.71%
  CIFAR10   Large          8/255      83.43%    80.56%      79.24%    79.14%
  CIFAR10   Resnet         8/255      78.22%    71.33%      70.95%    70.77%

Because of this, the resulting networks (even those with large representational capacity) are typically overregularized to the point that many filters/weights become identically zero (i.e., the network capacity is not used).

To address this point, we advocate for using a robust cascade of networks: that is, we train a sequence of robust classifiers, where later elements of the cascade are trained (and evaluated) only on those examples that the previous elements of the cascade cannot certify (i.e., those examples that lie within $\epsilon$ of the decision boundary). This procedure is formally described in the Appendix in Algorithm 2.

4 Experiments

Dataset and Architectures We evaluate the techniques in this paper on two main datasets: MNIST digit classification [LeCun et al., 1998] and CIFAR10 image classification [Krizhevsky, 2009].² We test on a variety of deep and wide convolutional architectures, with and without residual connections. All code for these experiments is available at https://github.com/locuslab/convex_adversarial/. The small network is the same as that used in [Wong and Kolter, 2017], with two convolutional layers of 16 and 32 filters and a fully connected layer of 100 units. The large network is a scaled up version of it, with four convolutional layers with 32, 32, 64, and 64 filters, and two fully connected layers of 512 units. The residual networks use the same structure used by [Zagoruyko and Komodakis, 2016] with 4 residual blocks with 16, 16, 32, and 64 filters. We highlight a subset of the results in Table 2, and briefly describe a few key observations below. We leave more extensive experiments and details regarding the experimental setup to Appendix G, including additional experiments on $\ell_2$ perturbations. 
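Returning to the cascade of Section 3.3, its evaluation logic is simple: each example is answered by the first stage that can certify it, and the final stage answers whatever remains. A schematic sketch (the `predict`/`certify` interfaces are hypothetical stand-ins for a classifier and a robustness-certificate check, not the paper's actual Algorithm 2):

```python
def cascade_predict(models, x, eps):
    """Route x through a robust cascade.

    models: list of (predict, certify) pairs, where predict(x) returns a
    label and certify(x, eps) returns True iff the model's robust bound
    proves the label cannot change under the allowed perturbation.
    Later models only ever answer examples the earlier ones cannot certify.
    """
    for predict, certify in models[:-1]:
        if certify(x, eps):
            return predict(x), True           # certified at this stage
    predict, certify = models[-1]
    return predict(x), certify(x, eps)        # last stage answers regardless

# Toy stand-ins: stage 1 only certifies inputs far from the boundary at 0.
easy = (lambda x: int(x > 0), lambda x, eps: abs(x) > eps)
fallback = (lambda x: int(x > 0), lambda x, eps: True)
label, certified = cascade_predict([easy, fallback], x=0.05, eps=0.1)
```

This routing is also why the cascade trades nominal accuracy for robust accuracy: examples near the decision boundary are pushed onto later, more specialized stages.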
All results except where otherwise noted use a random projection of 50 dimensions.

²We fully realize the irony of a paper with "scaling" in the title that currently maxes out on CIFAR10 experiments. But we emphasize that when it comes to certifiably robust networks, the networks we consider here, as we illustrate below in Table 1, are more than an order of magnitude larger than any that have been considered previously in the literature. Thus, our emphasis is really on the potential scaling properties of these approaches rather than large-scale experiments on e.g. ImageNet sized data sets.

Figure 2: Training and testing robust error curves over epochs on the MNIST dataset using $k$ projection dimensions. The $\epsilon$ value for training is scheduled from 0.01 to 0.1 over the first 20 epochs. The projections force the model to generalize over higher variance, reducing the generalization gap.

Figure 3: Robust error curves as we add models to the cascade for the CIFAR10 dataset on a small model. The $\epsilon$ value for training is scheduled to reach 2/255 after 20 epochs. The training curves are for each individual model, and the testing curves are for the whole cascade up to that stage.

Summary of results For the different data sets and models, the final robust and nominal test errors are given in Table 2. We emphasize that in all cases we report the robust test error, that is, our upper bound on the possible test set error that the classifier can suffer under any norm-bounded attack (thus, considering different empirical attacks is orthogonal to our main presentation and not something that we include, as we are focused on verified performance). As we are focusing on the particular random projections discussed above, all experiments consider attacks with bounded $\ell_\infty$ norm, plus the ReLU networks highlighted above. 
On MNIST, the (non-cascaded) large model reaches a final robust error of 3.7% for $\epsilon = 0.1$, and the best cascade reaches 3.1% error. This contrasts with the best previous bound of 5.8% robust error for this epsilon, from [Wong and Kolter, 2017]. On CIFAR10, the ResNet model achieves 46.1% robust error for $\epsilon = 2/255$, and the cascade lowers this to 36.4% error. In contrast, the previous best verified robust error for this $\epsilon$, from [Dvijotham et al., 2018], was 80%. While the robust error is naturally substantially higher for $\epsilon = 8/255$ (the amount typically considered in empirical works), we are still able to achieve 71% provable robust error; for comparison, the best empirical robust performance against current attacks is 53% error at $\epsilon = 8/255$ [Madry et al., 2017], and most heuristic defenses have been broken to beyond this error [Athalye et al., 2018].

Number of random projections In the MNIST dataset (the only data set where it is trivial to run exact training without projection), we have evaluated our approach using different projection dimensions as well as exact training (i.e., without random projections). We note that using a substantially lower projection dimension does not have a significant impact on the test error. This fact is highlighted in Figure 2. Using the same convolutional architecture used by Wong and Kolter [2017], which previously required gigabytes of memory and took hours to train, it is sufficient to use only 10 random projections to achieve comparable test error performance to training with the exact bound. Each training epoch with 10 random projections takes less than a minute on a single GeForce GTX 1080 Ti graphics card, while using less than 700MB of memory, achieving significant speedup and memory reduction over Wong and Kolter [2017]. 
The estimation quality and the corresponding speedups obtained are explored in more detail in Appendix E.6.

Cascades Finally, we consider the performance of the cascaded versus non-cascaded models. In all cases, cascading the models is able to improve the robust error performance, sometimes substantially, for instance decreasing the robust error on CIFAR10 from 46.1% to 36.4% for ε = 2/255. However, this comes at a cost as well: the nominal error increases throughout the cascade (this is to be expected, since the cascade essentially tries to force the robust and nominal errors to match). Thus, there is substantial value to both improving the single-model networks and integrating cascades into the prediction.

5 Conclusion

In this paper, we have presented a general methodology for deriving dual networks from compositions of dual layers, based on the machinery of conjugate functions, to train classifiers that are provably robust to adversarial attacks. Importantly, the methodology scales linearly for ReLU-based networks against ℓ∞ norm-bounded attacks, making it possible to train large-scale, provably robust networks that were previously out of reach, and the obtained bounds can be improved further with model cascades. While this marks a significant step forward in scalable defenses for deep networks, there are several directions for improvement. One particularly important direction is better architecture development: a wide range of functions and activations not found in traditional deep residual networks may have better robustness properties or more efficient dual layers that also allow for scalable training.
But perhaps even more importantly, we also need to consider the nature of adversarial perturbations beyond just norm-bounded attacks. Better characterizing the space of perturbations that a network "should" be resilient to represents one of the major challenges going forward for adversarial machine learning.

References

Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, 2018.

Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In Security and Privacy (SP), 2017 IEEE Symposium on, pages 39–57. IEEE, 2017.

Chih-Hong Cheng, Georg Nührenberg, and Harald Ruess. Maximum resilience of artificial neural networks. In International Symposium on Automated Technology for Verification and Analysis, pages 251–268. Springer, 2017.

Krishnamurthy Dvijotham, Robert Stanforth, Sven Gowal, Timothy Mann, and Pushmeet Kohli. A dual approach to scalable verification of deep networks. arXiv preprint arXiv:1803.06567, 2018.

Ruediger Ehlers. Formal verification of piece-wise linear feed-forward neural networks. In International Symposium on Automated Technology for Verification and Analysis, 2017.

Werner Fenchel. On conjugate convex functions. Canadian Journal of Mathematics, 1:73–77, 1949.

Timon Gehr, Matthew Mirman, Dana Drachsler-Cohen, Petar Tsankov, Swarat Chaudhuri, and Martin Vechev. AI2: Safety and robustness certification of neural networks with abstract interpretation. In IEEE Conference on Security and Privacy, 2018.

Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015. URL http://arxiv.org/abs/1412.6572.

Matthias Hein and Maksym Andriushchenko.
Formal guarantees on the robustness of a classifier against adversarial manipulation. In Advances in Neural Information Processing Systems, 2017.

Xiaowei Huang, Marta Kwiatkowska, Sen Wang, and Min Wu. Safety verification of deep neural networks. In International Conference on Computer Aided Verification, pages 3–29. Springer, 2017.

Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.

Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. In International Conference on Learning Representations, 2017.

Alexey Kurakin, Ian Goodfellow, Samy Bengio, Yinpeng Dong, Fangzhou Liao, Ming Liang, Tianyu Pang, Jun Zhu, Xiaolin Hu, Cihang Xie, et al. Adversarial attacks and defences competition. arXiv preprint arXiv:1804.00097, 2018.

Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

Ping Li, Trevor J Hastie, and Kenneth W Church. Nonlinear estimators and tail bounds for dimension reduction in l1 using Cauchy random projections. Journal of Machine Learning Research, 8(Oct):2497–2532, 2007.

Alessio Lomuscio and Lalit Maganti. An approach to reachability analysis for feed-forward ReLU neural networks. arXiv preprint arXiv:1706.07351, 2017.

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.

Jan Hendrik Metzen, Tim Genewein, Volker Fischer, and Bastian Bischoff. On detecting adversarial perturbations. In International Conference on Learning Representations, 2017.

Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks.
In Security and Privacy (SP), 2016 IEEE Symposium on, pages 582–597. IEEE, 2016.

Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. In International Conference on Learning Representations, 2018.

Aman Sinha, Hongseok Namkoong, and John Duchi. Certifiable distributional robustness with principled adversarial training. In International Conference on Learning Representations, 2018.

Vincent Tjeng and Russ Tedrake. Verifying neural networks with mixed integer programming. CoRR, abs/1711.07356, 2017. URL http://arxiv.org/abs/1711.07356.

Santosh S Vempala. The random projection method, volume 65. American Mathematical Soc., 2005.

Eric Wong and J Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. arXiv preprint arXiv:1711.00851, 2017.

Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.