{"title": "A Convex Relaxation Barrier to Tight Robustness Verification of Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 9835, "page_last": 9846, "abstract": "Verification of neural networks enables us to gauge their robustness against adversarial attacks. Verification algorithms fall into two categories: exact verifiers that run in exponential time and relaxed verifiers that are efficient but incomplete. In this paper, we unify all existing LP-relaxed verifiers, to the best of our knowledge, under a general convex relaxation framework. This framework works for neural networks with diverse architectures and nonlinearities and covers both primal and dual views of neural network verification. Next, we perform large-scale experiments, amounting to more than 22 CPU-years, to obtain exact solution to the convex-relaxed problem that is optimal within our framework for ReLU networks. We find the exact solution does not significantly improve upon the gap between PGD and existing relaxed verifiers for various networks trained normally or robustly on MNIST and CIFAR datasets. Our results suggest there is an inherent barrier to tight verification for the large class of methods captured by our framework. We discuss possible causes of this barrier and potential future directions for bypassing it.", "full_text": "A Convex Relaxation Barrier to Tight Robustness\n\nVeri\ufb01cation of Neural Networks\n\nHadi Salman\u2217\n\nMicrosoft Research AI\n\nhadi.salman@microsoft.com\n\nGreg Yang\n\nMicrosoft Research AI\n\ngregyang@microsoft.com\n\nHuan Zhang\n\nUCLA\n\nhuan@huan-zhang.com\n\nCho-Jui Hsieh\n\nUCLA\n\nchohsieh@cs.ucla.edu\n\nPengchuan Zhang\n\nMicrosoft Research AI\n\npenzhan@microsoft.com\n\nAbstract\n\nVeri\ufb01cation of neural networks enables us to gauge their robustness against ad-\nversarial attacks. 
Verification algorithms fall into two categories: exact verifiers that run in exponential time and relaxed verifiers that are efficient but incomplete. In this paper, we unify all existing LP-relaxed verifiers, to the best of our knowledge, under a general convex relaxation framework. This framework works for neural networks with diverse architectures and nonlinearities and covers both primal and dual views of neural network verification. Next, we perform large-scale experiments, amounting to more than 22 CPU-years, to obtain the exact solution to the convex-relaxed problem that is optimal within our framework for ReLU networks. We find the exact solution does not significantly improve upon the gap between PGD and existing relaxed verifiers for various networks trained normally or robustly on MNIST and CIFAR datasets. Our results suggest there is an inherent barrier to tight verification for the large class of methods captured by our framework. We discuss possible causes of this barrier and potential future directions for bypassing it. Our code and trained models are available at http://github.com/Hadisalman/robust-verify-benchmark.2

1 Introduction

A classification neural network $f : \mathbb{R}^n \to \mathbb{R}^K$ (where $f_i(x)$ should be thought of as the $i$th logit) is considered adversarially robust with respect to an input $x$ and its neighborhood $S_{in}(x)$ if
$$\min_{x' \in S_{in}(x),\, i \neq i^*} f_{i^*}(x') - f_i(x') > 0, \quad \text{where } i^* = \arg\max_j f_j(x). \tag{1}$$
Many recent works have proposed robustness verification methods by lower-bounding eq. (1); the positivity of this lower bound proves the robustness w.r.t. $S_{in}(x)$. A dominant approach thus far has tried to relax eq.
(1) into a convex optimization problem, from either the primal view [Zhang et al.,\n2018, Gehr et al., 2018, Singh et al., 2018, Weng et al., 2018] or the dual view [Wong and Kolter,\n2018, Dvijotham et al., 2018b, Wang et al., 2018b]. In our \ufb01rst main contribution, we propose a\nlayer-wise convex relaxation framework that uni\ufb01es these works and reveals the relationships between\nthem (Fig. 1). We further show that the performance of methods within this framework is subject to a\ntheoretical limit: the performance of the optimal layer-wise convex relaxation.\nThis then begs the question: is the road to fast and accurate robustness veri\ufb01cation paved by just faster\nand more accurate layer-wise convex relaxation that approaches the theoretical limit? In our second\nmain contribution, we answer this question in the negative. We perform extensive experiments\n\n\u2217Work done as part of the Microsoft AI Residency Program.\n2Please see http://arxiv.org/abs/1902.08722 for the full and most recent version of this paper.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n\fFigure 1: Relationship between existing relaxed algorithms and our framework. See Appendix D for\ndetailed discussions of each unlabeled arrow from the \u201cPrimal view\u201d side.\n\nwith deep ReLU networks to compute the optimal layer-wise convex relaxation and compare with\nthe LP-relaxed dual formulation from Wong and Kolter [2018], the PGD attack from Madry et al.\n[2017], and the mixed integer linear programming (MILP) exact veri\ufb01er from Tjeng et al. 
[2019].\nOver different models, sizes, training methods, and datasets (MNIST and CIFAR-10), we \ufb01nd that (i)\nin terms of lower bounding the minimum l\u221e adversarial distortion3, the optimal layer-wise convex\nrelaxation only slightly improves the lower bound found by Wong and Kolter [2018], especially when\ncompared with the upper bound provided by the PGD attack, which is consistently 1.5 to 5 times\nlarger; (ii) in terms of upper bounding the robust error, the optimal layer-wise convex relaxation does\nnot signi\ufb01cantly close the gap between the PGD lower bound (or MILP exact answer) and the upper\nbound from Wong and Kolter [2018]. Therefore, there seems to be an inherent barrier blocking our\nprogress on this road of layer-wise convex relaxation, and we hope this work provokes much thought\nin the community on how to bypass it.\n\n2 Preliminaries and Related Work\n\nExact veri\ufb01ers and NP-completeness. For ReLU networks (piece-wise linear networks in general),\nexact veri\ufb01ers solve the robustness veri\ufb01cation problem (1) by typically employing MILP solvers\n[Cheng et al., 2017, Lomuscio and Maganti, 2017, Dutta et al., 2018, Fischetti and Jo, 2017, Tjeng\net al., 2019, Xiao et al., 2019] or Satis\ufb01ability Modulo Theories (SMT) solvers [Scheibler et al.,\n2015, Katz et al., 2017, Carlini et al., 2017, Ehlers, 2017]. However, due to the NP-completeness for\nsolving such a problem [Katz et al., 2017, Weng et al., 2018], it can be really challenging to scale\nthese to large networks. It can take Reluplex [Katz et al., 2017] several hours to \ufb01nd the minimum\ndistortion of an example for a ReLU network with 5 inputs, 5 outputs, and 300 neurons. A recent\nwork by Tjeng et al. 
[2019] uses MILP to exactly verify medium-size networks, but the verification time is very sensitive to how a network is trained; for example, it is fast for networks trained using the LP-relaxed dual formulation of Wong and Kolter [2018], but much slower for normally trained networks. A concurrent work by Xiao et al. [2019] trains networks with the objective of speeding up MILP verification, but this compromises the performance of the network.

Relaxed and efficient verifiers. These verifiers solve a relaxed, but more computationally efficient, version of (1), and have been proposed from different perspectives. From the primal view, one can relax the nonlinearity in (1) into linear inequality constraints. This perspective has been previously explored in the framework of "abstract transformers" [Singh et al., 2018, 2019a,b, Gehr et al., 2018, Mirman et al., 2018], via linear outer bounds of activation functions [Zhang et al., 2018, Weng et al., 2018, Wang et al., 2018a,b], or via interval bound propagation [Gowal et al., 2018, Mirman et al., 2018]. From the dual view, one can study the dual of the relaxed problem [Wong and Kolter, 2018, Wong et al., 2018] or study the dual of the original nonconvex verification problem [Dvijotham et al., 2018b,a, Qin et al., 2019].

3The radius of the largest l∞ ball in which no adversarial examples can be found.
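Of these, interval bound propagation is the simplest primal relaxation to state: push an elementwise box through every layer. The following is a minimal sketch of ours with made-up weights, not code from any of the cited works:

```python
def affine_interval(W, b, lo, hi):
    """Propagate an elementwise box lo <= x <= hi through z = W x + b.

    The extreme value of each output coordinate splits on the sign of
    the corresponding weight entry.
    """
    z_lo, z_hi = [], []
    for row, bias in zip(W, b):
        z_lo.append(bias + sum(w * (lo[j] if w >= 0 else hi[j]) for j, w in enumerate(row)))
        z_hi.append(bias + sum(w * (hi[j] if w >= 0 else lo[j]) for j, w in enumerate(row)))
    return z_lo, z_hi

def relu_interval(lo, hi):
    """ReLU is monotone, so it maps a box to a box."""
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]

# Toy layer around x_nom = (0, 0) with l_inf radius 0.1 (weights are made up).
W, b = [[1.0, -1.0], [2.0, 1.0]], [0.0, -0.5]
z_lo, z_hi = affine_interval(W, b, [-0.1, -0.1], [0.1, 0.1])
x_lo, x_hi = relu_interval(z_lo, z_hi)
```

The pre-activation intervals computed this way play the role of the bounds that the tighter relaxations below refine; IBP simply takes the loosest valid choice at each layer.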
In this paper, we unify both views in a common convex relaxation framework for NN verification, clarifying their relationships (as summarized in Fig. 1).
Raghunathan et al. [2018b] formulate the verification of ReLU networks as a quadratic programming problem and then relax and solve this problem with a semidefinite programming (SDP) solver. While our framework does not cover this SDP relaxation, it is not clear to us how to extend the SDP-relaxed verifier to general nonlinearities, for example max-pooling, which our framework handles. Other verifiers have been proposed to certify via an intermediary step of bounding the local Lipschitz constant [Hein and Andriushchenko, 2017, Weng et al., 2018, Raghunathan et al., 2018a, Zhang et al., 2019], and others have used randomized smoothing to certify with high probability [Lecuyer et al., 2018, Li et al., 2018, Cohen et al., 2019, Salman et al., 2019]. These are outside the scope of our framework.
Combining exact and relaxed verifiers, hybrid methods have shown some effectiveness [Bunel et al., 2018, Singh et al., 2019b]. In fact, many exact verifiers also use relaxation as a subroutine to speed things up, and hence can be viewed as hybrid methods as well. In this paper, we are not concerned with such techniques but only focus on relaxed verifiers.

3 Convex Relaxation from the Primal View

Problem setting. In this paper, we assume that the neighborhood $S_{in}(x_{nom})$ is a convex set. An example of this is $S_{in}(x_{nom}) = \{x : \|x - x_{nom}\|_\infty \le \epsilon\}$, which is the constraint on $x$ in the $\ell_\infty$ adversarial attack model. We also assume that $f(x)$ is an $L$-layer feedforward NN. For notational simplicity, we denote $\{0, 1, \ldots, L-1\}$ by $[L]$ and $\{x^{(0)}, x^{(1)}, \ldots, x^{(L-1)}\}$ by $x^{[L]}$.
We define $f(x)$ as
$$x^{(l+1)} = \sigma^{(l)}(W^{(l)} x^{(l)} + b^{(l)}) \;\; \forall l \in [L], \qquad f(x) := z^{(L)} = W^{(L)} x^{(L)} + b^{(L)}, \tag{2}$$
where $x^{(l)} \in \mathbb{R}^{n^{(l)}}$, $z^{(l)} \in \mathbb{R}^{n_z^{(l)}}$, $x^{(0)} := x \in \mathbb{R}^{n^{(0)}}$ is the input, $W^{(l)} \in \mathbb{R}^{n_z^{(l)} \times n^{(l)}}$ and $b^{(l)} \in \mathbb{R}^{n_z^{(l)}}$ are the weight matrix and bias vector of the $l$th linear layer, and $\sigma^{(l)} : \mathbb{R}^{n_z^{(l)}} \to \mathbb{R}^{n^{(l+1)}}$ is a (nonlinear) activation function like (leaky-)ReLU, the sigmoid family (including sigmoid, arctan, hyperbolic tangent, etc.), or the pooling family (MaxPool, AvgPool, etc.). Our results can be easily extended to networks with convolutional layers and skip connections as well, similar to what is done in Wong et al. [2018], as these can be seen as special forms of (2).

Consider the following optimization problem $O(c, c_0, L, \underline{z}^{[L]}, \bar{z}^{[L]})$:
$$\min_{(x^{[L+1]},\, z^{[L]}) \in \mathcal{D}} \; c^\top x^{(L)} + c_0 \quad \text{s.t.} \quad z^{(l)} = W^{(l)} x^{(l)} + b^{(l)}, \;\; x^{(l+1)} = \sigma^{(l)}(z^{(l)}), \;\; l \in [L], \tag{O}$$
where the optimization domain $\mathcal{D}$ is the set of activations and preactivations $\{x^{(0)}, \ldots, x^{(L)}, z^{(0)}, \ldots, z^{(L-1)}\}$ satisfying the bounds $\underline{z}^{(l)} \le z^{(l)} \le \bar{z}^{(l)}$ for all $l \in [L]$, i.e.,
$$\mathcal{D} = \big\{(x^{[L+1]}, z^{[L]}) : x^{(0)} \in S_{in}(x_{nom}), \;\; \underline{z}^{(l)} \le z^{(l)} \le \bar{z}^{(l)}, \; l \in [L]\big\}. \tag{3}$$
If $c^\top = W^{(L)}_{i_{nom},:} - W^{(L)}_{i,:}$, $c_0 = b^{(L)}_{i_{nom}} - b^{(L)}_i$, $\underline{z}^{[L]} = -\infty$, and $\bar{z}^{[L]} = \infty$, then (O) is equivalent to problem (1). However, when we have better information about valid bounds $\underline{z}^{[l]}$ and $\bar{z}^{[l]}$ of $z^{[l]}$, we can significantly narrow down the optimization domain and, as will be detailed shortly, achieve tighter solutions when we relax the nonlinearities. We denote the minimal value of $O(c, c_0, L, \underline{z}^{[L]}, \bar{z}^{[L]})$ by $p^*(c, c_0, L, \underline{z}^{[L]}, \bar{z}^{[L]})$, or just $p^*_O$ when no confusion arises.

Obtaining lower and upper bounds $(\underline{z}^{[L]}, \bar{z}^{[L]})$ by solving sub-problems.
This can be done by recursively solving (O) with specific choices of $c$ and $c_0$, which is a common technique used in many works [Wong and Kolter, 2018, Dvijotham et al., 2018b]. For example, one can obtain $\underline{z}^{(\ell)}_j$, a lower bound of $z^{(\ell)}_j$, by solving $O(W^{(\ell)\top}_{j,:}, b^{(\ell)}_j, \ell, \underline{z}^{[\ell]}, \bar{z}^{[\ell]})$; this shows that one can estimate $\underline{z}^{(l)}$ and $\bar{z}^{(l)}$ inductively in $l$. However, we may have millions of sub-problems to solve because practical networks can have millions of neurons. Therefore, it is crucial to have efficient algorithms to solve (O).

Figure 2: Optimal convex relaxations for common nonlinearities. For tanh, the relaxation contains two linear segments and parts of the tanh function. For ReLU and the step function, the optimal relaxations are written as 3 and 4 linear constraints, respectively. For z = max(x, y), the light orange shadow indicates the pre-activation bounds for x and y, and the optimal convex relaxation is lower bounded by the max function itself.

Convex relaxation in the primal space. Due to the nonlinear activation functions $\sigma^{(l)}$, the feasible set of (O) is nonconvex, which leads to the NP-completeness of the neural network verification problem [Katz et al., 2017, Weng et al., 2018]. One natural idea is to do convex relaxation of its feasible set. Specifically, one can relax the nonconvex equality constraint $x^{(l+1)} = \sigma^{(l)}(z^{(l)})$ to convex inequality constraints, i.e.,
$$\min_{(x^{[L+1]},\, z^{[L]}) \in \mathcal{D}} \; c^\top x^{(L)} + c_0 \quad \text{s.t.} \quad z^{(l)} = W^{(l)} x^{(l)} + b^{(l)}, \;\; \underline{\sigma}^{(l)}(z^{(l)}) \le x^{(l+1)} \le \bar{\sigma}^{(l)}(z^{(l)}), \;\; \forall l \in [L], \tag{C}$$
where $\underline{\sigma}^{(l)}(z)$ (resp. $\bar{\sigma}^{(l)}(z)$) is convex (resp. concave) and satisfies $\underline{\sigma}^{(l)}(z) \le \sigma^{(l)}(z) \le \bar{\sigma}^{(l)}(z)$ for $\underline{z}^{(l)} \le z \le \bar{z}^{(l)}$. We denote the feasible set of (C) by $S_C$ and its minimum by $p^*_C$. Naturally, we have that $S_C$ is convex and $p^*_C \le p^*_O$.
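To make the bounding functions in (C) concrete, here is a sketch of ours of the standard choice for ReLU, often called the triangle relaxation; it is illustrative code, not an excerpt from any verifier's implementation:

```python
def relu_bounds(l, u):
    """Convex lower / concave upper bounding functions for max(0, z) on [l, u].

    In the unstable case l < 0 < u this is the 'triangle' relaxation:
    the function itself below, the chord from (l, 0) to (u, u) above.
    """
    if l >= 0:                      # ReLU acts as the identity on [l, u]
        return (lambda z: z), (lambda z: z)
    if u <= 0:                      # ReLU is identically zero on [l, u]
        return (lambda z: 0.0), (lambda z: 0.0)
    slope = u / (u - l)
    return (lambda z: max(0.0, z)), (lambda z: slope * (z - l))

# Sanity check on a grid: lower <= relu <= upper over the pre-activation range.
low, up = relu_bounds(-1.0, 2.0)
grid = [-1.0 + 3.0 * k / 100 for k in range(101)]
ok = all(low(z) <= max(0.0, z) <= up(z) + 1e-12 for z in grid)
```

Both bounding functions agree with max(0, z) at the corner points (l, 0), (0, 0), and (u, u), so the relaxation of a single neuron is tight; the looseness this paper studies enters when many relaxed neurons are composed across layers.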
For example, Ehlers [2017] proposed the following relaxations for the ReLU function $\sigma_{ReLU}(z) = \max(0, z)$ and MaxPool $\sigma_{MP}(z) = \max_k z_k$:
$$\underline{\sigma}_{ReLU}(z) = \max(0, z), \qquad \bar{\sigma}_{ReLU}(z) = \tfrac{\bar{z}}{\bar{z} - \underline{z}}\,(z - \underline{z}), \tag{4}$$
$$\underline{\sigma}_{MP}(z) = \max_k z_k, \qquad \bar{\sigma}_{MP}(z) = \sum_k (z_k - \underline{z}_k) + \max_k \underline{z}_k. \tag{5}$$

The optimal layer-wise convex relaxation. As a special case, we consider the optimal layer-wise convex relaxation, where
$$\underline{\sigma}_{opt}(z) \text{ is the greatest convex function majorized by } \sigma, \qquad \bar{\sigma}_{opt}(z) \text{ is the smallest concave function majorizing } \sigma. \tag{6}$$
A precise definition can be found in (12) in Appendix B. In Fig. 2, we show the optimal convex relaxation for several common activation functions. It is easy to see that (4) is the optimal convex relaxation for ReLU, but (5) is not optimal for the MaxPool function. Under mild assumptions (non-interactivity, as defined in Definition B.2), the optimal convex relaxation of a nonlinear layer $x = \sigma(z)$, i.e., its convex hull, is simply $\underline{\sigma}_{opt}(z) \le x \le \bar{\sigma}_{opt}(z)$ (see Proposition B.3). We denote the corresponding optimal relaxed problem as $C_{opt}$, with its objective $p^*_{C_{opt}}$.
We emphasize that by optimal, we mean the optimal convex relaxation of the single nonlinear constraint $x^{(l+1)} = \sigma^{(l)}(z^{(l)})$ (see Proposition B.3) instead of the optimal convex relaxation of the nonconvex feasible set of the original problem (O). As such, techniques as in [Anderson et al., 2018, Raghunathan et al., 2018b] are outside our framework; see Appendix C for more discussions.

Greedily solving the primal with linear bounds.
As another special case, when there is exactly one linear upper bound and one linear lower bound for each nonlinear layer in (C), as follows:
$$\bar{\sigma}^{(l)}(z^{(l)}) := \bar{a}^{(l)} z^{(l)} + \bar{b}^{(l)}, \qquad \underline{\sigma}^{(l)}(z^{(l)}) := \underline{a}^{(l)} z^{(l)} + \underline{b}^{(l)}, \tag{7}$$
the objective $p^*_C$ can be greedily bounded in a layer-by-layer manner. We can derive one linear upper and one linear lower bound of $z^{(L)} := c^\top x^{(L)} + c_0$ with respect to $z^{(L-1)}$, using the fact that $z^{(L)} = c^\top \sigma^{(L-1)}(z^{(L-1)}) + c_0$ and that $\sigma^{(L-1)}(z^{(L-1)})$ is linearly upper and lower bounded by $\bar{\sigma}^{(L-1)}(z^{(L-1)})$ and $\underline{\sigma}^{(L-1)}(z^{(L-1)})$. Because a linear combination of linear bounds (coefficients are related to the entries in $c$) can be relaxed to a single linear bound, we can apply this technique again and replace $z^{(L-1)}$ with its upper and lower bounds with respect to $z^{(L-2)}$, obtaining the bound for $z^{(L)}$ with respect to $z^{(L-2)}$. Applying this repeatedly eventually leads to linear lower and upper bounds of $z^{(L)}$ with respect to the input $x^{(0)} \in S_{in}(x_{nom})$.
This perspective covers Fast-Lin [Weng et al., 2018], DeepZ [Singh et al., 2018] and Neurify [Wang et al., 2018b], where the proposed linear lower bound has the same slope as the upper bound, i.e., $\underline{a}^{(l)} = \bar{a}^{(l)}$. The resulting shape is referred to as a zonotope in Gehr et al. [2018] and Singh et al. [2018]. In CROWN [Zhang et al., 2018] and DeepPoly [Singh et al., 2019a], this restriction is lifted and they can achieve better verification results than Fast-Lin and DeepZ. Fig. 1 summarizes the relationships between these algorithms.
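The back-substitution just described can be sketched for a single hidden layer. This is our illustrative code in the Fast-Lin setting (a shared slope for the upper and lower bounds), with made-up weights, not the implementation of any cited algorithm:

```python
def relu_linear_bounds(l, u):
    """Fast-Lin-style linear bounds a*z + b with a shared slope for max(0, z)
    on pre-activation bounds [l, u]. Returns (a, b_low, b_up)."""
    if l >= 0:
        return 1.0, 0.0, 0.0
    if u <= 0:
        return 0.0, 0.0, 0.0
    a = u / (u - l)
    return a, 0.0, -a * l          # lower: a*z, upper: a*(z - l)

def greedy_lower_bound(c, c0, W, b, z_lo, z_hi, x_lo, x_hi):
    """Lower-bound c . relu(W x + b) + c0 over the input box [x_lo, x_hi]
    by one step of back-substitution through the linear relu bounds."""
    n_in = len(W[0])
    coef = [0.0] * n_in
    const = c0
    for j, cj in enumerate(c):
        a, b_low, b_up = relu_linear_bounds(z_lo[j], z_hi[j])
        # A positive objective weight takes the lower bound, a negative one
        # the upper bound, so the substitution stays a valid lower bound.
        const += cj * ((b_low if cj >= 0 else b_up) + a * b[j])
        for k in range(n_in):
            coef[k] += cj * a * W[j][k]
    # Minimize the resulting linear function over the input box.
    return const + sum(w * (x_lo[k] if w >= 0 else x_hi[k]) for k, w in enumerate(coef))

# Toy example: x in [-1, 1]^2, z = x (identity layer), objective relu(z1) + relu(z2).
lb = greedy_lower_bound([1.0, 1.0], 0.0,
                        [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],
                        [-1.0, -1.0], [1.0, 1.0], [-1.0, -1.0], [1.0, 1.0])
```

In this toy case the true minimum is 0 (attained at x = (-1, -1)), while the certified bound is -1: sound but loose, which is exactly the kind of slack studied in this paper. The multi-layer algorithms repeat this substitution backward through every layer.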
Importantly, each of these works has its own merits on solving the verification problem; our focus here is to give a unified view on how they perform convex relaxation of the original verification problem (O) in our framework. See Appendix D for more discussions and other related algorithms.

4 Convex Relaxation from the Dual View

We now tackle the verification problem from the dual view and connect it to the primal view.

Strong duality for the convex relaxed problem. As in Wong and Kolter [2018], we introduce the dual variables for (C) and write its Lagrangian dual as
$$g_C(\mu^{[L]}, \lambda^{[L]}, \bar{\lambda}^{[L]}) := \min_{(x^{[L+1]},\, z^{[L]}) \in \mathcal{D}} \; c^\top x^{(L)} + c_0 + \sum_{l=0}^{L-1} \mu^{(l)\top}\big(z^{(l)} - W^{(l)} x^{(l)} - b^{(l)}\big) - \sum_{l=0}^{L-1} \lambda^{(l)\top}\big(x^{(l+1)} - \underline{\sigma}^{(l)}(z^{(l)})\big) + \sum_{l=0}^{L-1} \bar{\lambda}^{(l)\top}\big(x^{(l+1)} - \bar{\sigma}^{(l)}(z^{(l)})\big). \tag{8}$$
By weak duality [Boyd and Vandenberghe, 2004],
$$d^*_C := \max_{\mu^{[L]},\, \lambda^{[L]} \ge 0,\, \bar{\lambda}^{[L]} \ge 0} g_C(\mu^{[L]}, \lambda^{[L]}, \bar{\lambda}^{[L]}) \le p^*_C, \tag{9}$$
but in fact we can show strong duality under mild conditions as well (note that the following result cannot be obtained by trivially applying Slater's condition; see Appendix E and Fig. 4).

Theorem 4.1 ($p^*_C = d^*_C$). Assume that both $\underline{\sigma}^{(l)}$ and $\bar{\sigma}^{(l)}$ have a finite Lipschitz constant in the domain $[\underline{z}^{(l)}, \bar{z}^{(l)}]$ for each $l \in [L]$. Then strong duality holds between (C) and (9).

The optimal layer-wise dual relaxation. Theorem 4.1 shows that taking the dual of the layer-wise convex relaxed problem (C) cannot do better than the original relaxation. To obtain a tighter dual problem, one could directly study the Lagrangian dual of the original (O),
$$g_O(\mu^{[L]}, \lambda^{[L]}) := \min_{\mathcal{D}} \; c^\top x^{(L)} + c_0 + \sum_{l=0}^{L-1} \mu^{(l)\top}\big(z^{(l)} - W^{(l)} x^{(l)} - b^{(l)}\big) + \sum_{l=0}^{L-1} \lambda^{(l)\top}\big(x^{(l+1)} - \sigma^{(l)}(z^{(l)})\big), \tag{10}$$
where the min is taken over $\{(x^{[L+1]}, z^{[L]}) \in \mathcal{D}\}$.
This was first proposed in Dvijotham et al. [2018b]. Note, again, by weak duality,
$$d^*_O := \max_{\mu^{[L]},\, \lambda^{[L]}} g_O(\mu^{[L]}, \lambda^{[L]}) \le p^*_O, \tag{11}$$
and $d^*_O$ would seem to be strictly better than $d^*_C$. Unfortunately, they turn out to be equivalent:

Theorem 4.2 ($d^*_O = d^*_{C_{opt}}$). Assume that the nonlinear layer $\sigma^{(l)}$ is non-interactive (Definition B.2) and the optimal layer-wise relaxations $\underline{\sigma}^{(l)}_{opt}$ and $\bar{\sigma}^{(l)}_{opt}$ are defined in (6). Then the lower bound $d^*_{C_{opt}}$ provided by the dual of the optimal layer-wise convex-relaxed problem (9) and $d^*_O$ provided by the dual of the original problem (11) are the same.

The complete proof is in Appendix F.4 Theorem 4.2 combined with the strong duality result of Theorem 4.1 implies that the primal relaxation (C) and the two kinds of dual relaxations, (9) and (11), are all blocked by the same barrier. As concrete examples:

Corollary 4.3 ($p^*_{C_{opt}} = d^*_O$). Suppose that the nonlinear activation functions $\sigma^{(l)}$ for all $l \in [L]$ are (for example) among the following: ReLU, step, ELU, sigmoid, tanh, polynomials, and max pooling with disjoint windows. Assume that $\underline{\sigma}^{(l)}_{opt}$ and $\bar{\sigma}^{(l)}_{opt}$ are defined in (6), respectively. Then the lower bound $p^*_{C_{opt}}$ provided by the primal optimal layer-wise relaxation (C) and $d^*_O$ provided by the dual relaxation (11) are the same.

Greedily solving the dual with linear bounds.
When the relaxed bounds $\bar{\sigma}$ and $\underline{\sigma}$ are linear as defined in (7), the dual objective (9) can be lower bounded as follows:
$$p^*_C = d^*_C \ge \sum_{l=0}^{L-1} \Big( -\bar{b}^{(l)\top} \big(\lambda^{(l)}\big)_+ - \underline{b}^{(l)\top} \big(\lambda^{(l)}\big)_- - b^{(l)\top} \mu^{(l)} \Big) + c_0 - \sup_{x \in S_{in}(x_{nom})} \big(W^{(0)\top} \mu^{(0)}\big)^\top x,$$
where the dual variables $(\mu^{[L]}, \lambda^{[L]})$ are determined by a backward propagation
$$\lambda^{(L-1)} = -c, \qquad \mu^{(l)} = \bar{a}^{(l)} \odot \big(\lambda^{(l)}\big)_+ + \underline{a}^{(l)} \odot \big(\lambda^{(l)}\big)_-, \qquad \lambda^{(l-1)} = W^{(l)\top} \mu^{(l)} \;\; \forall l \in [L-1].$$
Here $(\cdot)_+ = \max(\cdot, 0)$ and $(\cdot)_- = \min(\cdot, 0)$ elementwise, and $\odot$ denotes the elementwise product. We provide the derivation of this algorithm in Appendix G. It turns out that this algorithm can exactly recover the algorithm proposed in Wong and Kolter [2018], where
$$\underline{\sigma}^{(l)}(z^{(l)}) := \alpha^{(l)} z^{(l)}, \qquad \bar{\sigma}^{(l)}(z^{(l)}) := \tfrac{\bar{z}^{(l)}}{\bar{z}^{(l)} - \underline{z}^{(l)}}\,\big(z^{(l)} - \underline{z}^{(l)}\big),$$
and $0 \le \alpha^{(l)} \le 1$ represents the slope of the lower bound. When $\alpha^{(l)} = \frac{\bar{z}^{(l)}}{\bar{z}^{(l)} - \underline{z}^{(l)}}$, the greedy algorithm also recovers Fast-Lin [Weng et al., 2018], which explains the arrow from Wong and Kolter [2018] to Weng et al. [2018] in Fig. 1. When $\alpha^{(l)}$ is chosen adaptively as in CROWN [Zhang et al., 2018], the greedy algorithm then recovers CROWN, which explains the arrow from Wong and Kolter [2018] to Zhang et al. [2018] in Fig. 1. See Appendix D for more discussions on the relationship between the primal and dual greedy solvers.

5 Optimal LP-relaxed Verification

In the previous sections, we presented a framework that subsumes all existing layer-wise convex-relaxed verification algorithms except that of Raghunathan et al. [2018b]. For ReLU networks, being piece-wise linear, these correspond exactly to the set of all existing LP-relaxed algorithms, as discussed above.
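To preview why such a barrier can exist at all, consider a toy instance of ours (not one of the paper's benchmarks): two relaxed ReLU outputs x1 and x2 tied to the same pre-activation z = x in [-1, 1], with objective x1 - x2. Problem (O) forces x1 = x2 = relu(z), so its value is 0, but the relaxation (C) lets x1 sit on the lower bound and x2 on the upper chord:

```python
def relaxed_value(l=-1.0, u=1.0, grid=2000):
    """min of x1 - x2 over the triangle relaxation, with x1 and x2 tied to the
    same pre-activation z in [l, u]. For a fixed z the optimum puts x1 on
    the lower bound max(0, z) and x2 on the upper chord u*(z - l)/(u - l)."""
    zs = [l + (u - l) * k / grid for k in range(grid + 1)]
    return min(max(0.0, z) - u * (z - l) / (u - l) for z in zs)

def exact_value():
    """Problem (O) ties both outputs to the same z, so x1 - x2 = 0 always."""
    return 0.0

gap = exact_value() - relaxed_value()   # strictly positive relaxation gap
```

Each neuron's relaxation here is individually optimal, yet the joint relaxed optimum (-0.5, at z = 0) sits strictly below the true optimum (0); composing many relaxed neurons across layers compounds exactly this effect.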
We showed the existence of a barrier, p\u2217\nC, that limits all such algorithms. Is this just\ntheoretical babbling or is this barrier actually problematic in practice?\nIn the next section, we perform extensive experiments on deep ReLU networks, evaluating the tightest\nconvex relaxation afforded by our framework (denoted LP-ALL) against a greedy dual algorithm\n(Algorithm 1 of Wong and Kolter [2018], denoted LP-GREEDY) as well as another algorithm LP-\nLAST, intermediate in speed and accuracy between them. Both LP-GREEDY and LP-LAST solve the\nbounds z[L], z[L] by setting the dual variables heuristically (see previous section), but LP-GREEDY\nsolves the adversarial loss in the same manner while LP-LAST solves this \ufb01nal LP exactly. We also\ncompare them with the opposite bounds provided by PGD attack [Madry et al., 2017], as well as\nexact results from MILP [Tjeng et al., 2019] 5.\nFor the rest of the main text, we are only concerned with ReLU networks, so (C) subject to (4) is in\nfact an LP.\n\n4Theorem 2 in Dvijotham et al. [2018b] is a special case of our Theorem 4.2, when applied to ReLU networks.\nOur proof makes use of the Fenchel-Moreau theorem to deal with general nonlinearities, which is different from\nthat in Dvijotham et al. [2018b].\n\n5Note that in practice (as in [Tjeng et al., 2019]), MILP has a time budget, and usually not every sample can\nbe veri\ufb01ed within that budget, so that in the end we still obtain only lower and upper bounds given by samples\nveri\ufb01ed to be robust or nonrobust\n\n6\n\n\f5.1 LP-ALL Implementation Details\n\nIn order to exactly solve the tightest LP-relaxed veri\ufb01cation problem of a ReLU network, two steps\nare required: (A) obtaining the tightest pre-activation upper and lower bounds of all the neurons in\nthe NN, excluding those in the last layer, then (B) solving the LP-relaxed veri\ufb01cation problem exactly\nfor the last layer of the NN.\n\nStep A: Obtaining Pre-activation Bounds. 
This can be done by solving sub-problems of the\norginial relaxed problem (C) subject to (4). Given a NN with L0 layers, for each layer l0 \u2208 [L0], we\n, for all neurons j \u2208 [n(l0)]. We do this\nobtain a lower (resp. upper) bound z(l0)\nby setting\n\n(resp. z(l0)\n\n) of z(l0)\n\nj\n\nj\n\nj\n\nL \u2190 l0,\n\nc(cid:62) \u2190 W(l0)\n\nj,:\n\n(resp. c(cid:62) \u2190 \u2212W(l0)\nj,: ),\n\nc0 \u2190 b(l0)\n\nj\n\n(resp. c0 \u2190 \u2212b(l0)\n\nj\n\n)\n\nin (C) and computing the exact optimum. However, we need to solve an LP for each neuron, and\npractical networks can have millions of them. We utilize the fact that in each layer l0, computing the\nfor each j \u2208 [n(l0)] can proceed independently in parallel. Indeed, we design a\nbounds z(l0)\nscheduler to do so on a cluster with 1000 CPU-nodes. See Appendix J for details.\n\nand z(l0)\n\nj\n\nj\n\nStep B: Solving the LP-relaxed Problem for the Last Layer. After obtaining the pre-activation\nbounds on all neurons in the network using step (A), we solve the LP in (C) subject to (4) for all\nj \u2208 [n(L0)]\\{jnom} obtained by setting\n\nL \u2190 L0,\n\nc(cid:62) \u2190 W(L0)\n\njnom,: \u2212 W(L0)\n\nj,:\n\nc0 \u2190 b(L0)\n\njnom \u2212 b(L0)\n\nj\n\n,\n\nagain in (C) and computing the exact minimum. Here, jnom is the true label of the data point xnom at\nwhich we are verifying the network. We can certify the network is robust around xnom iff the solutions\nof all such LPs are positive, i.e. we cannot make the true class logit lower than any other logits.\nAgain, note that these LPs are also independent of each other, so we can solve them in parallel.\nGiven any xnom, LP-ALL follows steps (A) then (B) to produce a certi\ufb01cate whether the network is\nrobust around a given datapoint or not. 
LP-LAST on the other hand solves only step (B), and instead\nof doing (A), it \ufb01nds the preactivation bounds greedily as in Algorithm 1 of Wong and Kolter [2018].\n\n6 Experiments\n\nWe conduct two experiments to assess the tightness of LP-ALL: 1) \ufb01nding certi\ufb01ed upper bounds\non the robust error of several NN classi\ufb01ers, 2) \ufb01nding certi\ufb01ed lower bounds on the minimum\nadversarial distortion \u0001 using different algorithms. All experiments are conducted on MNIST and/or\nCIFAR-10 datasets.\n\nArchitectures. We conduct experiments on a range of ReLU-activated feedforward networks.\nMLP-A and MLP-B refer to multilayer perceptrons: MLP-A has 1 hidden layer with 500 neurons,\nand MLP-B has 2 hidden layers with 100 neurons each. CNN-SMALL, CNN-WIDE-K, and CNN-\nDEEP-K are the ConvNet architectures used in Wong et al. [2018]. Full details are in Appendix I.1.\n\nTraining Modes. We conduct experiments on networks trained with a regular cross-entropy (CE)\nloss function and networks trained to be robust. These networks are identi\ufb01ed by a pre\ufb01x correspond-\ning to the method used to train them: LPD when the LP-relaxed dual formulation of Wong and Kolter\n[2018] is used for robust training, ADV when adversarial examples generated using PGD are used for\nrobust training, as in Madry et al. [2017], and NOR when the network is normally trained using the\nCE loss function. Training details are in Appendix I.2.\n\nExperimental Setup. We run experiments on a cluster with 1000 CPU-nodes. The total run time\namounts to more than 22 CPU-years. Appendix J provides additional details about the computational\nresources and the scheduling scheme used, and Appendix K provides statistics of the veri\ufb01cation\ntime in these experiments.\n\n7\n\n\fTable 1: Certi\ufb01ed bounds on the robust error on the test set of MNIST for normally and robustly\ntrained networks. 
The prefix of each network corresponds to the training method used: ADV for PGD training [Madry et al., 2017], NOR for normal CE loss training, and LPD when the LP-relaxed dual formulation of Wong and Kolter [2018] is used for robust training.

NETWORK        ε      TEST     LOWER BOUND         UPPER BOUND
                      ERROR    PGD       MILP      MILP      LP-ALL    LP-GREEDY
ADV-MLP-B      0.03   1.53%    4.17%     4.18%     5.78%     10.04%    13.40%
ADV-MLP-B      0.05   1.62%    6.06%     6.11%     11.38%    23.29%    33.09%
ADV-MLP-B      0.1    3.33%    15.86%    16.25%    34.37%    61.59%    71.34%
ADV-MLP-A      0.1    4.18%    11.51%    14.36%    30.81%    60.14%    67.50%
NOR-MLP-B      0.02   2.05%    10.06%    10.16%    13.48%    26.41%    35.11%
NOR-MLP-B      0.03   2.05%    20.37%    20.43%    48.67%    65.70%    75.85%
NOR-MLP-B      0.05   2.05%    53.37%    53.37%    94.04%    97.95%    99.39%
LPD-MLP-B      0.1    4.09%    13.39%    14.45%    14.45%    17.24%    18.32%
LPD-MLP-B      0.2    15.72%   33.85%    36.33%    36.33%    37.50%    41.67%
LPD-MLP-B      0.3    39.22%   57.29%    59.85%    59.85%    60.17%    66.85%
LPD-MLP-B      0.4    67.97%   81.85%    83.17%    83.17%    83.62%    87.89%

6.1 Certified Bounds on the Robust Error

Table 1 presents the clean test errors and (upper and lower) bounds on the true robust errors for a range of classifiers trained with different procedures on MNIST. For both ADV- and LPD-trained networks, the ε in Table 1 denotes the l∞-norm bound used for training and robust testing; for NORmally-trained networks, ε is only used for the latter.
Lower bounds on the robust error are calculated by finding adversarial examples for inputs that are not robust. This is done by using PGD, a strong first-order attack, or using MILP [Tjeng et al., 2019]. Upper bounds on the robust error are calculated by providing certificates of robustness for input that is robust. This is done using MILP, the dual formulation (LP-GREEDY) presented by Wong and Kolter [2018], or our LP-ALL algorithm.
For the MILP results, we use the code accompanying the paper by Tjeng et al. [2019].
We run the\ncode in parallel on a cluster with 1000 CPU-nodes, and set the MILP solver\u2019s time limit to 3600\nseconds. Note that this time limit is reached for ADV and NOR, and therefore the upper and lower\nbounds are separated by a gap that is especially large for some of the NORmally trained networks.\nOn the other hand, for LPD-trained networks, the MILP solver \ufb01nishes within the time limit, and\nthus the upper and lower bounds match.\nResults. For all NORmally and ADV-trained networks, we see that the certi\ufb01ed upper bounds using\nLP-GREEDY and LP-ALL are very loose when we compare the gap between them to the lower\nbounds found by PGD and MILP. As a sanity check, note that LP-ALL gives a tighter bound than\nLP-GREEDY in each case, as one would expect. Yet this improvement is not signi\ufb01cant enough to\nclose the gap with the lower bounds.\nThis sanity check also passes for LPD-trained networks, where the LP-GREEDY-certi\ufb01ed robust\nerror upper bound is, as expected, much closer to the true error (given by MILP here) than for other\nnetworks. For \u0001 = 0.1, the improvement of LP-ALL-certi\ufb01ed upper bound over LP-GREEDY is at\nmost modest, and the PGD lower bound is tighter to the true error. For large \u0001, the improvement is\nmuch more signi\ufb01cant in relative terms, but the absolute improvement is only 4 \u2212 7%. In this large \u0001\nregime, however, both the clean and robust errors are quite large, so the tightness of LP-ALL is less\nuseful.\n\n6.2 Certi\ufb01ed Bounds on the Minimum Adversarial Distortion \u0001\n\nWe are interested in searching for the minimum adversarial distortion \u0001, which is the radius of the\nlargest l\u221e ball in which no adversarial examples can be crafted. An upper bound on \u0001 is calculated\nusing PGD, and lower bounds are calculated using LP-GREEDY, LP-LAST, or our LP-ALL, all via\nbinary search. 
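The binary search itself is straightforward given any certification routine. A sketch of ours, with `verify` standing in for LP-GREEDY, LP-LAST, or LP-ALL certifying robustness at a given radius, and assuming certification success is monotone in the radius:

```python
def max_certified_radius(verify, eps_hi=1.0, tol=1e-4):
    """Largest eps (up to tol) with verify(eps) == True, assuming that
    success at some radius implies success at every smaller radius."""
    lo, hi = 0.0, eps_hi
    if not verify(lo):
        return 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if verify(mid):
            lo = mid    # certified at mid: the answer is at least mid
        else:
            hi = mid    # not certified: the answer is below mid
    return lo

# Stand-in verifier that certifies radii up to 0.3 (a made-up threshold).
radius = max_certified_radius(lambda eps: eps <= 0.3)
```

Because the search returns the last certified midpoint, the reported radius is always itself certified, i.e., a valid lower bound on the minimum adversarial distortion.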
Since solving LP-ALL is expensive, we find the ε-bounds only for ten samples of the MNIST and CIFAR-10 datasets. In this experiment, both ADV- and LPD-networks are trained with an l∞ maximum allowed perturbation of 0.1 on MNIST and 8/255 on CIFAR-10. See Appendix L.1 for details. Figure 3 and Figure 8 in the Appendix show the median percentage gap (defined in Appendix L.2) between the convex-relaxed algorithms and the PGD bounds on ε for MNIST and CIFAR, respectively. Details are reported in Tables 2 and 3 in Appendix L.2.

Figure 3: The median percentage gap between the convex-relaxed algorithms (LP-ALL, LP-LAST, and LP-GREEDY) and PGD estimates of the minimum adversarial distortion ε on ten samples of MNIST. The error bars correspond to 95% confidence intervals. We highlight the 1.5× and 5× gaps between the ε value estimated by PGD and those estimated by the LP-relaxed algorithms. For more details, please refer to Table 2 in Appendix L.2.

On MNIST, the results show that for all networks trained NORmally or via ADV, the certified lower bounds on ε are 1.5 to 5 times smaller than the upper bound found by PGD; for LPD-trained networks, they are less than 1.5 times smaller. On CIFAR-10, the bounds are between 1.5 and 2 times smaller across all models. The smaller gap for LPD is of course expected, following similar observations in prior work [Wong and Kolter, 2018, Tjeng et al., 2019]. Furthermore, the improvement of LP-ALL and LP-LAST over LP-GREEDY is not significant enough to close the gap with the PGD upper bound.
Similar results also hold for randomly initialized networks (no training); to avoid clutter, we report these in Appendix M.

7 Conclusions and Discussions

In this work, we first presented a layer-wise convex relaxation framework that unifies all previous LP-relaxed verifiers, in both primal and dual spaces.
Then we performed extensive experiments showing that even the optimal convex relaxation for ReLU networks in this framework cannot obtain tight bounds on the robust error in any of the cases we consider. Thus any method faces a convex relaxation barrier as soon as it can be described by our framework. We discuss how to bypass this barrier in Appendix A.
Note that different applications have different requirements for the tightness of verification, so our barrier may be a problem for some but not for others. Insofar as the ultimate goal of robustness verification is to construct a training method that lowers certified error, this barrier is not necessarily problematic: some such method could still produce networks for which convex relaxation, as described by our framework, yields accurate robust-error bounds. An example is the recent work of Gowal et al. [2018], which shows that interval bound propagation, though it often leads to loose certification bounds, can still be used for verified training and achieves state-of-the-art verified accuracy when carefully tuned. Nevertheless, tighter estimates should lead to better results in all cases, and we reveal a definitive ceiling on most current methods.

References

Ross Anderson, Joey Huchette, Christian Tjandraatmadja, and Juan Pablo Vielma. Strong convex relaxations and mixed-integer programming formulations for trained neural networks. arXiv preprint arXiv:1811.01988, 2018.

Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge University Press, 2004.

Rudy R Bunel, Ilker Turkaslan, Philip Torr, Pushmeet Kohli, and Pawan K Mudigonda. A unified view of piecewise linear neural network verification. In Advances in Neural Information Processing Systems, pages 4795–4804, 2018.

Nicholas Carlini, Guy Katz, Clark Barrett, and David L Dill. Provably minimally-distorted adversarial examples.
arXiv preprint arXiv:1709.10207, 2017.

Chih-Hong Cheng, Georg Nührenberg, and Harald Ruess. Maximum resilience of artificial neural networks. In International Symposium on Automated Technology for Verification and Analysis, pages 251–268. Springer, 2017.

Jeremy M Cohen, Elan Rosenfeld, and J Zico Kolter. Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918, 2019.

Steven Diamond and Stephen Boyd. CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research, 17(83):1–5, 2016.

A. Domahidi, E. Chu, and S. Boyd. ECOS: An SOCP solver for embedded systems. In European Control Conference (ECC), pages 3071–3076, 2013.

Souradeep Dutta, Susmit Jha, Sriram Sankaranarayanan, and Ashish Tiwari. Output range analysis for deep feedforward neural networks. In NASA Formal Methods Symposium, pages 121–138. Springer, 2018.

Krishnamurthy Dvijotham, Sven Gowal, Robert Stanforth, Relja Arandjelovic, Brendan O'Donoghue, Jonathan Uesato, and Pushmeet Kohli. Training verified learners with learned verifiers. arXiv preprint arXiv:1805.10265, 2018a.

Krishnamurthy Dvijotham, Robert Stanforth, Sven Gowal, Timothy Mann, and Pushmeet Kohli. A dual approach to scalable verification of deep networks. UAI, 2018b.

Ruediger Ehlers. Formal verification of piece-wise linear feed-forward neural networks. In International Symposium on Automated Technology for Verification and Analysis, pages 269–286. Springer, 2017.

Matteo Fischetti and Jason Jo. Deep neural networks as 0-1 mixed integer linear programs: A feasibility study. arXiv preprint arXiv:1712.06174, 2017.

Timon Gehr, Matthew Mirman, Dana Drachsler-Cohen, Petar Tsankov, Swarat Chaudhuri, and Martin Vechev.
AI2: Safety and robustness certification of neural networks with abstract interpretation. In 2018 IEEE Symposium on Security and Privacy (SP), 2018.

Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Timothy Mann, and Pushmeet Kohli. On the effectiveness of interval bound propagation for training verifiably robust models. arXiv preprint arXiv:1810.12715, 2018.

Matthias Hein and Maksym Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation. In Advances in Neural Information Processing Systems (NIPS), pages 2266–2276, 2017.

Guy Katz, Clark Barrett, David L Dill, Kyle Julian, and Mykel J Kochenderfer. Reluplex: An efficient SMT solver for verifying deep neural networks. In International Conference on Computer Aided Verification, pages 97–117. Springer, 2017.

Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana. Certified robustness to adversarial examples with differential privacy. arXiv preprint arXiv:1802.03471, 2018.

Bai Li, Changyou Chen, Wenlin Wang, and Lawrence Carin. Second-order adversarial attack and certifiable robustness. arXiv preprint arXiv:1809.03113, 2018.

Alessio Lomuscio and Lalit Maganti. An approach to reachability analysis for feed-forward ReLU neural networks. arXiv preprint arXiv:1706.07351, 2017.

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.

Matthew Mirman, Timon Gehr, and Martin Vechev. Differentiable abstract interpretation for provably robust neural networks.
In International Conference on Machine Learning, pages 3575–3583, 2018.

Chongli Qin, Krishnamurthy Dj Dvijotham, Brendan O'Donoghue, Rudy Bunel, Robert Stanforth, Sven Gowal, Jonathan Uesato, Grzegorz Swirszcz, and Pushmeet Kohli. Verification of non-linear specifications for neural networks. ICLR, 2019.

Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. International Conference on Learning Representations (ICLR), arXiv preprint arXiv:1801.09344, 2018a.

Aditi Raghunathan, Jacob Steinhardt, and Percy S Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In Advances in Neural Information Processing Systems, pages 10900–10910, 2018b.

Ralph Tyrell Rockafellar. Convex analysis. Princeton University Press, 2015.

Hadi Salman, Jerry Li, Ilya Razenshteyn, Pengchuan Zhang, Huan Zhang, Sebastien Bubeck, and Greg Yang. Provably robust deep learning via adversarially trained smoothed classifiers. In Advances in Neural Information Processing Systems, pages 11289–11300, 2019.

Karsten Scheibler, Leonore Winterer, Ralf Wimmer, and Bernd Becker. Towards verification of artificial neural networks. In MBMV, pages 30–40, 2015.

Gagandeep Singh, Timon Gehr, Matthew Mirman, Markus Püschel, and Martin Vechev. Fast and effective robustness certification. In Advances in Neural Information Processing Systems, pages 10825–10836, 2018.

Gagandeep Singh, Timon Gehr, Markus Püschel, and Martin Vechev. An abstract domain for certifying neural networks. Proceedings of the ACM on Programming Languages, 3(POPL):41, 2019a.

Gagandeep Singh, Timon Gehr, Markus Püschel, and Martin Vechev. Robustness certification with refinement. ICLR, 2019b.

Vincent Tjeng, Kai Y. Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming.
In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=HyGIdiRqtm.

Shiqi Wang, Yizheng Chen, Ahmed Abdou, and Suman Jana. MixTrain: Scalable training of formally robust neural networks. arXiv preprint arXiv:1811.02625, 2018a.

Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, and Suman Jana. Efficient formal safety analysis of neural networks. In Advances in Neural Information Processing Systems, pages 6369–6379, 2018b.

Tsui-Wei Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Duane Boning, Inderjit S Dhillon, and Luca Daniel. Towards fast computation of certified robustness for ReLU networks. In International Conference on Machine Learning, 2018.

Eric Wong and Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning (ICML), pages 5283–5292, 2018.

Eric Wong, Frank Schmidt, Jan Hendrik Metzen, and J Zico Kolter. Scaling provable adversarial defenses. Advances in Neural Information Processing Systems (NIPS), 2018.

Kai Y. Xiao, Vincent Tjeng, Nur Muhammad (Mahi) Shafiullah, and Aleksander Madry. Training for faster adversarial robustness verification via inducing ReLU stability. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=BJfIVjAcKm.

Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems (NIPS), December 2018.

Huan Zhang, Pengchuan Zhang, and Cho-Jui Hsieh. RecurJac: An efficient recursive algorithm for bounding the Jacobian matrix of neural networks and its applications.
AAAI Conference on Artificial Intelligence, 2019.