{"title": "Lower Bounds on Adversarial Robustness from Optimal Transport", "book": "Advances in Neural Information Processing Systems", "page_first": 7498, "page_last": 7510, "abstract": "While progress has been made in understanding the robustness of machine learning classifiers to test-time adversaries (evasion attacks), fundamental questions remain unresolved. In this paper, we use optimal transport to characterize the maximum achievable accuracy in an adversarial classification scenario. In this setting, an adversary receives a random labeled example from one of two classes, perturbs the example subject to a neighborhood constraint, and presents the modified example to the classifier. We define an appropriate cost function such that the minimum transportation cost between the distributions of the two classes determines the \\emph{minimum $0-1$ loss for any classifier}. When the classifier comes from a restricted hypothesis class, the optimal transportation cost provides a lower bound. We apply our framework to the case of Gaussian data with norm-bounded adversaries and explicitly show matching bounds for the classification and transport problems and the optimality of linear classifiers. We also characterize the sample complexity of learning in this setting, deriving and extending previously known results as a special case. 
Finally, we use our framework to study the gap between the optimal classification performance possible and that currently achieved by state-of-the-art robustly trained neural networks for datasets of interest, namely, MNIST, Fashion MNIST and CIFAR-10.", "full_text": "Lower Bounds on Adversarial Robustness from Optimal Transport

Arjun Nitin Bhagoji∗
Department of Electrical Engineering, Princeton University
abhagoji@princeton.edu

Daniel Cullina∗,†
Department of Electrical Engineering, Pennsylvania State University
cullina@psu.edu

Prateek Mittal
Department of Electrical Engineering, Princeton University
pmittal@princeton.edu

Abstract

While progress has been made in understanding the robustness of machine learning classifiers to test-time adversaries (evasion attacks), fundamental questions remain unresolved. In this paper, we use optimal transport to characterize the minimum possible loss in an adversarial classification scenario. In this setting, an adversary receives a random labeled example from one of two classes, perturbs the example subject to a neighborhood constraint, and presents the modified example to the classifier. We define an appropriate cost function such that the minimum transportation cost between the distributions of the two classes determines the minimum 0-1 loss for any classifier. When the classifier comes from a restricted hypothesis class, the optimal transportation cost provides a lower bound. We apply our framework to the case of Gaussian data with norm-bounded adversaries and explicitly show matching bounds for the classification and transport problems as well as the optimality of linear classifiers. We also characterize the sample complexity of learning in this setting, deriving and extending previously known results as a special case.
Finally, we use our framework to study the gap between the optimal classification performance possible and that currently achieved by state-of-the-art robustly trained neural networks for datasets of interest, namely, MNIST, Fashion MNIST and CIFAR-10.

∗Equal contribution.
†Work done while at Princeton University.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

1 Introduction

Machine learning (ML) has become ubiquitous due to its impressive performance in a wide variety of domains such as image recognition [48, 72], natural language and speech processing [22, 25, 37], game-playing [12, 59, 71] and aircraft collision avoidance [42]. This ubiquity, however, provides adversaries with both the opportunity and incentive to strategically fool machine learning systems during both the training (poisoning attacks) [5, 9, 40, 60, 67] and test (evasion attacks) [8, 17, 34, 57, 58, 63, 77] phases. In an evasion attack, an adversary adds imperceptible perturbations to inputs in the test phase to cause misclassification. A large number of adversarial example-based evasion attacks have been proposed against ML algorithms used for tasks such as image classification [8, 17, 19, 34, 63, 77], object detection [21, 53, 83], image segmentation [2, 31] and speech recognition [18, 86]; generative models for image data [45] and even reinforcement learning algorithms [38, 46]. These attacks have been carried out in black-box [7, 11, 20, 52, 61, 62, 77] as well as in physical settings [29, 49, 70, 74]. A wide variety of defenses based on adversarial training [34, 54, 78], input de-noising through transformations [6, 24, 28, 69, 84], distillation [65], ensembling [1, 4, 75] and feature nullification [81] were proposed to defend ML algorithms against evasion attacks, only for most to be rendered ineffective by stronger attacks [3, 14-16].
Iterative adversarial training [54] is a current state-of-the-art empirical defense. Recently, defenses that rely on adversarial training and are provably robust to small perturbations have been proposed [35, 44, 66, 73] but are unable to achieve good generalization behavior on standard datasets such as CIFAR-10 [47]. In spite of an active line of research that has worked to characterize the difficulty of learning in the presence of evasion adversaries by analyzing the sample complexity of learning classifiers for known distributions [68] as well as in the distribution-free setting [23, 56, 85], fundamental questions remain unresolved. One such question is, what is the behavior of the optimal achievable loss in the presence of an adversary?

In this paper, we derive bounds on the 0-1 loss of classifiers while classifying adversarially modified data at test time, which is often referred to as adversarial robustness. We first develop a framework that relates classification in the presence of an adversary and optimal transport with an appropriately defined adversarial cost function. For an arbitrary data distribution with two classes, we characterize optimal adversarial robustness in terms of the transportation distance between the classes. When the classifier comes from a restricted hypothesis class, we obtain a lower bound on the minimum possible 0-1 loss (or equivalently, an upper bound on the maximum possible classification accuracy).

We then consider the case of a mixture of two Gaussians and derive matching upper and lower bounds for adversarial robustness by framing it as a convex optimization problem and proving the optimality of linear classifiers. For an ℓ∞ adversary, we also present the explicit solution for this optimization problem and analyze its properties.
Further, we derive an expression for sample complexity under the assumption of a Gaussian prior on the mean of the Gaussians, which allows us to independently match and extend the results from Schmidt et al. [68] as a special case.

Finally, in our experiments, we find transportation costs between the classes of empirical distributions of interest such as MNIST [50], Fashion-MNIST [82] and CIFAR-10 [47] for adversaries bounded by ℓ2 and ℓ∞ distance constraints, and relate them to the classification loss of state-of-the-art robust classifiers. Our results demonstrate that as the adversarial budget increases, the gap between current robust classifiers and the lower bound increases. This effect is especially pronounced for the CIFAR-10 dataset, providing a clear indication of the difficulty of robust classification for this dataset.

What do these results imply? First, the effectiveness of any defense for a given dataset can be directly analyzed by comparing its robustness to the lower bound. In particular, this allows us to identify regimes of interest where robust classification is possible. Our bound can be used to decide whether a particular adversarial budget is big or small. Second, since our lower bound does not require any distributional assumptions on the data, we are able to directly apply it to empirical distributions, characterizing whether robust classification is possible.

Further, in the Gaussian setting, the optimal classifier in the adversarial case depends explicitly on the adversary's budget. The optimal classifier in the benign case (corresponding to a budget of 0) differs from that for non-zero budgets. This immediately establishes a trade-off between the benign accuracy and adversarial robustness achievable with a given classifier. This raises interesting questions about which classifier should actually be deployed and how large the trade-off is.
From the explicit solution we derive in the Gaussian setting, we observe that non-robust features occur during classification due to a mismatch between the norm used by the adversary and that governing the data distribution. This observation, which was also made independently by Ilyas et al. [39], is expanded upon in Section 4.1.

Contributions: We summarize our contributions in this paper as follows: i) we develop a framework for finding general lower bounds for classification error in the presence of an adversary (adversarial robustness) using optimal transport, ii) we show matching upper and lower bounds for adversarial robustness as well as the sample complexity of attaining it for the case of Gaussian data and a convex, origin-symmetric constraint on the adversary and iii) we determine lower bounds on adversarial robustness for empirical datasets of interest and compare them to those of robustly trained classifiers.

2 Preliminaries and Notation

In this section, we set up the problem of learning in the presence of an evasion adversary. Such an adversary presents the learner with adversarially modified examples at test time but does not interfere with the training process [17, 34, 77]. We also define notation for the rest of the paper and explain how other work on adversarial examples fits into our setting.

Symbol          Usage
X               Space of natural examples
X̃               Space of examples produced by the adversary
N : X → 2^X̃     Neighborhood constraint function for the adversary
P               Distribution of labeled examples (on X × {−1, 1})

Table 1: Basic notation for the adversarial learning problem

We summarize the basic notation in Table 1. We now formally describe the learning problem. There is an unknown P ∈ P(X × {−1, 1}). The learner receives labeled training data (x, y) = ((x_0, y_0), . . . , (x_{n−1}, y_{n−1})) ∼ P^n and must select a hypothesis h.
The evasion adversary receives a labeled natural example (x_test, y_test) ∼ P and selects x̃ ∈ N(x_test), the set of adversarial examples in the neighborhood of x_test. The adversary gives x̃ to the learner and the learner must estimate y_test. Their performance is measured by the 0-1 loss, ℓ(y_test, h(x̃)).

Examples produced by the adversary are elements of a space X̃. In most applications, X = X̃, but we find it useful to distinguish them to clarify some definitions. We require N(x) to be nonempty so some choice of x̃ is always available. By taking X = X̃ and N(x) = {x}, we recover the standard problem of learning without an adversary. If N_1, N_2 are neighborhood functions and N_1(x) ⊆ N_2(x) for all x ∈ X, N_2 represents a stronger adversary. When X = X̃, a neighborhood function N can be defined using a distance d on X and an adversarial constraint β: N(x) = {x̃ : d(x, x̃) ≤ β}. This provides an ordered family of adversaries of varying strengths used in previous work [17, 34, 68].

The learner's error rate under the data distribution P with an adversary constrained by the neighborhood function N is L(N, P, h) = E_{(x,y)∼P}[max_{x̃ ∈ N(x)} ℓ(h(x̃), y)].

3 Adversarial Robustness from Optimal Transport

In this section, we explain the connections between adversarially robust classification and optimal transport.
At a high level, these arise from the following idea: if a pair of examples, one from each class, are adversarially indistinguishable, then any hypothesis can classify at most one of the examples correctly. By finding families of such pairs, one can obtain lower bounds on the classification error rate. When the set of available hypotheses is as large as possible, the best of these lower bounds is tight.

Section Roadmap: We will first review some basic concepts from optimal transport theory [80]. Then, we will define a cost function for adversarial classification as well as its associated potential functions that are needed to establish Kantorovich duality. We show how a coupling between the conditional distributions of the two classes can be obtained by composing couplings derived from the adversarial strategy and the total variation distance, which links hypothesis testing and transportation costs. Finally, we show that the potential functions have an interpretation in terms of classification, which leads to our theorem connecting adversarial robustness to the optimal transport cost.

3.1 Basic definitions from optimal transport

In this section, we use capital letters for random variables and lowercase letters for points in spaces.

Couplings: A coupling between probability distributions P_X on X and P_Y on Y is a joint distribution on X × Y with marginals P_X and P_Y. Let Π(P_X, P_Y) be the set of such couplings.

Definition 1 (Optimal transport cost).
For a cost function c : X × Y → R ∪ {+∞} and marginal distributions P_X and P_Y, the optimal transport cost is

C(P_X, P_Y) = inf_{P_XY ∈ Π(P_X, P_Y)} E_{(X,Y)∼P_XY}[c(X, Y)].   (1)

Potential functions and Kantorovich duality: There is a dual characterization of optimal transport cost in terms of potential functions, which we use to make the connection between the transport and classification problems.

Definition 2 (Potential functions). Functions f : X → R and g : Y → R are potential functions for the cost c if g(y) − f(x) ≤ c(x, y) for all (x, y) ∈ X × Y.

A pair of potential functions provides a one-dimensional representation of the spaces X and Y. This representation must be faithful to the cost structure on the original spaces: if a pair of points (x, y) are close in transportation cost, then f(x) must be close to g(y). In the dual optimization problem for optimal transport cost, we search for a representation that separates P_X from P_Y as much as possible:

C(P_X, P_Y) = sup_{f,g} E_{Y∼P_Y}[g(Y)] − E_{X∼P_X}[f(X)].   (2)

For any choices of f, g, and P_XY, it is clear that E[g(Y)] − E[f(X)] ≤ E[c(X, Y)]. Kantorovich duality states that there are in fact choices for f and g that attain equality.

Define the dual of f relative to c to be f^c(y) = inf_x c(x, y) + f(x). This is the largest function that forms a potential for c when paired with f.
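For finite distributions, the optimization in (1) is a linear program over coupling matrices, which makes Definition 1 easy to evaluate directly. The following is a small illustrative sketch (not from the paper; the helper name `transport_cost` is ours) using `scipy.optimize.linprog`:

```python
import numpy as np
from scipy.optimize import linprog

def transport_cost(p, q, C):
    """Optimal transport cost between discrete distributions p and q
    with cost matrix C: minimize <C, P> over couplings P whose row
    sums equal p and whose column sums equal q, as in (1)."""
    m, n = len(p), len(q)
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1.0   # sum_j P[i, j] = p[i]
    for j in range(n):
        A_eq[m + j, j::n] = 1.0            # sum_i P[i, j] = q[j]
    b_eq = np.concatenate([p, q])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun

# Two uniform two-point distributions on a line, cost c(x, y) = |x - y|:
# mass at {0, 1} must move to {0, 2}; the optimal plan sends 0 -> 0 and
# 1 -> 2, for a total cost of 0.5 * 0 + 0.5 * 1 = 0.5.
p = np.array([0.5, 0.5])
q = np.array([0.5, 0.5])
C = np.abs(np.array([0.0, 1.0])[:, None] - np.array([0.0, 2.0])[None, :])
```

By LP duality, the same value is attained by the potential-function problem (2), which is the Kantorovich duality used below.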
In (2), it is sufficient to optimize over pairs (f, f^c).

Compositions: The composition of cost functions c : X × Y → R and c′ : Y × Z → R is (c ∘ c′) : X × Z → R, (c ∘ c′)(x, z) = inf_{y ∈ Y} c(x, y) + c′(y, z). The composition of optimal transport costs can be defined in two equivalent ways:

(C ∘ C′)(P_X, P_Z) = inf_{P_Y} C(P_X, P_Y) + C′(P_Y, P_Z) = inf_{P_XZ} E[(c ∘ c′)(X, Z)].

Total variation distance: The total variation distance between distributions P and Q is

C_TV(P, Q) = sup_A P(A) − Q(A).   (3)

We use this notation because it is the optimal transport cost for the cost function c_TV : X × X → R, c_TV(x, x′) = 1[x ≠ x′]. Observe that (3) is equivalent to (2) with the additional restrictions that f(x) ∈ {0, 1} for all x, i.e. f is an indicator function for some set A and g = f^{c_TV}. For binary classification with a symmetric prior on the classes, a set A that achieves the optimum in Eq. (3) corresponds to an optimal test for distinguishing P from Q.

3.2 Adversarial cost functions and couplings

We now construct specialized versions of costs and couplings that translate between robust classification and optimal transport.

Cost functions for adversarial classification: The adversarial constraint information N can be encoded into the following cost function c_N : X × X̃ → R: c_N(x, x̃) = 1[x̃ ∉ N(x)]. The composition of c_N and c_N^⊤ (i.e. c_N with the arguments flipped) has a simple combinatorial interpretation: (c_N ∘ c_N^⊤)(x, x′) = 1[N(x) ∩ N(x′) = ∅]. When (c_N ∘ c_N^⊤)(x, x′) = 0, we say that the points are adversarially indistinguishable.

Perhaps the most well-known example of optimal transport is the earth-mover's or 1-Wasserstein distance, where the cost function is a metric on the underlying space. In general, the transportation cost c_N ∘ c_N^⊤ is not a metric on X because (c_N ∘ c_N^⊤)(x, x′) = 0 does not necessarily imply x = x′.

Couplings from adversarial strategies: Let a : X → X̃ be a function such that a(x) ∈ N(x) for all x ∈ X. Then a is an admissible adversarial perturbation strategy. The adversarial expected risk can be expressed as a maximization over adversarial strategies: L(N, P, h) = sup_{a_1, a_{−1}} E_{(x,c)∼P}[ℓ(h(a_c(x)), c)]. Let X̃_1 = a_1(X_1), so a_1 gives a coupling P_{X_1 X̃_1} between P_{X_1} and P_{X̃_1}. By construction, C_N(P_{X_1}, P_{X̃_1}) = 0. A general coupling between P_{X_1} and P_{X̃_1} with C_N(P_{X_1}, P_{X̃_1}) = 0 corresponds to a randomized adversarial strategy. We define P_{X̃_{−1}} analogously. By composing the adversarial strategy coupling P_{X_1 X̃_1}, the total variation coupling of P_{X̃_1} and P_{X̃_{−1}}, and P_{X̃_{−1} X_{−1}}, we obtain a coupling P_{X_1 X_{−1}}.

Potential functions from classifiers: Now we can explore the relationship between transport and classification. Consider a given hypothesis h : X̃ → {−1, 1}. A labeled adversarial example (x̃, y) is classified correctly if x̃ ∈ h^{−1}(y). A labeled example (x, y) is classified correctly if N(x) ⊆ h^{−1}(y). Following Cullina et al. [23], we define degraded hypotheses h̃ : X → {−1, 1, ⊥},

h̃(x) = y if N(x) ⊆ h^{−1}(y), and h̃(x) = ⊥ otherwise.

This allows us to express the adversarial classification accuracy of h, 1 − L(N, h, P), as (1/2)(E[1[h̃(X_1) = 1]] + E[1[h̃(X_{−1}) = −1]]). Observe that 1[h̃(x) = 1] + 1[h̃(x′) = −1] ≤ (c_N ∘ c_N^⊤)(x, x′) + 1. Thus the functions f(x) = 1 − 1[h̃(x) = 1] and g(x) = 1[h̃(x) = −1] are admissible potentials for c_N ∘ c_N^⊤. This is illustrated in Figure 1.

Figure 1: The relationships between a classifier h : X̃ → {1, −1}, a degraded classifier h̃ : X → {1, −1, ⊥}, and potential functions f, g : X → R.

Our first theorem characterizes optimal adversarial robustness when h is allowed to be any classifier.

Theorem 1. Let X and X̃ be Polish spaces and let N : X → 2^X̃ be an upper-hemicontinuous neighborhood function such that N(x) is nonempty and closed for all x. For any pair of distributions P_{X_1}, P_{X_{−1}} on X,

(C_N ∘ C_N^⊤)(P_{X_1}, P_{X_{−1}}) = 1 − 2 inf_h L(N, h, P)

where h : X̃ → {1, −1} can be any measurable function. Furthermore there is some h that achieves the infimum.

In the case of finite spaces, this theorem is essentially equivalent to the König-Egerváry theorem on the size of a maximum matching in a bipartite graph.
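For finite, uniform empirical distributions, Theorem 1 can be checked directly: the transport cost under c_N ∘ c_N^⊤ reduces to a minimum-weight bipartite matching (as used later in Section 6), while inf_h L(N, h, P) can be brute-forced over all classifiers on a small support. A toy sketch with our own example data (not from the paper), assuming interval neighborhoods N(x) = [x − β, x + β] on the line:

```python
import itertools
import numpy as np
from scipy.optimize import linear_sum_assignment

beta = 0.5
class_pos = [0.0, 4.0]    # samples from P_{X_1}
class_neg = [1.0, 10.0]   # samples from P_{X_-1}

# Transport side: cost 1[N(x) and N(x') are disjoint] = 1[|x - x'| > 2*beta];
# for uniform empirical distributions the optimal coupling is a matching.
D = np.array([[1.0 if abs(a - b) > 2 * beta else 0.0 for b in class_neg]
              for a in class_pos])
rows, cols = linear_sum_assignment(D)
C_N = D[rows, cols].sum() / len(class_pos)

# Classification side: brute-force inf_h L(N, h, P) over classifiers defined
# on a grid that contains every relevant point of every neighborhood N(x).
grid = sorted({x + d for x in class_pos + class_neg for d in (-beta, 0.0, beta)})
best_loss = 1.0
for labels in itertools.product([-1, 1], repeat=len(grid)):
    h = dict(zip(grid, labels))
    # (x, y) is classified correctly only if h = y on all of N(x) in the grid.
    err = sum(any(h[g] != 1 for g in grid if abs(g - x) <= beta) for x in class_pos)
    err += sum(any(h[g] != -1 for g in grid if abs(g - x) <= beta) for x in class_neg)
    best_loss = min(best_loss, err / (len(class_pos) + len(class_neg)))

# Theorem 1 predicts: C_N o C_N^T = 1 - 2 * inf_h L(N, h, P).
```

Here the pair (0, 1) is adversarially indistinguishable (their β-neighborhoods share the point 0.5), so any classifier errs on at least one of the four examples, giving minimum loss 1/4 and matching cost 1 − 2·(1/4) = 1/2.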
The full proof is in Section A of the Supplementary. If instead of all measurable functions we consider h ∈ H, a smaller hypothesis class, Theorem 1 provides a lower bound on inf_{h∈H} L(N, h, P).

4 Gaussian data: Optimal loss

In this section, we consider the case when the data is generated from a mixture of two Gaussians with identical covariances and means that differ in sign. Directly applying (1) or (2) requires optimizing over either all classifiers or all transportation plans. However, a classifier and a coupling that achieve the same cost must both be optimal. We use this to show that optimizing over linear classifiers and 'translate and pair' transportation plans characterizes adversarial robustness in this case.

Problem setup: Consider a labeled example (X, Y) ∈ R^d × {−1, 1} such that the example X has a Gaussian conditional distribution, X|(Y = y) ∼ N(yµ, Σ), and Pr(Y = 1) = Pr(Y = −1) = 1/2. Let B ⊆ R^d be a closed, convex, absorbing, origin-symmetric set. The adversary is constrained to add perturbations to a data point x contained within βB, where β is an adversarial budget parameter. That is, for all x, N(x) = x + βB. This includes ℓp-constrained adversaries as the special case B = {z : ∥z∥_p ≤ 1}. For N and P of this form, we will determine inf_h L(N, P, h) where h can be any measurable function.

We first define the following convex optimization problem in order to state Theorem 2. In the proof of Theorem 2, it will become clear how it arises.

Definition 3.
Let α∗(β, µ) be the solution to the following convex optimization problem over (z, y, α) ∈ R^{d+d+1}:

min α  s.t.  z + y = µ,  ∥y∥_Σ ≤ α,  ∥z∥_B ≤ β,   (4)

where we use the seminorms ∥y∥_Σ = √(y^⊤Σ^{−1}y) and ∥z∥_B = inf{β : z ∈ βB}.

Figure 2: Variation in α∗ w.r.t. β for an ℓ∞ adversary with d = 10 (left) and d = 1000 (right). α∗ is the point at which the primal transport problem and the dual classification problem have matching solutions, given by 1 − 2Q(α∗). The classification loss at this point is simply Q(α∗).

Theorem 2. Let N(x) = x + βB. Then (C_N ∘ C_N^⊤)(N(µ, Σ), N(−µ, Σ)) = 1 − 2Q(α∗(β, µ)), where Q is the complementary cumulative distribution function for N(0, 1).

The crucial properties of the solution to (4) are characterized in the following lemma.

Lemma 1. Let µ ∈ R^d, β ≥ 0, and α = α∗(β, µ). There are y, z, w ∈ R^d such that y + z = µ and

∥y∥_Σ = α,  ∥z∥_B = β,  ∥w∥_{Σ∗} = 1,  ∥w∥_{B∗} = γ,  w^⊤y = α,  w^⊤z = βγ.

The proof of Lemma 1 is in Section B.1 of the Supplementary.

Proof of Theorem 2. We start from the definition of optimal transport cost and consider the restricted class of "translate and pair in place" couplings to get an upper bound. In these couplings, the adversarial attacks are translations by a constant: X̃_1 = X_1 + z and X̃_{−1} = X_{−1} − z. The total variation coupling between X̃_1 and X̃_{−1} does "pairing in place":

(C_N ∘ C_N^⊤)(P_{X_1}, P_{X_{−1}}) ≤ inf_{z ∈ βB} C_TV(P_{X̃_1}, P_{X̃_{−1}}) = inf_{z ∈ βB} sup_w 2Q((w^⊤z − w^⊤µ)/√(w^⊤Σw)) − 1.

The full computation of the total variation distance between Gaussians is in Section B.2 of the Supplementary. The inner optimum is attained at w∗ = 2Σ^{−1}(z − µ) and its value is √((z − µ)^⊤Σ^{−1}(z − µ)). The choice of z from Lemma 1 makes the upper bound 2Q(−α∗(β, µ)) − 1 = 1 − 2Q(α∗(β, µ)).

Now we consider the lower bounds on optimal transport cost from linear classification functions of the form f_w(x) = sgn(w^⊤x). In the presence of an adversary, the classification problem becomes max_w P_{(x,y)∼P}[f_w(x + a_{w,y}(x)) = y]. When y = 1, the correct classification event is f_w(x + a_{w,1}(x)) = 1, or equivalently w^⊤x − β∥w∥_{B∗} > 0. This ultimately gives the lower bound

(C_N ∘ C_N^⊤)(P_{X_1}, P_{X_{−1}}) ≥ sup_w 1 − 2Q((w^⊤µ − β∥w∥_{B∗})/∥w∥_{Σ∗}).   (5)

The full calculation appears in the supplementary material (Section B.3). From Lemma 1, there is a choice of w that makes the bound in (5) equal to 1 − 2Q(α∗(β, µ)).

The proof of Theorem 2 shows that linear classifiers are optimal for this problem. The choice of w provided by Lemma 1 specifies the orientation of the optimal classifier.

4.1 Special cases

Matching norms for data and adversary: When B is the unit ball derived from Σ, the optimization problem (4) has a very simple solution: α∗(β, µ) = ∥µ∥_Σ − β, y = (α/∥µ∥_Σ)µ, z = (β/∥µ∥_Σ)µ, and w = (1/∥µ∥_Σ)Σ^{−1}µ. Thus, the same classifier is optimal for all adversarial budgets. In general, α∗(0, µ) = ∥µ∥_Σ and α∗(∥µ∥_B, µ) = 0, but α∗(β, µ) can be nontrivially convex for 0 ≤ β ≤ ∥µ∥_B. When there is a difference between the two seminorms, the optimal modification is not proportional to µ, which can be used by the adversary. The optimal classifier varies with the adversarial budget, so there is a trade-off between accuracy and robust accuracy.

ℓ∞ adversaries: In Figure 2, we illustrate this phenomenon for an ℓ∞ adversary. We plot α∗(β, µ) for Σ = I (so ∥·∥_Σ = ∥·∥_2) and take B to be the ℓ∞ unit ball (so ∥·∥_B = ∥·∥_∞). In this case, (4) has an explicit solution. For each coordinate, set z_i = min(µ_i, β), which gives y_i = µ_i − min(µ_i, β) and makes the constraints tight.
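This explicit solution is easy to evaluate numerically. A minimal sketch (our own, not from the paper's code), assuming Σ = I and nonnegative coordinates of µ as in the setting of Figure 2:

```python
import math

def alpha_star_linf(mu, beta):
    """Explicit solution of (4) for Sigma = I and an l_inf adversary:
    z_i = min(mu_i, beta) erases coordinates smaller than beta, and
    alpha* = ||y||_2 with y = mu - z (assumes all mu_i >= 0)."""
    y = [m - min(m, beta) for m in mu]
    return math.sqrt(sum(v * v for v in y))

def Q(a):
    """Complementary CDF of the standard normal."""
    return 0.5 * math.erfc(a / math.sqrt(2))

# Hypothetical mean vector: coordinates below beta contribute nothing
# once the adversary can erase them, so the optimal loss Q(alpha*)
# grows as the budget beta increases.
mu = [0.05, 0.1, 0.2, 0.4]
losses = [Q(alpha_star_linf(mu, b)) for b in (0.0, 0.1, 0.2, 0.5)]
```

At β = 0 this recovers the benign loss Q(∥µ∥_2); once β ≥ max_i µ_i, every coordinate can be erased, α∗ = 0, and the loss saturates at Q(0) = 1/2.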
Thus, as β increases, more components of z equal those of µ, reducing the marginal effect of an additional increase in β.

Due to the mismatch between the seminorms governing the data and the adversary, the value of β determines which features are useful for classification, since features smaller than β can be completely erased. Without an adversary, all of these features would be potentially useful for classification, implying that human-imposed adversarial constraints, with their mismatch from the underlying geometry of the data distribution, lead to the presence of non-robust features that are nevertheless useful for classification. A similar observation was made in concurrent work by Ilyas et al. [39].

5 Gaussian data: Sample complexity lower bound

In this section, we use the characterization of the optimal loss in the Gaussian robust classification problem to establish the optimality of a rule for learning from a finite number of samples. This allows for precise characterization of sample complexity in the learning problem.

Consider the following Bayesian learning problem, which generalizes a problem considered by Schmidt et al. [68]. We start from the classification problem defined in Section 4. There, the choice of the classifier h could directly depend on µ and Σ. Now we give µ the distribution N(0, (1/m)I). A learner who knows this prior but not the value of µ is provided with n i.i.d. labeled training samples. The learner selects any measurable classification function ĥ_n : R^d → {−1, 1} by applying some learning algorithm to the training data with the goal of minimizing E[L(N, P, ĥ_n)].

The optimal transport approach allows us to determine the exact optimal loss for this problem for each n as well as the optimal learning algorithm. To characterize this loss, we need the following definitions. Let A be the ℓ2 unit ball: {y ∈ R^d : ∥y∥_2 ≤ 1}. Let S(α, β) = {(x, t) ∈ R^d × R : x ∈ tαA + βB}.

Theorem 3. In the learning problem described above, the minimum loss of any learning rule is Pr_{V∼N(0,I)}[V ∈ S(ρ, βρ)], where ρ² = m(m + n)/n.

The proof is in Section C of the Supplementary.

Figure 3: S(ρ, βρ) is the set appearing in the statement of Theorem 3. S(0, βρ) corresponds to the loss lower bound obtained by Schmidt et al. S(ρ, 0) corresponds to the loss in the non-adversarial version of this classification problem.

The special case where B is an ℓ∞ ball was considered by Schmidt et al. [68]. They obtained a lower bound on loss that can be expressed in our notation as Pr[V ∈ S(0, ρβ)]. This bound essentially ignores the random noise in the problem and computes the probability that, after seeing n training examples, the posterior distributions for X_{n+1}|(Y_{n+1} = 1) and X_{n+1}|(Y_{n+1} = −1) are adversarially indistinguishable. The true optimal loss takes into account the intermediate case in which these posterior distributions are difficult but not impossible to distinguish in the presence of an adversary.

Schmidt et al. investigate sample complexity in the following parameter regime: m = c_1 d^{1/2}, which by design is a low-noise regime. In this regime, they establish upper and lower bounds on the sample complexity of learning an adversarially robust classifier: Cβ²d/log d ≤ n ≤ C′β²d. By taking into account the effect of the random noise, our characterization of the loss closes this gap. For larger values of m, the difference between Pr[V ∈ S(0, ρβ)] and Pr[V ∈ S(ρ, ρβ)] becomes more significant, so our analysis is useful over a much broader range of parameters.

6 Experimental Results

Figure 4: Variation in minimum 0-1 loss (adversarial robustness) as β is varied for '3 vs. 7' on (a) MNIST, (b) Fashion MNIST and (c) CIFAR-10. Each panel shows the optimal minimum loss and the robust classifier loss. For all datasets, the loss of a robust classifier (trained with iterative adversarial training [54]) is shown for a PGD adversary with an ℓ2 constraint.

In this section, we use Theorem 1 to find lower bounds on adversarial robustness for empirical datasets of interest. We also compare these bounds to the performance of robustly trained classifiers
For reproducibility purposes, our code is available at https://github.com/inspire-group/robustness-via-transport.

6.1 Experimental Setup

We consider the adversarial classification problem on three widely used image datasets, namely MNIST [50], Fashion-MNIST [82] and CIFAR-10 [47], and obtain lower bounds on the adversarial robustness of any classifier for these datasets. For each dataset, we use data from classes 3 (P_X1) and 7 (P_X−1) to obtain a binary classification problem. This choice is arbitrary and similar results are obtained with other choices, which we omit for brevity. We use 2000 images from the training set of each class to compute the lower bound on adversarial robustness when the adversary is constrained using the ℓ2 norm. For the ℓ∞ norm, these pairs of classes are very well separated, making the lower bounds less interesting (results in Section D of the Supplementary).

For the MNIST and Fashion-MNIST datasets, we compare the lower bound with the performance of a 3-layer Convolutional Neural Network (CNN) that is robustly trained using iterative adversarial training [54] with the Adam optimizer [43] for 12 epochs. This network achieves 99.9% accuracy on the '3 vs. 7' binary classification task on both MNIST and Fashion-MNIST. For the CIFAR-10 dataset, we use a ResNet-18 [36] trained for 200 epochs, which achieves 97% accuracy on the binary classification task. To generate adversarial examples, both during the training process and to test robustness, we use Projected Gradient Descent (PGD) with an ℓ2 constraint, random initialization and a minimum of 10 iterations.
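To make the attack setup concrete, here is a minimal NumPy sketch of ℓ2-constrained PGD with random initialization. This is an illustrative sketch, not the exact attack code used in our experiments; in particular, the step size, the normalized gradient step, and the logistic-loss example below are our own assumptions.

```python
import numpy as np

def pgd_l2(x0, grad_fn, beta, step=0.5, iters=10, rng=None):
    """l2-constrained PGD: maximize a loss subject to ||x - x0||_2 <= beta.

    grad_fn(x) returns the gradient of the loss being maximized at x.
    The attack starts from a random point inside the l2 ball.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Random initialization inside the l2 ball around x0
    delta = rng.normal(size=x0.shape)
    delta *= beta * rng.uniform() / (np.linalg.norm(delta) + 1e-12)
    x = x0 + delta
    for _ in range(iters):
        g = grad_fn(x)
        x = x + step * g / (np.linalg.norm(g) + 1e-12)  # normalized ascent step
        d = x - x0
        norm = np.linalg.norm(d)
        if norm > beta:                                 # project back onto the ball
            x = x0 + d * (beta / norm)
    return x
```

For a linear classifier w under logistic loss with label y, passing grad_fn(x) = −y·w·σ(−y·w·x) maximizes the loss; the final projection guarantees the returned example respects the ℓ2 budget β.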
Since more powerful heuristic attacks may be possible against these robustly trained classifiers, the 'robust classifier loss' reported here is a lower bound.

6.2 Lower bounds on adversarial robustness for empirical distributions

We now describe the steps we follow to obtain a lower bound on adversarial robustness for empirical distributions through a direct application of Theorem 1. We first create a k × k matrix D whose entries are ‖x_i − x_j‖_p, where k is the number of samples from each class and p defines the norm. We then threshold these entries to obtain D_thresh, the matrix of adversarial costs (c_N ◦ c_N^⊤)(x_i, x_j) (recall Section 3.2), whose (i, j)th entry is 1 if D_ij > 2β and 0 otherwise, where β is the constraint on the adversary. Finally, the optimal coupling cost (C_N ◦ C_N^⊤)(P_X1, P_X−1) is computed by performing minimum weight matching over the bipartite graph defined by the cost matrix D_thresh, using the Linear Sum Assignment module from SciPy [41].

In Figure 4, we show the variation in the minimum possible 0 − 1 loss (adversarial robustness) in the presence of an ℓ2-constrained adversary as the attack budget β is increased. We compare this loss value to that of a robustly trained classifier [54] when the PGD attack is used (on the same data). Up to a certain β value, robust training converges and the model attains a non-trivial adversarial robustness value. Nevertheless, there is a gap between the empirically obtained and theoretically predicted minimum loss values. Further, after β = 3.8 (MNIST), β = 4.8 (Fashion MNIST) and β = 1.5 (CIFAR-10), we observe that robust training is unable to converge.
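The thresholding-and-matching procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than our exact experimental code; in particular, the final step converting the optimal coupling cost C into a loss lower bound of (1 − C)/2 (for balanced classes) is our reading of Theorem 1 and should be checked against the theorem statement.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def loss_lower_bound(X1, Xm1, beta, p=2):
    """Lower bound on the 0-1 loss of any classifier against an l_p-bounded
    adversary with budget beta, from k samples per class.

    X1, Xm1: (k, d) arrays of samples from classes +1 and -1.
    """
    D = cdist(X1, Xm1, metric="minkowski", p=p)  # k x k pairwise l_p distances
    # Adversarial cost: 1 iff the adversary cannot make the pair collide
    D_thresh = (D > 2.0 * beta).astype(float)
    row, col = linear_sum_assignment(D_thresh)   # minimum weight bipartite matching
    C = D_thresh[row, col].mean()                # optimal coupling cost
    return (1.0 - C) / 2.0
```

When C = 0 (every matched pair can be made to collide by the adversary) the bound is 1/2, i.e. no classifier beats random guessing; when C = 1 the bound is vacuous.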
We believe this occurs because a large fraction of the data at that value of β is close to the boundary when adversarially perturbed, making the classification problem very challenging.

We note that in order to reduce the classification accuracy to random for CIFAR-10, a much larger ℓ2 budget is needed compared to either MNIST or Fashion-MNIST, implying that the classes are better separated.

7 Related work and Concluding Remarks

We only discuss the closest related work that analyzes evasion attacks theoretically. Extensive recent surveys [10, 51, 64] provide a broader overview.

Distribution-specific generalization analysis: Schmidt et al. [68] studied the sample complexity of learning a mixture of Gaussians as well as Bernoulli distributed data in the presence of ℓ∞-bounded adversaries, which we recover as a special case of our framework in Section 5. Gilmer et al. [33] and Diochnos et al. [26] analyzed the robustness of classifiers for specific distributions, i.e. points distributed on two concentric spheres and points on the Boolean hypercube respectively. In contrast to these papers, our framework applies to any binary classification problem, as our lower bound holds for arbitrary distributions.

Sample complexity in the PAC setting: Cullina et al. [23], Yin et al. [85] and Montasser et al. [56] derive the sample complexity needed to PAC-learn a hypothesis class in the presence of an evasion adversary. These approaches do not provide an analysis of the optimal loss under a given distribution, but only of the number of samples needed to get ε-close to it, i.e. to learn the best empirical hypothesis.

Optimal transport for bounds on adversarial robustness: Sinha et al.
[73] constrain the adversary using a Wasserstein distance bound on the distribution that results from perturbing the benign distribution, and study the sample complexity of SGD for minimizing the relaxed Lagrangian formulation of the learning problem with this constraint. In contrast, we use a cost function that characterizes sample-wise adversarial perturbation exactly, which aligns with current practice, and provide a lower bound on the 0 − 1 loss in the presence of an adversary, while Sinha et al. minimize an upper bound to perform robust training. Mahloujifar et al. [55] and Dohmatob [27] use the 'blowup' property exhibited by certain data distributions to provide bounds on adversarial risk, given some level of ordinary risk. In comparison, our assumptions on the example space, distribution, and adversarial constraints are much milder. Even in regimes where these frameworks are applicable, our approach provides two key advantages. First, our bounds explicitly concern the adversarial robustness of the optimal classifier, while theirs relate the adversarial robustness to the benign classification error of a classifier. Thus, our bounds can still be nontrivial even when there is a classifier with a benign classification error of zero, which is exactly the case in our MNIST experiments. Second, our bounds apply for any adversarial budget, while theirs become non-trivial only when the adversarial budget exceeds a critical threshold depending on the properties of the space.

Possibility of robust classification: Bubeck et al. [13] show that there exist classification tasks in the statistical query model for which there is no efficient algorithm to learn robust classifiers. Tsipras et al. [79], Zhang et al. [87] and Suggala et al. [76] study the trade-offs between robustness and accuracy.
We discuss this trade-off for Gaussian data in Section 4.

7.1 Concluding remarks

Our framework provides lower bounds on adversarial robustness through the use of optimal transport for binary classification problems, which we apply to empirical datasets of interest to analyze the performance of current defenses. As a special case, we also characterize the learning problem exactly in the case of Gaussian data and study the relationship between noise in the learning problem and adversarial perturbations. In future work, we will extend our framework to the multi-class classification setting. Recent work [30, 32] has established an empirical connection between these two noise regimes, and an interesting direction would be to precisely characterize which type of noise dominates the learning process for a given adversarial budget. Another natural next step would be to consider distributions beyond the Gaussian to derive expressions for optimal adversarial robustness as well as the sample complexity of attaining it.

Acknowledgements

We would like to thank Chawin Sitawarin for providing part of the code used in our experiments. This research was sponsored by the National Science Foundation under grants CNS-1553437, CNS-1704105, CIF-1617286 and EARS-1642962, by Intel through the Intel Faculty Research Award, by the Office of Naval Research through the Young Investigator Program (YIP) Award, by the Army Research Office through the Young Investigator Program (YIP) Award, and a Schmidt DataX Award. ANB would like to thank Siemens for supporting him through the FutureMakers Fellowship.

References

[1] Mahdieh Abbasi and Christian Gagné. Robustness to adversarial examples through an ensemble of specialists. arXiv preprint arXiv:1702.06856, 2017.

[2] Anurag Arnab, Ondrej Miksik, and Philip H. S. Torr. On the robustness of semantic segmentation models to adversarial attacks.
In CVPR, 2018.\n\n[3] Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security:\nCircumventing defenses to adversarial examples. In Proceedings of the 35th International Conference on\nMachine Learning, pages 274\u2013283, 2018.\n\n[4] Alexander Bagnall, Razvan Bunescu, and Gordon Stewart. Training ensembles to detect adversarial\n\nexamples. arXiv preprint arXiv:1712.04006, 2017.\n\n[5] Arjun Nitin Bhagoji, Supriyo Chakraborty, Prateek Mittal, and Seraphin Calo. Analyzing federated learning\n\nthrough an adversarial lens. In ICML, 2019.\n\n[6] Arjun Nitin Bhagoji, Daniel Cullina, and Prateek Mittal. Dimensionality reduction as a defense against\n\nevasion attacks on machine learning classi\ufb01ers. arXiv preprint arXiv:1704.02654, 2017.\n\n[7] Arjun Nitin Bhagoji, Warren He, Bo Li, and Dawn Song. Practical black-box attacks on deep neural\nnetworks using ef\ufb01cient query mechanisms. In European Conference on Computer Vision, pages 158\u2013174.\nSpringer, 2018.\n\n[8] Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim \u0160rndi\u00b4c, Pavel Laskov, Giorgio\nGiacinto, and Fabio Roli. Evasion attacks against machine learning at test time. In Joint European\nConference on Machine Learning and Knowledge Discovery in Databases, pages 387\u2013402. Springer, 2013.\n[9] Battista Biggio, Blaine Nelson, and Pavel Laskov. Poisoning attacks against support vector machines. In\nProceedings of the 29th International Conference on Machine Learning (ICML-12), pages 1807\u20131814,\n2012.\n\n[10] Battista Biggio and Fabio Roli. Wild patterns: Ten years after the rise of adversarial machine learning.\n\narXiv preprint arXiv:1712.03141, 2017.\n\n[11] Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks\n\nagainst black-box machine learning models. In ICLR, 2018.\n\n[12] Noam Brown and Tuomas Sandholm. 
Superhuman ai for heads-up no-limit poker: Libratus beats top\n\nprofessionals. Science, page eaao1733, 2017.\n\n[13] S\u00e9bastien Bubeck, Eric Price, and Ilya Razenshteyn. Adversarial examples from computational constraints.\n\narXiv preprint arXiv:1805.10204, 2018.\n\n[14] Nicholas Carlini and David Wagner. Defensive distillation is not robust to adversarial examples. arXiv\n\npreprint arXiv:1607.04311, 2016.\n\n[15] Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection\n\nmethods. In AISec, 2017.\n\n[16] Nicholas Carlini and David Wagner. Magnet and \u201cef\ufb01cient defenses against adversarial attacks\" are not\n\nrobust to adversarial examples. arXiv preprint arXiv:1711.08478, 2017.\n\n[17] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In Security\n\nand Privacy (SP), 2017 IEEE Symposium on, pages 39\u201357. IEEE, 2017.\n\n[18] Nicholas Carlini and David Wagner. Audio adversarial examples: Targeted attacks on speech-to-text. In\n\nDLS (IEEE SP), 2018.\n\n[19] Pin-Yu Chen, Yash Sharma, Huan Zhang, Jinfeng Yi, and Cho-Jui Hsieh. Ead: elastic-net attacks to deep\n\nneural networks via adversarial examples. In AAAI, 2018.\n\n[20] Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. Zoo: Zeroth order optimization\nbased black-box attacks to deep neural networks without training substitute models. In Proceedings of the\n10th ACM Workshop on Arti\ufb01cial Intelligence and Security, pages 15\u201326. ACM, 2017.\n\n[21] Shang-Tse Chen, Cory Cornelius, Jason Martin, and Duen Horng Chau. Robust physical adversarial attack\n\non faster r-cnn object detector. arXiv preprint arXiv:1804.05810, 2018.\n\n[22] Ronan Collobert, Jason Weston, L\u00e9on Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa.\nNatural language processing (almost) from scratch. 
Journal of Machine Learning Research, 12(Aug):2493–2537, 2011.

[23] Daniel Cullina, Arjun Nitin Bhagoji, and Prateek Mittal. PAC-learning in the presence of adversaries. In Advances in Neural Information Processing Systems, pages 230–241, 2018.

[24] Nilaksh Das, Madhuri Shanbhogue, Shang-Tse Chen, Fred Hohman, Siwei Li, Li Chen, Michael E Kounavis, and Duen Horng Chau. Shield: Fast, practical defense and vaccination for deep learning using jpeg compression. arXiv preprint arXiv:1802.06816, 2018.

[25] Li Deng, Geoffrey Hinton, and Brian Kingsbury. New types of deep neural network learning for speech recognition and related applications: An overview. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages 8599–8603. IEEE, 2013.

[26] Dimitrios Diochnos, Saeed Mahloujifar, and Mohammad Mahmoody. Adversarial risk and robustness: General definitions and implications for the uniform distribution. In Advances in Neural Information Processing Systems, pages 10359–10368, 2018.

[27] Elvis Dohmatob. Generalized no free lunch theorem for adversarial robustness. In Proceedings of the 36th International Conference on Machine Learning, pages 1646–1654, 2019.

[28] Gintare Karolina Dziugaite, Zoubin Ghahramani, and Daniel M Roy. A study of the effect of JPG compression on adversarial images. arXiv preprint arXiv:1608.00853, 2016.

[29] Ivan Evtimov, Kevin Eykholt, Earlence Fernandes, Tadayoshi Kohno, Bo Li, Atul Prakash, Amir Rahmati, and Dawn Song. Robust physical-world attacks on machine learning models. In CVPR, 2018.

[30] Alhussein Fawzi, Seyed-Mohsen Moosavi-Dezfooli, and Pascal Frossard. Robustness of classifiers: from adversarial to random noise. In NIPS, 2016.

[31] Volker Fischer, Mummadi Chaithanya Kumar, Jan Hendrik Metzen, and Thomas Brox. Adversarial examples for semantic image segmentation.
In ICLR Workshop, 2017.\n\n[32] Nic Ford, Justin Gilmer, Nicolas Carlini, and Dogus Cubuk. Adversarial examples are a natural consequence\n\nof test error in noise. In ICML, 2019.\n\n[33] Justin Gilmer, Luke Metz, Fartash Faghri, Samuel S Schoenholz, Maithra Raghu, Martin Wattenberg, and\n\nIan Goodfellow. Adversarial spheres. In ICLR, 2018.\n\n[34] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples.\n\nIn International Conference on Learning Representations, 2015.\n\n[35] Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato,\nTimothy Mann, and Pushmeet Kohli. On the effectiveness of interval bound propagation for training\nveri\ufb01ably robust models. arXiv preprint arXiv:1810.12715, 2018.\n\n[36] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition.\nIn Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770\u2013778, 2016.\n[37] Geoffrey Hinton, Li Deng, Dong Yu, George E Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew\nSenior, Vincent Vanhoucke, Patrick Nguyen, Tara N Sainath, et al. Deep neural networks for acoustic\nmodeling in speech recognition: The shared views of four research groups. IEEE Signal Processing\nMagazine, 29(6):82\u201397, 2012.\n\n[38] Sandy Huang, Nicolas Papernot, Ian Goodfellow, Yan Duan, and Pieter Abbeel. Adversarial attacks on\n\nneural network policies. In ICLR, 2017.\n\n[39] Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, and Aleksander Madry.\n\nAdversarial examples are not bugs, they are features. arXiv preprint arXiv:1905.02175, 2019.\n\n[40] Matthew Jagielski, Alina Oprea, Battista Biggio, Chang Liu, Cristina Nita-Rotaru, and Bo Li. Manipulating\nmachine learning: Poisoning attacks and countermeasures for regression learning. 
In IEEE Security and Privacy, 2018.

[41] Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python, 2001–. [Online; accessed 05/23/2019].

[42] Kyle D Julian, Jessica Lopez, Jeffrey S Brush, Michael P Owen, and Mykel J Kochenderfer. Policy compression for aircraft collision avoidance systems. In Digital Avionics Systems Conference (DASC), 2016 IEEE/AIAA 35th, pages 1–10. IEEE, 2016.

[43] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[44] J Zico Kolter and Eric Wong. Provable defenses against adversarial examples via the convex outer adversarial polytope. In ICML, 2018.

[45] Jernej Kos, Ian Fischer, and Dawn Song. Adversarial examples for generative models. arXiv preprint arXiv:1702.06832, 2017.

[46] Jernej Kos and Dawn Song. Delving into adversarial attacks on deep policies. In ICLR Workshop, 2017.

[47] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. 2009.

[48] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, NIPS'12, pages 1097–1105, USA, 2012. Curran Associates Inc.

[49] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.

[50] Yann LeCun and Corinna Cortes. The MNIST database of handwritten digits. 1998.

[51] Qiang Liu, Pan Li, Wentao Zhao, Wei Cai, Shui Yu, and Victor CM Leung. A survey on security threats and defensive techniques of machine learning: A data driven view. IEEE Access, 6:12103–12117, 2018.

[52] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks.
In ICLR, 2017.\n\n[53] Jiajun Lu, Hussein Sibai, and Evan Fabry. Adversarial examples that fool detectors. arXiv preprint\n\narXiv:1712.02494, 2017.\n\n[54] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards\n\ndeep learning models resistant to adversarial attacks. In ICLR, 2018.\n\n[55] Saeed Mahloujifar, Dimitrios I Diochnos, and Mohammad Mahmoody. The curse of concentration in\nrobust learning: Evasion and poisoning attacks from concentration of measure. In Proceedings of the AAAI\nConference on Arti\ufb01cial Intelligence, volume 33, pages 4536\u20134543, 2019.\n\n[56] Omar Montasser, Steve Hanneke, and Nathan Srebro. Vc classes are adversarially robustly learnable, but\n\nonly improperly. arXiv preprint arXiv:1902.04217, 2019.\n\n[57] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversar-\n\nial perturbations. In CVPR, 2017.\n\n[58] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate\n\nmethod to fool deep neural networks. In CVPR, 2016.\n\n[59] Matej Morav\u02c7c\u00edk, Martin Schmid, Neil Burch, Viliam Lis`y, Dustin Morrill, Nolan Bard, Trevor Davis,\nKevin Waugh, Michael Johanson, and Michael Bowling. Deepstack: Expert-level arti\ufb01cial intelligence in\nheads-up no-limit poker. Science, 356(6337):508\u2013513, 2017.\n\n[60] Mehran Mozaffari-Kermani, Susmita Sur-Kolay, Anand Raghunathan, and Niraj K Jha. Systematic\npoisoning attacks on and defenses for machine learning in healthcare. IEEE journal of biomedical and\nhealth informatics, 19(6):1893\u20131905, 2015.\n\n[61] Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. Transferability in machine learning: from\n\nphenomena to black-box attacks using adversarial samples. 
arXiv preprint arXiv:1605.07277, 2016.\n\n[62] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami.\nPractical black-box attacks against deep learning systems using adversarial examples. In Proceedings of\nthe 2017 ACM Asia Conference on Computer and Communications Security, 2017.\n\n[63] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami.\nThe limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security\nand Privacy (EuroS&P), pages 372\u2013387. IEEE, 2016.\n\n[64] Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, and Michael Wellman. Towards the science of\n\nsecurity and privacy in machine learning. arXiv preprint arXiv:1611.03814, 2016.\n\n[65] Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense\nto adversarial perturbations against deep neural networks. In Security and Privacy (SP), 2016 IEEE\nSymposium on, pages 582\u2013597. IEEE, 2016.\n\n[66] Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certi\ufb01ed defenses against adversarial examples. In\n\nICLR, 2018.\n\n[67] Benjamin IP Rubinstein, Blaine Nelson, Ling Huang, Anthony D Joseph, Shing-hon Lau, Satish Rao,\nNina Taft, and JD Tygar. Stealthy poisoning attacks on pca-based anomaly detectors. ACM SIGMETRICS\nPerformance Evaluation Review, 37(2):73\u201374, 2009.\n\n[68] Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Madry. Adversarially\n\nrobust generalization requires more data. arXiv preprint arXiv:1804.11285, 2018.\n\n[69] Uri Shaham, James Garritano, Yutaro Yamada, Ethan Weinberger, Alex Cloninger, Xiuyuan Cheng, Kelly\nStanton, and Yuval Kluger. Defending against adversarial images using basis functions transformations.\narXiv preprint arXiv:1803.10840, 2018.\n\n[70] Mahmood Sharif, Sruti Bhagavatula, Lujo Bauer, and Michael K Reiter. 
Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 1528–1540. ACM, 2016.

[71] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of go without human knowledge. Nature, 550(7676):354, 2017.

[72] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[73] Aman Sinha, Hongseok Namkoong, and John Duchi. Certifiable distributional robustness with principled adversarial training. In ICLR, 2018.

[74] Chawin Sitawarin, Arjun Nitin Bhagoji, Arsalan Mosenia, Prateek Mittal, and Mung Chiang. Rogue signs: Deceiving traffic sign recognition with malicious ads and logos. In DLS (IEEE SP), 2018.

[75] Charles Smutz and Angelos Stavrou. When a tree falls: Using diversity in ensemble classifiers to identify evasion in malware detectors. In NDSS, 2016.

[76] Arun Sai Suggala, Adarsh Prasad, Vaishnavh Nagarajan, and Pradeep Ravikumar. Revisiting adversarial risk. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 2331–2339, 2019.

[77] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.

[78] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. In ICLR, 2018.

[79] Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. There is no free lunch in adversarial robustness (but there are unexpected benefits).
arXiv preprint arXiv:1805.12152, 2018.

[80] Cédric Villani. Optimal transport: old and new, volume 338. Springer Science & Business Media, 2008.

[81] Qinglong Wang, Wenbo Guo, Kaixuan Zhang, Alexander G Ororbia II, Xinyu Xing, Xue Liu, and C Lee Giles. Adversary resistant deep neural networks with an application to malware detection. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1145–1153. ACM, 2017.

[82] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms, 2017.

[83] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, and Alan Yuille. Adversarial examples for semantic segmentation and object detection. In International Conference on Computer Vision. IEEE, 2017.

[84] Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. In NDSS, 2018.

[85] Dong Yin, Kannan Ramchandran, and Peter Bartlett. Rademacher complexity for adversarially robust generalization. In ICML, 2019.

[86] Xuejing Yuan, Yuxuan Chen, Yue Zhao, Yunhui Long, Xiaokang Liu, Kai Chen, Shengzhi Zhang, Heqing Huang, Xiaofeng Wang, and Carl A Gunter. Commandersong: A systematic approach for practical adversarial voice recognition. In USENIX Security, 2018.

[87] Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P Xing, Laurent El Ghaoui, and Michael I Jordan. Theoretically principled trade-off between robustness and accuracy. arXiv preprint arXiv:1901.08573, 2019.