{"title": "Adversarial Risk and Robustness: General Definitions and Implications for the Uniform Distribution", "book": "Advances in Neural Information Processing Systems", "page_first": 10359, "page_last": 10368, "abstract": "We study adversarial perturbations when the instances are uniformly distributed over {0,1}^n. We study both \"inherent\" bounds that apply to any problem and any classifier for such a problem as well as bounds that apply to specific problems and specific hypothesis classes.\n\nAs the current literature contains multiple definitions of adversarial risk and robustness, we start by giving a taxonomy for these definitions based on their direct goals; we identify one of them as the one guaranteeing misclassification by pushing the instances to the error region. We then study some classic algorithms for learning monotone conjunctions and compare their adversarial risk and robustness under different definitions by attacking the hypotheses using instances drawn from the uniform distribution. We observe that sometimes these definitions lead to significantly different bounds. Thus, this study advocates for the use of the error-region definition, even though other definitions, in other contexts with context-dependent assumptions, may coincide with the error-region definition.\n\nUsing the error-region definition of adversarial perturbations, we then study inherent bounds on risk and robustness of any classifier for any classification problem whose instances are uniformly distributed over {0,1}^n. Using the isoperimetric inequality for the Boolean hypercube, we show that for initial error 0.01, there always exists an adversarial perturbation that changes O(\u221an) bits of the instances to increase the risk to 0.5, making classifier's decisions meaningless. 
Furthermore, by also using the central limit theorem we show that when n\u2192\u221e, at most c\u221an bits of perturbations, for a universal constant c<1.17, suffice for increasing the risk to 0.5, and the same c\u221an bits of perturbations on average suffice to increase the risk to 1, hence bounding the robustness by c\u221an.", "full_text": "Adversarial Risk and Robustness: General De\ufb01nitions\n\nand Implications for the Uniform Distribution\n\nDimitrios I. Diochnos\u2217\nUniversity of Virginia\n\ndiochnos@virginia.edu\n\nSaeed Mahloujifar\u2217\nUniversity of Virginia\nsaeed@virginia.edu\n\nMohammad Mahmoody\u2020\n\nUniversity of Virginia\n\nmohammad@virginia.edu\n\nAbstract\n\nWe study adversarial perturbations when the instances are uniformly distributed\n\nover {0, 1}n. We study both \u201cinherent\u201d bounds that apply to any problem and any\n\nclassi\ufb01er for such a problem as well as bounds that apply to speci\ufb01c problems and\nspeci\ufb01c hypothesis classes.\nAs the current literature contains multiple de\ufb01nitions of adversarial risk and ro-\nbustness, we start by giving a taxonomy for these de\ufb01nitions based on their direct\ngoals; we identify one of them as the one guaranteeing misclassi\ufb01cation by push-\ning the instances to the error region. We then study some classic algorithms for\nlearning monotone conjunctions and compare their adversarial robustness under\ndifferent de\ufb01nitions by attacking the hypotheses using instances drawn from the\nuniform distribution. We observe that sometimes these de\ufb01nitions lead to signi\ufb01-\ncantly different bounds. 
Thus, this study advocates for the use of the error-region\nde\ufb01nition, even though other de\ufb01nitions, in other contexts with context-dependent\nassumptions, may coincide with the error-region de\ufb01nition.\nUsing the error-region de\ufb01nition of adversarial perturbations, we then study inher-\nent bounds on risk and robustness of any classi\ufb01er for any classi\ufb01cation problem\n\nwhose instances are uniformly distributed over {0, 1}n. Using the isoperimetric\ninequality for the Boolean hypercube, we show that for initial error 0.01, there\nalways exists an adversarial perturbation that changes O(\u221an) bits of the instances\nto increase the risk to 0.5, making classi\ufb01er\u2019s decisions meaningless. Furthermore,\nby also using the central limit theorem we show that when n \u2192 \u221e, at most c\u00b7\u221an\nbits of perturbations, for a universal constant c < 1.17, suf\ufb01ce for increasing the\nrisk to 0.5, and the same c\u221an bits of perturbations on average suf\ufb01ce to increase\nthe risk to 1, hence bounding the robustness by c \u00b7 \u221an.\n\n1\n\nIntroduction\n\nIn recent years, modern machine learning tools (e.g., neural networks) have pushed to new heights\nthe classi\ufb01cation results on traditional datasets that are used as testbeds for various machine learning\nmethods.1 As a result, the properties of these methods have been put into further scrutiny.\nIn\nparticular, studying the robustness of the trained models in various adversarial contexts has gained\nspecial attention, leading to the active area of adversarial machine learning.\n\nWithin adversarial machine learning, one particular direction of research that has gained attention\nin recent years deals with the study of the so-called adversarial perturbations of the test instances.\nThis line of work was particularly popularized, in part, by the work of Szegedy et al. 
[32] within\n\n\u2217Authors have contributed equally.\n\u2020Supported by NSF CAREER CCF-1350939 and University of Virginia SEAS Research Innovation Award.\n1For example, http://rodrigob.github.io/are_we_there_yet/build/ has a summary of state-of-the-art results.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\n\fthe context of deep learning classi\ufb01ers, but the same problem can be asked for general classi\ufb01ers as\nwell. Brie\ufb02y, when one is given a particular instance x for classi\ufb01cation, an adversarial perturbation\nx\u2032 for that instance is a new instance with minimal changes in the features of x so that the resulting\nperturbed instance x\u2032 is misclassi\ufb01ed by the classi\ufb01er h. The perturbed instance x\u2032 is commonly\nreferred to as an adversarial example (for the classi\ufb01er h). Adversarial machine learning has its\nroots at least as back as in [19, 24, 17]. However, the work of [32] revealed pairs of images that\ndiffered slightly so that a human eye could not identify any real differences between the two, and\nyet, contrary to what one would naturally expect, machine learning classi\ufb01ers would predict different\nlabels for the classi\ufb01cations of such pairs of instances. It is perhaps this striking resemblance to the\nhuman eye of the pairs of images that were provided in [32] that really gave this new push for\nintense investigations within the context of adversarial perturbations. Thus, a very intense line of\nwork started, aiming to understand and explain the properties of machine learning classi\ufb01ers on such\nadversarial perturbations; e.g., [15, 23, 2, 8, 20]. These attacks are also referred to as evasion attacks\n[25, 4, 15, 8, 36]. There is also work that aims at making the classi\ufb01ers more robust under such\nattacks [27, 36], yet newer attacks of Carlini and Wagner [7] broke many proposed defenses.\n\nOur general goal. 
In this work, we study barriers against robust classification of adversarial examples.\nWe are particularly interested in foundational bounds that potentially apply to a broad class of\nproblems and distributions. One can study this question from the perspectives of both risk and robustness.\nIn the case of risk, the adversary\u2019s goal is to increase the error probability of the classifier\n(e.g., to reach risk 0.5) by small perturbations of the instances, and in the case of robustness, we are\ninterested in the average amount of perturbations needed for making the classifier always fail.\n\nStudying the uniform distribution. We particularly study adversarial risk and robustness for learning\nproblems where the input distribution is U_n, which is uniform over the hypercube {0, 1}^n. We\nmeasure the cost of perturbations using the natural metric of Hamming distance. Namely, the distance\nbetween the original and perturbed instances x, x\u2032 \u2208 {0, 1}^n is the number of locations in which\nthey differ. This class of distributions already includes many learning problems of interest.\nSo, by studying adversarial risk and robustness for such a natural distribution, we can immediately\nobtain results for a broad class of problems. We believe it is crucial to understand adversarial risk\nand robustness for natural distributions (e.g., U_n uniform over the hypercube) and metrics (e.g., the\nHamming distance) to develop a theory of adversarial risk and robustness that can ultimately shed\nlight on the power and limitations of robust classification for practical data sets. Furthermore, natural\ndistributions like U_n model a broad class of learning problems directly; e.g., see [5, 28, 18, 30].\nThe hope is that understanding the limitations of robust learning for these basic natural distributions\nwill ultimately shed light on challenges related to addressing broader problems of interest.\n\nRelated work. The work of Gilmer et al. 
[14] studied the above problem for the special case\nof input distributions that are uniform over unit spheres in dimension n. They showed that for\nany classi\ufb01cation problem with such input distribution, so long as there is an initial constant error\nprobability \u00b5, the robustness under the \u21132 norm is at most O(\u221an). Fawzi et al. [11] studied the above\nquestion for Gaussian distributions in dimension n and showed that when the input distribution\nhas \u21132 norm \u2248 1, then by \u2248 \u221an perturbations in \u21132 norm, we can make the classi\ufb01er change\n\nits prediction (but doing this does not guarantee that the perturbed instance x\u2032 will be misclassi\ufb01ed).\nSchmidt et al. [29] proved limits on robustness of classifying uniform instances by speci\ufb01c classi\ufb01ers\nand using a de\ufb01nition based on \u201ccorrupted inputs\u201d (see Section 2), while we are mainly interested\nin bounds that apply to any classi\ufb01ers and guarantee misclassi\ufb01cation of the adversarial inputs.\n\nDiscussion. Our results, like all other current provable bounds in the literature for adversarial risk\nand robustness only apply to speci\ufb01c distributions that do not cover the case of image distributions.\nThese results, however, are \ufb01rst steps, and indicate similar phenomena (e.g., relation to isoperimetric\ninequalities). Thus, as pursued in [14], these works motivate a deeper study of such inequalities for\nreal data sets. 
Finally, as discussed in [11], such theoretical attacks could potentially imply direct\nattacks on real data, assuming the existence of smooth generative models for latent vectors with\ntheoretically nice distributions (such as Gaussian or uniform over the hypercube) into natural data.\n\n1.1 Our Contribution and Results\n\nAs mentioned above, our main goal is to understand inherent barriers against robust classification\nof adversarial examples, and our focus is on the uniform distribution U_n of instances. In order to\nachieve that goal, we both carry out a study of definitions and prove technical limitation results.\n\nGeneral definitions and a taxonomy. As the current literature contains multiple definitions of\nadversarial risk and robustness, we start by giving a taxonomy for these definitions based on their\ndirect goals. More specifically, suppose x is an original instance that the adversary perturbs into a\n\u201cclose\u201d instance x\u2032. Suppose h(x), h(x\u2032) are the predictions of the hypothesis h(\u00b7) and c(x), c(x\u2032)\nare the true labels of x, x\u2032 defined by the concept function c(\u00b7). To call x\u2032 a successful \u201cadversarial\nexample\u201d, a natural definition would compare the predicted label h(x\u2032) with some other \u201canticipated\nanswer\u201d. However, what h(x\u2032) is exactly compared to is where various definitions of adversarial\nexamples diverge. We observe in Section 2 that the three possible definitions (based on comparing\nh(x\u2032) with either of h(x), c(x) or c(x\u2032)) lead to three different ways of defining adversarial risk and\nrobustness. We then identify one of them (that compares h(x\u2032) with c(x\u2032)) as the one guaranteeing\nmisclassification by pushing the instances to the error region. We also discuss natural conditions\nunder which these definitions coincide. 
However, these conditions do not hold in general.\n\nA comparative study through monotone conjunctions. We next ask: how close/far are these\nde\ufb01nitions in settings where, e.g., the instances are drawn from the uniform distribution? To answer\nthis question, we make a comparative study of adversarial risk and robustness for a particular case\n\nof learning monotone conjunctions under the uniform distribution Un (over {0, 1}n). A monotone\nconjunction f is a function of the form f = (xi1 \u2227 \u00b7\u00b7\u00b7 \u2227 xik ). This class of functions is perhaps one\nof the most natural and basic learning problems that are studied in computational learning theory as\nit encapsulates, in the most basic form, the class of functions that determine which features should\nbe included as relevant for a prediction mechanism. For example, Valiant in [35] used this class of\nfunctions under Un to exemplify the framework of evolvability. We attack monotone conjunctions\nunder Un in order to contrast different behavior of de\ufb01nitions of adversarial risk and robustness.\n\nIn Section 3, we show that previous de\ufb01nitions of robustness that are not based on the error region,\nlead to bounds that do not equate the bounds provided by the error-region approach. We do so by\n\ufb01rst deriving theorems that characterize the adversarial risk and robustness of a given hypothesis\nand a concept function under the uniform distribution. Subsequently, by performing experiments we\nshow that, on average, hypotheses computed by two popular algorithms (FIND-S [22] and SWAP-\nPING ALGORITHM [35]) also exhibit the behavior that is predicted by the theorems. Estimating\nthe (expected value of) the adversarial risk and robustness of hypotheses produced by other classic\nalgorithms under speci\ufb01c distributions, or for other concept classes, is an interesting future work.\n\nInherent bounds for any classi\ufb01cation task under the uniform distribution. 
Finally, after establishing\nfurther motivation to use the error-region definition as the default definition for studying\nadversarial examples in general settings, we turn to studying inherent obstacles against robust\nclassification when the instances are drawn from the uniform distribution. We prove that for any\nlearning problem P with input distribution U_n (i.e., uniform over the hypercube) and for any classifier\nh for P with a constant error \u00b5, the robustness of h to adversarial perturbations (in Hamming\ndistance) is at most O(\u221an). We also show that by the same amount of O(\u221an) perturbations in the\nworst case, one can increase the risk to 0.99. Table 1 lists some numerical examples.\n\nTable 1: Each row focuses on the number of tampered bits to achieve its stated goal. The second\ncolumn shows results using direct calculations for specific dimensions. The third column shows that\nthese results are indeed achieved in the limit, and the last column shows bounds proved for all n.\n\nAdversarial goals | Bound for n = 10^3, 10^4, 10^5 | Bound for n \u2192 \u221e | Bound for all n\nFrom initial risk 0.01 to 0.99 | \u2248 2.34\u221an | < 2.34\u221an | < 3.04\u221an\nFrom initial risk 0.01 to 0.50 | \u2248 1.17\u221an | < 1.17\u221an | < 1.52\u221an\nRobustness for initial risk 0.01 | \u2248 1.17\u221an | < 1.17\u221an | < 1.53\u221an\n\nTo prove the results above, we apply the isoperimetric inequality of [26, 16] to the error region of the\nclassifier h and the ground truth c. In particular, it was shown in [16, 26] that the subsets of the\nhypercube with minimum \u201cexpansion\u201d (under Hamming distance) are Hamming balls. This fact\nenables us to prove our bounds on the risk. We then prove the bounds on robustness by proving a\ngeneral connection between risk and robustness that might be of independent interest. 
Using the\ncentral limit theorem, we sharpen our bounds for robustness and obtain bounds that closely match\nthe bounds that we also obtain by direct calculations (based on the isoperimetric inequalities and\npicking Hamming balls as error region) for specific values of dimension n = 10^3, 10^4, 10^5.\n\nFull version. All proofs can be found in the full version of the paper (see footnote 2), which also includes results\nrelated to the adversarial risk of monotone conjunctions, complementing the picture of Section 3.\n\n2 General Definitions of Risk and Robustness for Adversarial Perturbations\n\nNotation. We use calligraphic letters (e.g., X ) for sets and capital non-calligraphic letters (e.g.,\nD) for distributions. By x \u2190 D we denote sampling x from D. In a classification problem P =\n(X ,Y,D,C,H), the set X is the set of possible instances, Y is the set of possible labels, D is\na set of distributions over X , C is a class of concept functions, and H is a class of hypotheses,\nwhere any f \u2208 C \u222a H is a mapping from X to Y. An example is a labeled instance. We do not\nstate the loss function explicitly, as we work with classification problems; however, all three main\ndefinitions of this section directly extend to arbitrary loss functions. For x \u2208 X , c \u2208 C, D \u2208 D, the\nrisk or error of a hypothesis h \u2208 H is the expected (0-1) loss of (h, c) with respect to D, namely\nRisk(h, c, D) = Pr_{x\u2190D}[h(x) \u2260 c(x)]. We are usually interested in learning problems with a fixed\ndistribution D = {D}, as we are particularly interested in robustness of learning under the uniform\ndistribution U_n over {0, 1}^n. Note that since we deal with negative results, fixing the distribution\nonly makes our results stronger. As a result, whenever D = {D}, we omit D from the risk notation\nand simply write Risk(h, c). We usually work with problems P = (X ,Y, D,C,H, d) that include\na metric d over the instances. 
For a set S \u2286 X we let d(x, S) = inf{d(x, y) | y \u2208 S}, and\nby Ball_r(x) = {x\u2032 | d(x, x\u2032) \u2264 r} we denote the ball of radius r centered at x under the metric\nd. By HD we denote Hamming distance for pairs of instances from {0, 1}^n. Finally, we use the\nterm adversarial instance to refer to an adversarially perturbed instance x\u2032 of an originally sampled\ninstance x when the label of the adversarial example is either not known or not considered.\n\nBelow we present our formal definitions of adversarial risk and robustness. In all of these definitions\nwe will deal with attackers who perturb the initial test instance x into a close adversarial instance\nx\u2032. We will measure how much an adversary can increase the risk by perturbing a given input x into\na close adversarial example x\u2032. When to exactly call x\u2032 a successful adversarial example is where\nthese definitions differ. First we formalize the main definition that we use in this work based on\nadversary\u2019s ability to push instances to the error region.\nDefinition 2.1 (Error-region risk and robustness). Let P = (X ,Y, D,C,H, d) be a classification\nproblem (with metric d defined over instances X ).\n\n\u2022 Risk. For any r \u2208 R_+, h \u2208 H, c \u2208 C, the error-region risk under r-perturbation is\n\nRisk^{ER}_r(h, c) = Pr_{x\u2190D}[\u2203x\u2032 \u2208 Ball_r(x), h(x\u2032) \u2260 c(x\u2032)].\n\nFor r = 0, Risk^{ER}_0(h, c) = Risk(h, c) becomes the standard notion of risk.\n\n\u2022 Robustness. For any h \u2208 H, x \u2208 X , c \u2208 C, the error-region robustness is the expected\ndistance of a sampled instance to the error region, formally defined as follows\n\nRob^{ER}(h, c) = E_{x\u2190D}[inf{r : \u2203x\u2032 \u2208 Ball_r(x), h(x\u2032) \u2260 c(x\u2032)}].\n\nDefinition 2.1 requires the adversarial instance x\u2032 to be\nmisclassified, namely, h(x\u2032) \u2260 c(x\u2032). 
So, x\u2032 clearly belongs\nto the error region of the hypothesis h compared to\nthe ground truth c. This definition is implicit in the work\nof [14]. In what follows, we compare our main definition\nabove with previously proposed definitions of adversarial\nrisk and robustness found in the literature and discuss\nwhen they are (and when they are not) equivalent to Definition 2.1.\nFigure 1 summarizes the differences between\nthe three main definitions that have appeared in the literature,\nwhere we distinguish cases by comparing the classifier\u2019s\nprediction h(x\u2032) at the new point x\u2032 with either of\nh(x), c(x), or c(x\u2032), leading to three different definitions.\n\n2 See https://arxiv.org/abs/1810.12272.\n\n[Figure 1 diagram: h(x\u2032) is linked to h(x) (Prediction Change), to c(x) (Corrupted Instance), and to c(x\u2032) (Error Region).]\n\nFigure 1: The three main definitions\nbased on what h(x\u2032) is compared with.\n\nDefinitions based on hypothesis\u2019s prediction change (PC risk and robustness). Many works,\nincluding the works of [32, 11], use a definition of robustness that compares the classifier\u2019s prediction\nh(x\u2032) with the prediction h(x) on the original instance x. Namely, they require h(x\u2032) \u2260 h(x)\nrather than h(x\u2032) \u2260 c(x\u2032) in order to consider x\u2032 an adversarial instance. Here we refer to this\ndefinition (that does not depend on the ground truth c) as prediction-change (PC) risk and robustness\n(denoted as Risk^{PC}_r(h) and Rob^{PC}(h)). We note that this definition captures the error-region risk\nand robustness if we assume the initial correctness (i.e., h(x) = c(x)) of the classifier\u2019s prediction on\nall x \u2190 X and \u201ctruth proximity\u201d, i.e., that c(x) = c(x\u2032) holds for all x\u2032 that are \u201cclose\u201d to x. Both\nof these assumptions are valid in some natural scenarios. 
For example, when input instances consist\nof images that look similar to humans (if used as the ground truth c(\u00b7)) and if h is also correct on the\noriginal (non-adversarial) test examples, then the two definitions (based on error region or prediction\nchange) coincide. But these assumptions do not hold in general.\n\nDefinitions based on the notion of corrupted instance (CI risk and robustness). The works\nof [21, 12, 13, 1] study the robustness of learning models in the presence of corrupted inputs. A\nmore recent framework was developed in [20, 29] for modeling risk and robustness that is inspired\nby robust optimization [3] (with an underlying metric space) and models adversaries that corrupt the\noriginal instance in (exponentially more) ways. When studying adversarial perturbations using\ncorrupted instances, we define adversarial risk by requiring the adversarial instance x\u2032 to satisfy\nh(x\u2032) \u2260 c(x). The term \u201ccorrupted instance\u201d is particularly helpful as it emphasizes the fact that\nthe goal (of the classifier) is to find the true label of the original (uncorrupted) instance x, while we\nare only given a corrupted version x\u2032. Hence, we refer to this definition as the corrupted instance\n(CI) risk and robustness and denote them by Risk^{CI}_r(h, c) and Rob^{CI}(h, c). The advantage of this\ndefinition compared to the prediction-change based definitions is that here, we no longer need to\nmake the initial correctness assumption. Namely, as long as the \u201ctruth proximity\u201d assumption holds,\nwe have c(x) = c(x\u2032), which together with the condition h(x\u2032) \u2260 c(x) lets us conclude that x\u2032\nis indeed misclassified. 
However, if small perturbations can change the ground truth, c(x\u2032) can be\ndifferent from c(x), in which case it is no longer clear whether x\u2032 is misclassified or not.\n\nStronger definitions of risk and robustness with more restrictions on adversarial instance.\n\nThe corrupted-input definition requires an adversarial instance x\u2032 to satisfy h(x\u2032) \u2260 c(x), and the\nerror-region definition requires h(x\u2032) \u2260 c(x\u2032). What if we require both of these conditions to call\nx\u2032 a true adversarial instance? This is indeed the definition used in the work of Suggala et al. [31],\nthough more formally in their work, they subtract the original risk (without adversarial perturbation)\nfrom the adversarial risk. This definition is certainly a stronger guarantee for the adversarial instance.\nAs this definition is a hybrid of the error-region and corrupted-instance definitions, we do not make\na direct study of this definition and only focus on the other three definitions described above.\n\nHow about when the classifier h is 100% correct? We emphasize that when h happens to be the\nsame function as c, (the error region) Definition 2.1 implies h has zero adversarial risk and infinite\nadversarial robustness Rob^{ER}(h, c) = \u221e. This is expected, as there is no way an adversary can\nperturb any input x into a misclassified x\u2032. However, both of the definitions of risk and robustness\nbased on prediction change [32] and corrupted instance [21, 20] could compute large risk and small\nrobustness for such h. In fact, in a recent work [33] it is shown that for definitions based on corrupted\ninput, correctness might be provably at odds with robustness in some cases. 
Therefore, even though\nall these definitions could perhaps be used to approximate the risk and robustness when we do not\nhave access to the ground truth c(x\u2032) on the new point x\u2032, in this work we separate the definition of risk\nand robustness from how to compute/approximate them, so we will use Definition 2.1 by default.\n\n3 A Comparative Study through Monotone Conjunctions\n\nIn this section, we compare the risk and robustness under the three definitions of Section 2 through\na study of monotone conjunctions under the uniform distribution. Namely, we consider adversarial\nperturbations of truth assignments that are drawn from the uniform distribution U_n over {0, 1}^n\nwhen the concept class contains monotone conjunctions. As we will see, these definitions diverge in\nthis natural case. Below we fix the setup under which all the subsequent results are obtained.\nProblem Setup 1. Let C_n be the concept class of all monotone conjunctions formed by at least one\nand at most n Boolean variables. The target concept (ground truth) c that needs to be learned is\ndrawn from C_n. Let the hypothesis class be H = C_n and let h \u2208 H be the hypothesis obtained by a\nlearning algorithm after processing the training data. With |h| and |c| we denote the size of h and c\nrespectively; that is, the number of variables that h and c contain (see footnote 3). Now let\n\nc = \u22c0_{i=1}^{m} x_i \u2227 \u22c0_{k=1}^{u} y_k   and   h = \u22c0_{i=1}^{m} x_i \u2227 \u22c0_{\u2113=1}^{w} z_\u2113.   (1)\n\nWe will call the variables that appear both in h and c mutual, the variables that appear in c but\nnot in h undiscovered, and the variables that appear in h but not in c wrong (or redundant).\nTherefore in (1) we have m mutual variables, u undiscovered and w wrong. We denote the error\nregion of a hypothesis h and the target concept c with E(h, c).\nThat is, E(h, c) = {x \u2208 {0, 1}^n | h(x) \u2260 c(x)}. 
The probability mass of the error region between\nh and c, denoted by \u00b5, under the uniform distribution U_n over {0, 1}^n is then\n\nPr_{x\u2190U_n}[x \u2208 E(h, c)] = \u00b5 = (2^w + 2^u \u2212 2) \u00b7 2^{\u2212m\u2212u\u2212w}.   (2)\n\nIn this problem setup we are interested in computing the adversarial risk and robustness that attackers\ncan achieve when instances are drawn from the uniform distribution U_n over {0, 1}^n.\nRemark 3.1. Note that \u00b5 is a variable that depends on the particular h and c.\n\nUsing the Problem Setup 1, in what follows we compute the adversarial robustness that an arbitrary\nhypothesis has against an arbitrary target using the error region (ER) definition that we advocate\nin contexts where the perturbed input is supposed to be misclassified and do the same calculations\nfor adversarial risk and robustness that are based on the definitions of prediction change (PC) and\ncorrupted instance (CI). The important message is that the adversarial robustness of a hypothesis\nbased on the ER definition is \u0398(min{|h|, |c|}), whereas the adversarial robustness based on PC and\nCI is \u0398(|h|). In the full version of the paper we also give theorems (that have similar flavor) for\ncalculating the adversarial risk based on the three main definitions (ER, PC, CI).\nTheorem 3.2. Consider the Problem Setup 1. Then, if h = c we have Rob^{ER}(h, c) = \u221e, while if\nh \u2260 c we have min{|h|, |c|}/16 \u2264 Rob^{ER}(h, c) \u2264 1 + min{|h|, |c|}.\nTheorem 3.3. Consider the Problem Setup 1. Then, Rob^{PC}(h) = |h|/2 + 2^{\u2212|h|}.\nTheorem 3.4. Consider the Problem Setup 1. Then, |h|/4 < Rob^{CI}(h, c) < |h| + 1/2.\n\n3.1 Experiments for the Expected Values of Adversarial Robustness\n\nIn this part, we complement the theorems that we presented earlier with experiments. 
This way we\nare able to examine how some popular algorithms behave under attack, and we explore the extent to\nwhich the generated solutions of such algorithms exhibit differences in their (adversarial) robustness\non average against various target functions drawn from the class of monotone conjunctions.\n\nThe first algorithm is the standard Occam algorithm that starts from the full conjunction and eliminates\nvariables from the hypothesis that contradict the positive examples received; this algorithm is\nknown as FIND-S in [22] but has appeared without a name earlier by Valiant in [34] and its roots are\nat least as old as in [6]. The second algorithm is the SWAPPING ALGORITHM from the framework\nof evolvability [35]. This algorithm searches for an \u03b5-optimal solution among monotone conjunctions\nthat have at most \u2308lg(3/(2\u03b5))\u2309 variables in their representation using a local search method\nwhere hypotheses in the neighborhood are obtained by swapping in and out some variable(s) from\nthe current hypothesis; we follow the analysis that was used in [10] and is a special case of [9].\n\n3 For example, h_1 = x_1 \u2227 x_5 \u2227 x_8 is a monotone conjunction of three variables in a space where we have\nn \u2265 8 variables and |h_1| = 3.\n\nIn each experiment, we first learn hypotheses by using the algorithms under U_n against different\ntarget sizes. For both algorithms, during the learning process, we use \u03b5 = 0.01 and \u03b4 = 0.05 for\nthe learning parameters. We then examine the robustness of the generated hypotheses by drawing\nexamples again from the uniform distribution U_n as this is the main theme of this paper. In particular,\nwe test against the 30 target sizes from the set {1, 2, . . . , 24, 25, 30, 50, 75, 99, 100}. For each such\ntarget size, we plot the average value, over 500 runs, of the robustness of the learned hypothesis that\nwe obtain. 
In each run, we repeat the learning process using a random target of the particular size as\nwell as a fresh training sample and subsequently estimate the robustness of the learned hypothesis\nby drawing another 10,000 examples from U_n that we violate (depending on the definition). The\ndimension of the instances is n = 100.\n\nFigure 2 presents the values of the three robustness measures for the case of FIND-S. In the full\nversion of the paper we provide more details on the algorithms and more information regarding our\nexperiments. The message is that the adversarial robustness that is based on the definitions of prediction\nchange and corrupted instance is more or less the same, whereas the adversarial robustness\nbased on the error region definition may obtain wildly different values compared to the other two.\n\n[Figure 2 plot: robustness (vertical axis, 0 to 50) against target size |c| (horizontal axis, 1 to 100), with three series: prediction change, corrupted instance, error region.]\n\nFigure 2: Experimental comparison of the different robustness measures. The values for PC and CI\nalmost coincide and they can hardly be distinguished. The value for ER robustness is completely\ndifferent compared to the other two. Note that ER robustness is \u221e when the target size |c| is in\n{1, . . . , 8} \u222a {100} and for this reason only the points between 9 and 99 are plotted. When |c| \u2265 20,\nalmost always the learned hypothesis is the initialized full conjunction. The reason is that positive\nexamples are very rare and our training set contains none. As a result no variable is eliminated\nfrom the initialized hypothesis h (full conjunction). 
Hence, when |c| \u2265 20 we see that PC and CI robustness is about max{|h|, |c|}/2 = |h|/2, whereas ER is roughly min{|h|, |c|}/2 = |c|/2.\n\n4 Inherent Bounds on Risk and Robustness for the Uniform Distribution\n\nIn this section, we state our main theorems about error-region adversarial risk and robustness of arbitrary learning problems whose instances are distributed uniformly over the n-dimensional hypercube {0, 1}^n. The proofs of the theorems below are available in the full version of the paper.\n\nWe first define a useful notation for the size of the (partial) Hamming balls.\n\nDefinition 4.1. For every n \u2208 N we define the (partial) \u201cHamming Ball Size\u201d function BSize_n : [n] \u00d7 [0, 1) \u2192 [0, 1) as follows:\n\n$$\\mathrm{BSize}_n(k, \\lambda) = 2^{-n} \\cdot \\left( \\sum_{i=0}^{k-1} \\binom{n}{i} + \\lambda \\cdot \\binom{n}{k} \\right).$$\n\nNote that this function is a bijection, and we use BSize^{-1}(\u00b7) to denote its inverse. When n is clear from the context, we will simply use BSize(\u00b7, \u00b7) and BSize^{-1}(\u00b7) instead.\n\nThe following theorem gives a general lower bound for the adversarial risk of any classification problem for the uniform distribution U_n over the hypercube {0, 1}^n, depending on the original error.\n\nTheorem 4.2. Suppose P = ({0, 1}^n, Y, U_n, C, H, HD) is a classification problem. For any h \u2208 H, c \u2208 C and r \u2208 N, let \u00b5 = Risk(h, c) > 0 be the original risk and (k, \u03bb) = BSize^{-1}(\u00b5) be a function of the original risk. Then, the error-region adversarial risk under r-perturbation is at least\n\n$$\\mathrm{Risk}^{\\mathrm{ER}}_r(h, c) \\geq \\mathrm{BSize}(k + r, \\lambda).$$\n\nThe following corollary determines an asymptotic lower bound for risk based on Theorem 4.2.\n\nCorollary 4.3 (Error-region risk for all n). Suppose P = ({0, 1}^n, Y, U_n, C, H, HD) is a classification problem. 
For any hypothesis h with risk \u00b5 \u2208 (0, 1/2] in predicting a concept function c, we can increase the risk of (h, c) from \u00b5 \u2208 (0, 1/2] to \u00b5\u2032 \u2208 [1/2, 1] by changing at most\n\n$$r = \\sqrt{\\frac{-n \\cdot \\ln \\mu}{2}} + \\sqrt{\\frac{-n \\cdot \\ln(1 - \\mu')}{2}}$$\n\nbits in the input instances. Namely, by using the above r, we have Risk^{ER}_r(h, c) \u2265 \u00b5\u2032. Also, to increase the error to 1/2 we only need to change at most r\u2032 = \u221a(\u2212n \u00b7 ln(\u00b5)/2) bits.\n\nExample. Corollary 4.3 implies that for classification tasks over U_n, by changing at most 3.04\u221an bits in each example we can increase the error of a hypothesis from 1% to 99%. Furthermore, for increasing the error just to 0.5 we need half that number of bits, which is 1.52\u221an.\n\nAlso, the corollary below gives a lower bound on the limit of adversarial risk when n \u2192 \u221e. This lower bound matches the bound we have in our computational experiments.\n\nCorollary 4.4 (Error-region risk for large n). Let \u00b5 \u2208 (0, 1] and \u00b5\u2032 \u2208 (\u00b5, 1], and let P = ({0, 1}^n, Y, U_n, C, H, HD) be a classification problem. Then for any h \u2208 H, c \u2208 C such that Risk(h, c) \u2265 \u00b5 we have Risk_r(h, c) \u2265 \u00b5\u2032 for\n\n$$r \\approx \\sqrt{n} \\cdot \\frac{\\Phi^{-1}(\\mu') - \\Phi^{-1}(\\mu)}{2} \\quad \\text{when } n \\to \\infty,$$\n\nwhere \u03a6 is the CDF of the standard normal distribution.\n\nExample. Corollary 4.4 implies that for classification tasks over U_n, when n is large enough, we can increase the error from 1% to 99% by changing at most 2.34\u221an bits, and we can increase the error from 1% to 50% by changing at most 1.17\u221an bits in test instances.\n\nThe following theorem shows how to upper bound the adversarial robustness using the original risk.\n\nTheorem 4.5. Suppose P = ({0, 1}^n, Y, U_n, C, H, HD) is a classification problem. 
For any h \u2208 H and c \u2208 C, if \u00b5 = Risk(h, c) and (k, \u03bb) = BSize^{-1}(\u00b5) is determined by the original risk, then the error-region robustness is at most\n\n$$\\mathrm{Rob}^{\\mathrm{ER}}(h, c) \\leq \\sum_{r=0}^{n-k+1} \\left( 1 - \\mathrm{BSize}(k + r, \\lambda) \\right).$$\n\nNext, using Theorem 4.5, we give an asymptotic upper bound for robustness.\n\nCorollary 4.6. Suppose P = ({0, 1}^n, Y, U_n, C, H, HD) is a classification problem. For any hypothesis h with risk \u00b5 \u2208 (0, 1/2], we can make h always give wrong answers by changing r = \u221a(\u2212n \u00b7 ln \u00b5 / 2) + \u00b5 \u00b7 \u221a(n/2) bits on average. Namely, we have\n\n$$\\mathrm{Rob}^{\\mathrm{ER}}(h, c) \\leq \\sqrt{\\frac{-n \\cdot \\ln \\mu}{2}} + \\mu \\cdot \\sqrt{\\frac{n}{2}}.$$\n\nThe following corollary gives an upper bound on the robustness in the limit.\n\nCorollary 4.7. For any \u00b5 \u2208 (0, 1], classification problem P = ({0, 1}^n, Y, U_n, C, H, HD), and any h \u2208 H, c \u2208 C such that Risk(h, c) \u2265 \u00b5, we have\n\n$$\\mathrm{Rob}^{\\mathrm{ER}}(h, c) \\leq -\\frac{\\Phi^{-1}(\\mu)}{2} \\cdot \\sqrt{n} + \\mu \\cdot \\sqrt{\\frac{\\pi \\cdot n}{8}} \\quad \\text{when } n \\to \\infty,$$\n\nwhere \u03a6 is the CDF of the standard normal distribution.\n\nExample. By changing 1.53\u221an bits on average we can increase the error of a hypothesis from 1% to 100%. Also, if n \u2192 \u221e, by changing only 1.17\u221an bits on average we can increase the error from 1% to 100%.\n\nReferences\n\n[1] Idan Attias, Aryeh Kontorovich, and Yishay Mansour. Improved generalization bounds for robust learning. arXiv preprint arXiv:1810.02180, 2018.\n\n[2] Osbert Bastani, Yani Ioannou, Leonidas Lampropoulos, Dimitrios Vytiniotis, Aditya V. Nori, and Antonio Criminisi. Measuring Neural Net Robustness with Constraints. In NIPS, pages 2613\u20132621, 2016.\n\n[3] Aharon Ben-Tal, Laurent El Ghaoui, and Arkadi S. Nemirovski. Robust Optimization. Princeton Series in Applied Mathematics. 
Princeton University Press, October 2009.\n\n[4] Battista Biggio, Giorgio Fumera, and Fabio Roli. Security evaluation of pattern classifiers under attack. IEEE Transactions on Knowledge and Data Engineering, 26(4):984\u2013996, 2014.\n\n[5] Avrim Blum, Merrick L. Furst, Jeffrey C. Jackson, Michael J. Kearns, Yishay Mansour, and Steven Rudich. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In STOC, pages 253\u2013262, 1994.\n\n[6] Jerome S. Bruner, Jacqueline J. Goodnow, and George A. Austin. A study of thinking. John Wiley & Sons, New York, NY, USA, 1957.\n\n[7] Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 3\u201314. ACM, 2017.\n\n[8] Nicholas Carlini and David A. Wagner. Towards Evaluating the Robustness of Neural Networks. In 2017 IEEE Symposium on Security and Privacy, SP 2017, San Jose, CA, USA, May 22-26, 2017, pages 39\u201357, 2017.\n\n[9] Dimitrios I. Diochnos. On the Evolution of Monotone Conjunctions: Drilling for Best Approximations. In ALT, pages 98\u2013112, 2016.\n\n[10] Dimitrios I. Diochnos and Gy\u00f6rgy Tur\u00e1n. On Evolvability: The Swapping Algorithm, Product Distributions, and Covariance. In SAGA, pages 74\u201388, 2009.\n\n[11] Alhussein Fawzi, Hamza Fawzi, and Omar Fawzi. Adversarial vulnerability for any classifier. arXiv preprint arXiv:1802.08686, 2018.\n\n[12] Uriel Feige, Yishay Mansour, and Robert Schapire. Learning and inference in the presence of corrupted inputs. In Conference on Learning Theory, pages 637\u2013657, 2015.\n\n[13] Uriel Feige, Yishay Mansour, and Robert E. Schapire. Robust inference for multiclass classification. 
In Algorithmic Learning Theory, pages 368\u2013386, 2018.\n\n[14] Justin Gilmer, Luke Metz, Fartash Faghri, Samuel S. Schoenholz, Maithra Raghu, Martin Wattenberg, and Ian Goodfellow. Adversarial spheres. arXiv preprint arXiv:1801.02774, 2018.\n\n[15] Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and Harnessing Adversarial Examples. In ICLR, 2015.\n\n[16] Lawrence H. Harper. Optimal numberings and isoperimetric problems on graphs. Journal of Combinatorial Theory, 1(3):385\u2013393, 1966.\n\n[17] Ling Huang, Anthony D. Joseph, Blaine Nelson, Benjamin I. P. Rubinstein, and J. D. Tygar. Adversarial Machine Learning. In Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, AISec 2011, Chicago, IL, USA, October 21, 2011, pages 43\u201358, 2011.\n\n[18] Jeffrey C. Jackson and Rocco A. Servedio. On Learning Random DNF Formulas Under the Uniform Distribution. Theory of Computing, 2(8):147\u2013172, 2006.\n\n[19] Daniel Lowd and Christopher Meek. Adversarial learning. In KDD, pages 641\u2013647, 2005.\n\n[20] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083; to appear in International Conference on Learning Representations (ICLR), 2018.\n\n[21] Yishay Mansour, Aviad Rubinstein, and Moshe Tennenholtz. Robust probabilistic inference. In Proceedings of the twenty-sixth annual ACM-SIAM symposium on Discrete algorithms, pages 449\u2013460. Society for Industrial and Applied Mathematics, 2015.\n\n[22] Thomas M. Mitchell. Machine Learning. McGraw-Hill, Inc., New York, NY, USA, 1st edition, 1997.\n\n[23] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks. In CVPR, pages 2574\u20132582, 2016.\n\n[24] Blaine Nelson, Benjamin I. P. Rubinstein, Ling Huang, Anthony D. Joseph, and J. 
D. Tygar. Classifier Evasion: Models and Open Problems. In PSDM, pages 92\u201398, 2010.\n\n[25] Blaine Nelson, Benjamin I. P. Rubinstein, Ling Huang, Anthony D. Joseph, Steven J. Lee, Satish Rao, and J. D. Tygar. Query strategies for evading convex-inducing classifiers. Journal of Machine Learning Research, 13(May):1293\u20131332, 2012.\n\n[26] R. G. Nigmatullin. Some metric relations in the unit cube (in Russian). Diskretny Analiz 9, Novosibirsk, pages 47\u201358, 1967.\n\n[27] Nicolas Papernot, Patrick D. McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks. In IEEE Symposium on Security and Privacy, SP 2016, San Jose, CA, USA, May 22-26, 2016, pages 582\u2013597, 2016.\n\n[28] Yoshifumi Sakai and Akira Maruoka. Learning Monotone Log-Term DNF Formulas under the Uniform Distribution. Theory of Computing Systems, 33(1):17\u201333, 2000.\n\n[29] Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Madry. Adversarially robust generalization requires more data. arXiv preprint arXiv:1804.11285, 2018.\n\n[30] Linda Sellie. Exact learning of random DNF over the uniform distribution. In STOC, pages 45\u201354, 2009.\n\n[31] Arun Sai Suggala, Adarsh Prasad, Vaishnavh Nagarajan, and Pradeep Ravikumar. On Adversarial Risk and Training. arXiv preprint arXiv:1806.02924, 2018.\n\n[32] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In ICLR, 2014.\n\n[33] Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. Robustness May Be at Odds with Accuracy. arXiv preprint arXiv:1805.12152, 2018.\n\n[34] Leslie G. Valiant. A Theory of the Learnable. Communications of the ACM, 27(11):1134\u20131142, 1984.\n\n[35] Leslie G. Valiant. Evolvability. 
Journal of the ACM, 56(1):3:1\u20133:21, 2009.\n\n[36] Weilin Xu, David Evans, and Yanjun Qi. Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks. arXiv preprint arXiv:1704.01155. To appear in Network and Distributed System Security Symposium (NDSS), 2018.", "award": [], "sourceid": 6626, "authors": [{"given_name": "Dimitrios", "family_name": "Diochnos", "institution": "University of Virginia"}, {"given_name": "Saeed", "family_name": "Mahloujifar", "institution": "University of Virginia"}, {"given_name": "Mohammad", "family_name": "Mahmoody", "institution": "University of Virginia"}]}