{"title": "Functional Adversarial Attacks", "book": "Advances in Neural Information Processing Systems", "page_first": 10408, "page_last": 10418, "abstract": "We propose functional adversarial attacks, a novel class of threat models for crafting adversarial examples to fool machine learning models. Unlike a standard lp-ball threat model, a functional adversarial threat model allows only a single function to be used to perturb input features to produce an adversarial example. For example, a functional adversarial attack applied on colors of an image can change all red pixels simultaneously to light red. Such global uniform changes in images can be less perceptible than perturbing pixels of the image individually. For simplicity, we refer to functional adversarial attacks on image colors as ReColorAdv, which is the main focus of our experiments. We show that functional threat models can be combined with existing additive (lp) threat models to generate stronger threat models that allow both small, individual perturbations and large, uniform changes to an input. Moreover, we prove that such combinations encompass perturbations that would not be allowed in either constituent threat model. In practice, ReColorAdv can significantly reduce the accuracy of a ResNet-32 trained on CIFAR-10. Furthermore, to the best of our knowledge, combining ReColorAdv with other attacks leads to the strongest existing attack even after adversarial training.", "full_text": "Functional Adversarial Attacks\n\nCassidy Laidlaw\n\nUniversity of Maryland\n\nclaidlaw@umd.edu\n\nSoheil Feizi\n\nUniversity of Maryland\nsfeizi@cs.umd.edu\n\nAbstract\n\nWe propose functional adversarial attacks, a novel class of threat models for craft-\ning adversarial examples to fool machine learning models. Unlike a standard \u2113p-ball\nthreat model, a functional adversarial threat model allows only a single function to\nbe used to perturb input features to produce an adversarial example. 
For example,\na functional adversarial attack applied on colors of an image can change all red\npixels simultaneously to light red. Such global uniform changes in images can be\nless perceptible than perturbing pixels of the image individually. For simplicity,\nwe refer to functional adversarial attacks on image colors as ReColorAdv, which\nis the main focus of our experiments. We show that functional threat models can\nbe combined with existing additive (\u2113p) threat models to generate stronger threat\nmodels that allow both small, individual perturbations and large, uniform changes\nto an input. Moreover, we prove that such combinations encompass perturbations\nthat would not be allowed in either constituent threat model. In practice, ReCol-\norAdv can signi\ufb01cantly reduce the accuracy of a ResNet-32 trained on CIFAR-10.\nFurthermore, to the best of our knowledge, combining ReColorAdv with other\nattacks leads to the strongest existing attack even after adversarial training.\n\n1\n\nIntroduction\n\nThere is an extensive recent literature on adversarial examples, small perturbations to inputs of\nmachine learning algorithms that cause the algorithms to report an erroneous output, e.g. the incorrect\nlabel for a classi\ufb01er. Adversarial examples present serious security challenges for real-world systems\nlike self-driving cars, since a change in the environment that is not noticeable to a human may cause\nunexpected, unwanted, or dangerous behavior. Many methods of generating adversarial examples\n(called adversarial attacks) have been proposed [23, 5, 15, 17, 3]. Defenses against such attacks have\nalso been explored [18, 14, 30].\n\nMost existing attack and defense methods consider a threat model of adversarial attacks where\nadversarial examples can differ from normal inputs by a small \u2113p distance. 
However, using this\nthreat model that encompasses a simple de\ufb01nition of \"small perturbation\" misses other types of\nperturbations that may also be imperceptible to humans. For instance, small spatial perturbations\nhave been used to generate adversarial examples [4, 27, 26].\n\nIn this paper, we propose a new class of threat models for adversarial attacks, called functional threat\nmodels. Under a functional threat model, adversarial examples can be generated from a regular input\nto a classi\ufb01er by applying a single function to all features of the input:\n\nAdditive threat model:\n\nFunctional threat model:\n\n(x1, . . . , xn) \u2192 (x1 + \u03b41, . . . , xn + \u03b4n)\n(x1, . . . , xn) \u2192 (f (x1), . . . , f (xn))\n\nFor instance, the perturbation function f (\u00b7) could darken every red pixel in an image, or increase\nthe volume of every timestep in an audio sample. Functional threat models are in some ways\nmore restrictive because features cannot be perturbed individually. However, the uniformity of the\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n\fFigure 1: A visualization of an additive adversarial attack, a functional adversarial attack, and their\ncombination. The additive attack perturbs each feature (pixel) separately, whereas the functional\nattack applies the same function f (\u00b7) to every feature.\n\nperturbation in a functional threat model makes the change less perceptible, allowing for larger\nabsolute modi\ufb01cations. For example, one could darken or lighten an entire image by quite a bit\nwithout the change becoming noticeable. This stands in contrast to separate changes to each pixel,\nwhich must be smaller to avoid becoming perceptible. 
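The difference between the two threat models can be sketched in a few lines of NumPy; the darkening function and the numbers here are illustrative, not from the paper:

```python
import numpy as np

x = np.array([0.8, 0.8, 0.2, 0.8])   # toy grayscale "image" with repeated values

# Additive threat model: an independent small delta per feature.
delta = np.array([0.03, -0.02, 0.01, 0.0])
x_additive = x + delta               # identical pixels may now differ

# Functional threat model: one function f applied to every feature,
# e.g. uniformly darkening the image by 15% (an illustrative choice).
def f(v):
    return 0.85 * v

x_functional = f(x)                  # identical input pixels stay identical

# The functional change is larger in absolute terms...
assert np.abs(x_functional - x).max() > np.abs(x_additive - x).max()
# ...but it preserves dependencies between features: equal values stay equal.
assert x_functional[0] == x_functional[1] == x_functional[3]
```

The second assertion captures the defining property of a functional threat model: features with the same value must be mapped to the same value.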
We discuss various regularizations that can be\napplied to the perturbation function f (\u00b7) to ensure that even large changes are imperceptible.\n\nThe advantages and disadvantages of additive (\u2113p) and functional threat models complement each\nother; additive threat models allow small, individual changes to every feature of an input while\nfunctional threat models allow large, uniform changes. Thus, we combine the threat models (see\n\ufb01gure 1) and show that the combination encompasses more potential perturbations than either one\nseparately, as we explain in the following theorem which is stated more precisely in section 3.2.\nTheorem 1 (informal). Let x be a grayscale image with n \u2265 2 pixels. Consider an additive threat\nmodel that allows changing each pixel by up to a certain amount, and a functional threat model that\nallows darkening or lightening the entire image by a greater amount. Then the combination of these\nthreat models allows potential perturbations that are not allowed in either constituent threat model.\n\nFunctional threat models can be used in a variety of domains such as images (e.g. by uniformly\nchanging image colors), speech/audio (e.g. by changing the \"accent\" of an audio clip), text (e.g. by\nreplacing a word in the entire document with its synonym), or fraud analysis (e.g. by uniformly\nmodifying an actor\u2019s \ufb01nancial activities). Moreover, because functional perturbations are large and\nuniform, they may also be easier to use for physical adversarial examples, where the small pixel-level\nchanges created in additive perturbations could be drowned out by environmental noise.\n\nIn this paper, we will focus on one such domain\u2014images\u2014and de\ufb01ne ReColorAdv, a functional\nadversarial attack on pixel colors (see \ufb01gure 2). 
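The informal theorem above can be checked numerically on a toy two-pixel image; the bounds below are illustrative choices consistent with ε2 > ε1, not values from the paper's proof:

```python
import numpy as np

eps1, eps2 = 0.05, 0.3            # additive bound smaller than functional bound
x = np.array([0.5, 0.5])          # a two-pixel grayscale image that is not dark

def in_additive(xt):
    # t_add: each pixel may move by at most eps1
    return np.abs(xt - x).max() <= eps1

def in_functional(xt):
    # t_func: xt = c * x for a single c in [1 - eps2, 1 + eps2]
    if not np.isclose(xt[0], xt[1]):      # equal pixels must stay equal
        return False
    c = xt[0] / x[0]
    return 1 - eps2 <= c <= 1 + eps2

def in_combined(xt):
    # t_add composed with t_func: uniform scaling followed by per-pixel noise
    for c in np.linspace(1 - eps2, 1 + eps2, 601):
        if np.abs(xt - c * x).max() <= eps1 + 1e-9:
            return True
    return False

# Lighten the whole image by 20%, then nudge one pixel by eps1:
x_tilde = 1.2 * x + np.array([eps1, 0.0])     # [0.65, 0.60]
assert in_combined(x_tilde)
assert not in_additive(x_tilde)               # per-pixel change of 0.15 > eps1
assert not in_functional(x_tilde)             # the two pixels are no longer equal
```

The perturbed image is reachable only by the combined threat model: it moves too far for the additive model alone, and it breaks the equal-pixels-stay-equal property required by the functional model alone.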
In ReColorAdv, we use a \ufb02exibly parameterized\nfunction f to map each pixel color c in the input to a new pixel color f (c) in an adversarial example.\nWe regularize f (\u00b7) both to ensure that no color is perturbed by more than a certain amount, and to\nmake sure that the mapping is smooth, i.e. similar colors are perturbed in a similar way. We show that\nReColorAdv can use colors de\ufb01ned in the standard red, green, blue (RGB) color space and also in\nCIELUV color space, which results in less perceptually different adversarial examples (see \ufb01gure 4).\n\nWe experiment by attacking defended and undefended classi\ufb01ers with ReColorAdv, by itself and in\ncombination with other attacks. We \ufb01nd that ReColorAdv is a strong attack, reducing the accuracy of\na ResNet-32 trained on CIFAR-10 to 3.0%. Combinations of ReColorAdv and other attacks are yet\nmore powerful; one such combination lowers a CIFAR-10 classi\ufb01er\u2019s accuracy to 3.6%, even after\nadversarial training. This is lower than the previous strongest attack of Jordan et al. [10]. We also\ndemonstrate the fragility of adversarial defenses based on an additive threat model by reducing the\naccuracy of a classi\ufb01er trained with TRADES [30] to 5.7%. Although one might attempt to mitigate\nthe ReColorAdv attack by converting images to grayscale before classi\ufb01cation, which removes color\ninformation, we show that this simply decreases a classi\ufb01er\u2019s accuracy (both natural and adversarial).\nFurthermore, we \ufb01nd that combining ReColorAdv with other attacks improves the strength of the\nattack without increasing the perceptual difference, as measured by LPIPS [32], of the generated\nadversarial example.\n\n2\n\n\fFigure 2: Four ImageNet adversarial examples generated by ReColorAdv against an Inception-v4\nclassi\ufb01er. 
From left to right in each group: original image, adversarial example, magni\ufb01ed difference.\n\nOur contributions are summarized as follows:\n\n\u2022 We introduce a novel class of threat models, functional adversarial threat models, and combine\nthem with existing threat models. We also describe ways of regularizing functional threat models\nto ensure that generated adversarial examples are imperceptible.\n\n\u2022 Theoretically, we prove that additive and functional threat models combine to create a threat\n\nmodel that encompasses more potential perturbations than either threat model alone.\n\n\u2022 Experimentally, we show that ReColorAdv, which uses a functional threat model on images, is\na strong adversarial attack against image classi\ufb01ers. To the best of our knowledge, combining\nReColorAdv with other attacks leads to the strongest existing attack even after adversarial training.\n\n2 Review of Existing Threat Models\n\nIn this section, we de\ufb01ne the problem of generating an adversarial example and review existing\nadversarial threat models and attacks.\n\nProblem De\ufb01nition\n\nConsider a classi\ufb01er g : X n \u2192 Y from a feature space X n to a set of\n\nlabels Y. Given an input x \u2208 X n, an adversarial example is a slight perturbation ex of x such\nthat g(ex) 6= g(x); that is, ex is given a different label than x by the classi\ufb01er. Since the aim of\nan adversarial example is to be perceptually indistinguishable from a normal input, ex is usually\n\nconstrained to be close to x by some threat model. Formally, Jordan et al. [10] de\ufb01ne a threat model\nas a function t : P(X n) \u2192 P(X n), where P denotes the power set. The function t(\u00b7) maps a set\nof classi\ufb01er inputs S to a set of perturbed inputs t(S) that are imperceptibly different. 
With this definition, we can formalize the problem of generating an adversarial example from an input:

find x̃ such that g(x̃) ≠ g(x) and x̃ ∈ t({x})

Additive Threat Model  The most common threat model used when generating adversarial examples is the additive threat model. Let x = (x1, . . . , xn), where each xi ∈ X is a feature of x. For instance, xi could correspond to a pixel in an image or the filterbank energies for a timestep in an audio sample. In an additive threat model, we assume x̃ = (x1 + δ1, . . . , xn + δn); that is, a value δi is added to each feature of x to generate the adversarial example x̃. Under this threat model, perceptual similarity is usually enforced by a bound on the norm of δ = (δ1, . . . , δn). Thus, the additive threat model is defined as

tadd(S) := {(x1 + δ1, . . . , xn + δn) | (x1, . . . , xn) ∈ S, ‖δ‖ ≤ ε}.

Commonly used norms include ‖·‖2 (Euclidean distance), which constrains the sum of squares of the δi; ‖·‖0, which constrains the number of features that can be changed; and ‖·‖∞, which allows changing each feature by up to a certain amount. Note that all of the δi can be modified individually to generate a misclassification, as long as the norm constraint is met. Thus, a small ε is usually necessary, because otherwise the input could be made incomprehensible by noise.

Most previous work on generating adversarial examples has employed the additive threat model. This includes gradient-based methods like FGSM [5], DeepFool [15], and Carlini & Wagner [3], and gradient-free methods like SPSA [25] and the Boundary Attack [2].

Other Threat Models  Some recent work has focused on spatial threat models, which allow for slight perturbations of the locations of features in an input rather than perturbations of the features themselves [27, 26, 4].
Others have proposed threat models based on properties of a 3D renderer\n[29] and constructing adversarial examples with a GAN [22]. Finally, some research has focused on\ncoloring-based threat models through modi\ufb01cation of an image\u2019s hue and saturation [7], inverting\nimages [8], using a colorization network [1], and applying an af\ufb01ne transformation to colors followed\nby PGD [31]. See appendix E for a discussion of non-additive threat models and comparison to our\nproposed functional threat model.\n\n3 Functional Threat Model\n\nIn this section, we de\ufb01ne functional threat model and explore its combinations with existing threat\nmodels. Recall that in the additive threat model, each feature of an input can only be perturbed by a\nsmall amount. Because all the features are changed separately, larger changes could make the input\nunrecognizable. Our key insight is that larger perturbations to an input should be possible if the\ndependencies between features are considered.\n\nUnlike the additive threat model, in the functional threat model the features xi are transformed by a\nsingle function f : X \u2192 X , called the perturbation function. That is,\n\nex = f (x) = (f (x1), . . . , f (xn))\n\nUnder this threat model, features which have the same value in the input must be mapped to the same\nvalue in the adversarial example. Even large perturbations allowed by a functional threat model may\nbe imperceptible to human eyes because they preserve dependencies between features (for example,\nshape boundaries and shading in images, see \ufb01gure 1). Note that the features xi which are modi\ufb01ed\nby the perturbation function f (\u00b7) need not be scalars; depending on the application, vector-valued\nfeatures may be useful.\n\n3.1 Regularizing Functional Threat Models\n\nIn the functional threat model, various regularizations can be used to ensure that the change remains\nimperceptible. 
In general, we can enforce that f \u2208 F; F is a family of allowed perturbation functions.\nFor instance, we may want to bound by some small \u01eb the maximum difference between the input and\noutput of the perturbation function. In that case, we will have:\n\nFdiff , {f : X \u2192 X | \u2200xi \u2208 X kf (xi) \u2212 xik \u2264 \u01eb}\n\n(1)\n\nFdiff prevents absolute changes of more than a certain amount. Note that the \u01eb bound may be\nhigher than that of an additive model, since uniform changes are less perceptible. However, this\nregularization may not be enough to prevent noticeable changes. Fdiff still includes functions that\nmap similar (but not identical) features very differently. Therefore, a second constraint could be used\nthat forces similar features to be perturbed similarly:\n\nFsmooth , {f | \u2200xi, xj \u2208 X kxi \u2212 xjk \u2264 r \u21d2 k(f (xi) \u2212 xi) \u2212 (f (xj) \u2212 xj)k \u2264 \u01ebsmooth}\n\n(2)\n\nFsmooth requires that similar features are perturbed in the same \"direction\". For instance, if green\npixels in an image are lightened, then yellow-green pixels should be as well.\n\nDepending on the application, these constraints or others may be needed to maintain an imperceptible\nchange. We may want to choose F to be Fdiff, Fsmooth, Fdiff \u2229 Fsmooth, or an entirely different family\nof functions. Once we have chosen an F, we can de\ufb01ne a corresponding functional threat model as\n\ntfunc(S) , {(f (x1), . . . , f (xn)) | (x1, . . . , xn) \u2208 S, f \u2208 F}\n\n3.2 Combining Threat Models\n\nJordan et al. [10] argue that combining multiple threat models allows better approximation of the\ncomplete set of adversarial perturbations which are imperceptible. Here, we show that combining\nthe additive threat model with a simple functional threat model can allow adversarial examples\nwhich are not allowable by either model on its own. 
The following theorem (proved in appendix A) demonstrates this on images for a combination of an additive threat model, which allows changing each pixel by a small, bounded amount, and a functional threat model, which allows darkening or lightening the entire image by up to a larger amount, both of which are arguably imperceptible transformations.

Theorem 1. Let x be a grayscale image with n ≥ 2 pixels, i.e. x ∈ [0, 1]^n = X^n. Let tadd be an additive threat model where the ℓ∞ distance between input and adversarial example is bounded by ε1, i.e. ‖(δ1, . . . , δn)‖∞ ≤ ε1. Let tfunc be a functional threat model where f(x) = cx for some c ∈ [1 − ε2, 1 + ε2] such that ε2 > ε1 > 0. Let tcombined = tadd ∘ tfunc. Then the combined threat model allows adversarial perturbations which are not allowed by either constituent threat model. Formally, if S ⊆ X^n contains an image x that is not dark, that is, ∃ xi s.t. xi > ε1/ε2, then

tcombined(S) ⊋ tadd(S) ∪ tfunc(S)

or equivalently

∃ x̃ s.t. x̃ ∈ tcombined(S) and x̃ ∉ tadd(S) ∪ tfunc(S)

Figure 3: ReColorAdv transforms each pixel in the input image x (left) by the same function f(·) (center) to produce an adversarial example x̃ (right). The perturbation function f is shown as a vector field in CIELUV color space.

4 ReColorAdv: Functional Adversarial Attacks on Image Colors

In this section, we define ReColorAdv, a novel adversarial attack against image classifiers that leverages a functional threat model. ReColorAdv generates adversarial examples to fool image classifiers by uniformly changing the colors of an input image. We treat each pixel xi in the input image x as a point in a 3-dimensional color space C ⊆ [0, 1]^3. For instance, C could be the normal RGB color space. In section 4.1, we discuss our use of alternative color spaces.
We leverage a perturbation function f : C → C to produce the adversarial example. Specifically, each pixel in the output x̃ is perturbed from the input x by applying f(·) to the color of that pixel:

xi = (ci,1, ci,2, ci,3) ∈ C ⊆ [0, 1]^3  →  x̃i = (c̃i,1, c̃i,2, c̃i,3) = f(ci,1, ci,2, ci,3)

For the purposes of finding an f(·) that generates a successful adversarial example, we need a parameterization of the function that allows both flexibility and ease of computation. To accomplish this, we let G = {g1, . . . , gm} ⊆ [0, 1]^3 be a discrete grid of points (or point lattice) where f is explicitly defined. That is, we define parameters θ1, . . . , θm and let f(gi) = θi. For points not on the grid, i.e. xi ∉ G, we define f(xi) using trilinear interpolation. Trilinear interpolation considers the "cube" of lattice points gj surrounding the argument xi and linearly interpolates the explicitly defined θj values at the 8 corners of this cube to calculate f(xi).

Constraints on the perturbation function  We enforce two constraints on f(·) to ensure that the crafted adversarial example is indistinguishable from the original image. These constraints are based on slight modifications of Fdiff and Fsmooth defined in section 3.1. First, we ensure that no pixel can be perturbed by more than a certain amount along each dimension in color space:

Fdiff-col := {f : C → C | ∀(c1, c2, c3) ∈ G,  |ci − c̃i| < εi for i = 1, 2, 3}

where (c̃1, c̃2, c̃3) = f(c1, c2, c3). This particular formulation allows us to set different bounds (ε1, ε2, ε3) on the maximum perturbation along each dimension in color space. We also define a constraint based on Fsmooth, but instead of using a radius parameter r as in (2) we consider the neighbors N(gj) of each lattice point gj in the grid G:

Fsmooth-col := {f : C → C | ∀ gj ∈ G, gk ∈ N(gj),  ‖(f(gj) − gj) − (f(gk) − gk)‖2 ≤ εsmooth}

In the above, ‖·‖2 is the ℓ2 (Euclidean) norm in the color space C. We define our set of allowable perturbation functions as Fcol = Fdiff-col ∩ Fsmooth-col with parameters (ε1, ε2, ε3, εsmooth).

Figure 4: The color space used affects the adversarial example produced by ReColorAdv. The original image at center is attacked by ReColorAdv with the CIELUV color space (left) and RGB color space (right). The RGB color space results in noticeable bright green artifacts in the adversarial example, while the perceptually accurate CIELUV color space produces a more realistic perturbation.

Optimization  To generate an adversarial example with ReColorAdv, we wish to minimize Ladv(f, x) subject to f ∈ Fcol, where Ladv enforces the goal of generating an adversarial example that is misclassified. It is defined as the f_6 loss from Carlini and Wagner [3], where g(x)i represents the classifier's ith logit:

Ladv(f, x) = max( max_{i ≠ y} (g(x̃)i − g(x̃)y), 0 )        (3)

When solving this constrained minimization problem, it is easy to constrain f ∈ Fdiff-col by clipping the perturbation of each color to be within the εi bounds. However, it is difficult to enforce f ∈ Fsmooth-col directly.
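As a concrete illustration of the lattice parameterization and of the clipping step for Fdiff-col, the NumPy sketch below builds an identity lattice, evaluates f by trilinear interpolation, and projects perturbed lattice values back inside per-channel ε bounds. The lattice size and ε values are made-up; the paper's actual implementation is in its released code.

```python
import numpy as np

m = 9                                    # lattice points per color axis (assumed)
grid = np.linspace(0.0, 1.0, m)          # G: an m x m x m lattice over [0, 1]^3

# Initialize theta as the identity map: theta[i, j, k] = (grid[i], grid[j], grid[k]).
theta = np.stack(np.meshgrid(grid, grid, grid, indexing="ij"), axis=-1)

def clip_to_eps(theta, eps=(0.06, 0.06, 0.06)):
    """Project onto F_diff-col: |theta - g| <= eps along each color channel."""
    identity = np.stack(np.meshgrid(grid, grid, grid, indexing="ij"), axis=-1)
    return np.clip(theta, identity - eps, identity + eps)

def f(colors, theta):
    """Evaluate f at each color in `colors` (N x 3) by trilinear interpolation."""
    out = np.empty_like(colors)
    for n, c in enumerate(colors):
        idx = np.minimum((c * (m - 1)).astype(int), m - 2)  # lower corner of cube
        t = c * (m - 1) - idx                               # position inside cube
        acc = np.zeros(3)
        for corner in range(8):                             # blend the 8 corners
            b = [(corner >> d) & 1 for d in range(3)]
            w = np.prod([t[d] if b[d] else 1 - t[d] for d in range(3)])
            acc += w * theta[idx[0] + b[0], idx[1] + b[1], idx[2] + b[2]]
        out[n] = acc
    return out

# With the identity lattice, f is (up to floating point) the identity map.
colors = np.array([[0.2, 0.5, 0.9], [0.0, 1.0, 0.33]])
assert np.allclose(f(colors, theta), colors)

# Darken every lattice point, then clip back inside the eps bounds.
theta_dark = clip_to_eps(theta - 0.1)
assert np.abs(theta_dark - theta).max() <= 0.06 + 1e-9
```

Projecting onto Fsmooth-col has no comparably simple closed form, which is why the smoothness constraint is harder to enforce directly.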
Thus, we instead solve a Lagrangian relaxation where the smoothness constraint is replaced by an additional regularization term:

argmin_{f ∈ Fdiff-col}  Ladv(f, x) + λ Lsmooth(f)        (∗)

where

Lsmooth(f) := Σ_{gj ∈ G} Σ_{gk ∈ N(gj)} ‖(f(gj) − gj) − (f(gk) − gk)‖2

Our Lsmooth is similar to the loss function used by Xiao et al. [27] to ensure a smooth flow field. We use the projected gradient descent (PGD) optimization algorithm to solve (∗).

4.1 RGB vs. LUV Color Space

Most image classifiers take as input an array of pixels specified in RGB color space, but the RGB color space has two disadvantages. The ℓp distance between points in RGB color space is weakly correlated with the perceptual difference between the colors they represent. Also, RGB gives no separation between the luma (brightness) and chroma (hue/colorfulness) of a color.

In contrast, the CIELUV color space separates luma from chroma and places colors such that the Euclidean distance between them is roughly equivalent to the perceptual difference [21]. CIELUV represents a color by three components (L, U, V); L is the luma while U and V together define the chroma. We run experiments using both RGB and CIELUV color spaces. CIELUV allows us to regularize the perturbation function f(·) in a perceptually accurate way (see figure 4 and appendix B.1). We experimented with the hue, saturation, value (HSV) and YPbPr color spaces as well; however, neither is perceptually accurate and the HSV transformation from RGB is difficult to differentiate (see appendix C).

Table 1: Accuracy of adversarially trained models against various combinations of attacks on CIFAR-10. Columns correspond to attacks and rows correspond to models trained against a particular attack. C(-RGB) is ReColorAdv using CIELUV (RGB) color space, D is delta attack, and S is StAdv attack. TRADES is the method of Zhang et al. [30].
For classifiers marked (B&W), the images are converted to black-and-white before classification.

Defense ↓ \ Attack →   None  C-RGB     C     D     S   C+S   C+D   S+D  C+S+D
Undefended             92.2    5.9   3.0   0.0   0.9   0.8   0.0   0.0    0.0
C                      88.7   43.5  45.8   5.7   3.6   3.4   0.9   0.2    0.2
D                      84.8   74.9  50.6  30.6  16.0  11.7   8.9   2.7    2.2
S                      82.7   16.9   8.0   0.5  26.2   4.8   0.0   0.1    0.0
C+S                    89.5   31.7  23.0   0.7  10.9   8.7   0.5   0.6    0.4
C+D                    88.5   36.3  19.5   7.5   2.7   2.8   5.2   4.1    4.6
S+D                    82.1   66.9  42.7  35.4  21.9  13.4  12.2   7.6    4.1
C+S+D                  88.9   30.6  17.2   7.3   3.5   3.3   5.5   3.7    3.6
TRADES                 84.4   81.3  59.2  53.6  26.6  17.5  22.0   8.6    5.7
Undefended (B&W)       88.3    5.3   4.1   0.0   0.9   0.6   0.0   0.0    0.0
C (B&W)                85.8   40.8  38.9   4.2   2.5   2.5   0.5   0.1    0.2

5 Experiments

We evaluate ReColorAdv against defended and undefended neural networks on CIFAR-10 [13] and ImageNet [20]. For CIFAR-10 we evaluate the attack against a ResNet-32 [6] and for ImageNet we evaluate against an Inception-v4 network [24]. We also consider all combinations of ReColorAdv with delta attacks, which use an ℓ∞ additive threat model with bound ε = 8/255, and the StAdv attack of Xiao et al. [27] that perturbs images spatially through a flow field. See appendix B for a full discussion of the hyperparameters and computing infrastructure used in our experiments. We release our code at https://github.com/cassidylaidlaw/ReColorAdv.

5.1 Adversarial Training

We first experiment by attacking adversarially trained models with ReColorAdv and other attacks. For each combination of attacks, we adversarially train a ResNet-32 on CIFAR-10 against that particular combination.
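As a reference point, the delta attack used in these combinations is standard ℓ∞ PGD. The sketch below runs PGD with an equation-(3)-style margin loss against a toy linear classifier; the model, shapes, seed, and step size are illustrative stand-ins, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 32))          # toy linear "classifier": logits = W @ x
x = rng.uniform(0.25, 0.75, size=32)   # toy flattened image in [0, 1]
y = int(np.argmax(W @ x))              # its clean label

def margin(z):
    """Eq.-(3)-style margin: best non-true logit minus true logit."""
    logits = W @ z
    return np.max(np.delete(logits, y)) - logits[y]

eps = 8 / 255                          # l_inf bound, as in the experiments
alpha = eps / 4                        # PGD step size (illustrative)

delta = np.zeros_like(x)
for _ in range(20):
    logits = W @ (x + delta)
    others = np.delete(np.arange(10), y)
    i = others[np.argmax(logits[others])]      # current strongest competitor
    grad = W[i] - W[y]                         # gradient of its margin w.r.t. input
    delta = np.clip(delta + alpha * np.sign(grad), -eps, eps)  # ascent + projection
    delta = np.clip(x + delta, 0.0, 1.0) - x                   # keep pixels valid

assert np.abs(delta).max() <= eps + 1e-12      # stays inside the l_inf ball
assert margin(x + delta) > margin(x)           # margin loss has been pushed up
```

For a real network the gradient would come from backpropagation rather than the closed form above, but the project-and-step structure is the same.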
We attack each of these adversarially trained models with all combinations of attacks.\nThe results of this experiment are shown in the \ufb01rst part of table 1.\n\nCombination attacks are most powerful As expected, combinations of attacks are the strongest\nagainst defended and undefended classi\ufb01ers. In particular, the ReColorAdv + StAdv + delta attack\noften resulted in the lowest classi\ufb01er accuracy. The accuracy of only 3.6% after adversarially training\nagainst the ReColorAdv + StAdv + delta attack is the lowest we know of.\n\nTransferability of robustness across perturbation types While adversarial attack \"transferabil-\nity\" often refers to the ability of attacks to transfer between models [16], here we investigate to what\ndegree a model robust to one type of adversarial perturbation is robust to other types of perturbations,\nsimilarly to Kang et al. [11]. To some degree, the perturbations investigated are orthogonal; that is,\na model trained against a particular type of perturbation is less effective against others. StAdv is\nespecially separate from the other two attacks; models trained against StAdv attacks are still very\nvulnerable to ReColorAdv and delta attacks. However, the ReColorAdv and delta attacks allow more\ntransferable robustness between each other. These results are likely due to the fact that both the\ndelta and ReColorAdv attacks operate on a per-pixel basis, whereas the StAdv attack allows spatial\nmovement of features across pixels.\n\nEffect of color space\nThe ReColorAdv attack using CIELUV color space is stronger than that\nusing RGB color space. In addition, the CIELUV color space produces less perceptible perturbations\n(see \ufb01gure 4). 
This highlights the need for using perceptually accurate models of color when designing\nand defending against adversarial examples.\n\n7\n\n\fOrig.\n\nC\n\nD\n\nC+D\n\nC+S+D\n\nOrig.\n\nC\n\nD\n\nC+D\n\nC+S+D\n\nFigure 5: Adversarial examples generated with combinations of attacks against a CIFAR-10 WideRes-\nNet [28] trained using TRADES; the difference from the original is shown below each example.\nCombinations of attacks tend to produce less perceptible changes than the attacks do separately.\n\nFigure 6: The perceptual distortion (LPIPS) and strength (error rate) of combinations of ReColorAdv\nand delta attacks with various bounds. The annotated points mark the bounds used in other experi-\nments: C is ReColorAdv, D is a delta attack, and C+D is their combination. Combining the attacks\ndoes not increase perceptable change by much (left), but it greatly increases attack strength (right).\n\n5.2 Other Defenses\n\nTRADES\nTRADES is a training algorithm for deep neural networks that aims to improve robust-\nness to adversarial examples by optimizing a surrogate loss [30]. The algorithm is designed around\nan additive threat model, but we evaluate a TRADES-trained classi\ufb01er on all combinations of attacks\n(see the second part of table 1). This is the best defense method against almost all attacks, despite\nhaving been trained based on just an additive threat model. However, the combined ReColorAdv +\nStAdv + delta attack still reduces the accuracy of the classi\ufb01er to just 5.7%.\n\nGrayscale conversion\nSince ReColorAdv attacks input images by changing their colors, one\nmight attempt to mitigate the attack by converting all images to grayscale before classi\ufb01cation. This\ncould reduce the potential perturbations available to ReColorAdv since altering the chroma of a color\nwould not affect the grayscale image; only changes in luma would. 
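To see why grayscale conversion seems appealing as a defense, note that a chroma-only recoloring is erased by the conversion while a luma change survives it. A minimal sketch, assuming the standard ITU-R BT.601 luma weights (the paper does not specify which grayscale conversion it uses):

```python
import numpy as np

def to_grayscale(rgb):
    # ITU-R BT.601 luma: a weighted sum of the R, G, B channels
    return float(rgb @ np.array([0.299, 0.587, 0.114]))

a = np.array([0.5, 0.5, 0.5])                     # a gray pixel, luma 0.5
# A pixel with very different chroma but the same luma, solved from the weights:
b = np.array([(0.5 - 0.587 * 0.5 - 0.114 * 0.1) / 0.299, 0.5, 0.1])

# A chroma-only recoloring is invisible after grayscale preprocessing...
assert abs(to_grayscale(a) - to_grayscale(b)) < 1e-9
# ...but a luma change survives it, so ReColorAdv can still attack via luma.
assert abs(to_grayscale(0.9 * a) - to_grayscale(a)) > 0.01
```

The second assertion is the loophole: ReColorAdv remains free to perturb luma, so the defense cannot remove the whole threat model.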
We train models on CIFAR-10\nthat convert all images to grayscale as a preprocessing step both with and without adversarial training\nagainst ReColorAdv. The results of this experiment (see the third part of table 1) show that conversion\nto grayscale is not a viable defense against ReColorAdv. In fact, the natural accuracy and robustness\nagainst almost all attacks decreases when applying grayscale conversion.\n\n5.3 Perceptual Distance\n\nWe quantify the perceptual distortion caused by ReColorAdv attacks using the Learned Perceptual\nImage Patch Similarity (LPIPS) metric, a distance measure between images based on deep network\nactivations which has been shown to correlate with human perception [32]. We combine ReColorAdv\nand delta attacks and vary the bound of each attack (see \ufb01gure 6). We \ufb01nd that the attacks can be\ncombined without much increase, or with even sometimes a decrease, in perceptual difference. As\nJordan et al. [10] \ufb01nd for combinations of StAdv and delta attacks, the lowest perceptual difference\nat a particular attack strength is achieved by a combination of ReColorAdv and delta attacks.\n\n8\n\n\f6 Conclusion\n\nWe have presented functional threat models for adversarial examples, which allow large, uniform\nchanges to an input. They can be combined with additive threat models to provably increase the\npotential perturbations allowed in an adversarial attack. In practice, the ReColorAdv attack, which\nleverages a functional threat model against image pixel colors, is a strong adversarial attack on image\nclassi\ufb01ers. It can also be combined with other attacks to produce yet more powerful attacks\u2014even\nafter adversarial training\u2014without a signi\ufb01cant increase in perceptual distortion. Besides images,\nfunctional adversarial attacks could be designed for audio, text, and other domains. 
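As one hypothetical example outside of images, a functional threat model for audio could pass every sample through a single monotone gain curve. The compressor-style curve below is an illustration of the idea, not an attack studied in the paper:

```python
import numpy as np

def gain_curve(v, boost=1.3):
    """A single function applied to every audio sample: amplify quiet
    samples more than loud ones, like a simple compressor."""
    return np.sign(v) * np.tanh(boost * np.abs(v)) / np.tanh(boost)

t = np.linspace(0, 1, 8000)
audio = 0.5 * np.sin(2 * np.pi * 440 * t)          # toy 440 Hz tone
perturbed = gain_curve(audio)

# Functional property: (near-)equal samples map to (near-)equal values...
assert np.allclose(perturbed[np.isclose(audio, audio[0])], perturbed[0])
# ...and the waveform's sign structure (zero crossings) is preserved.
assert np.array_equal(np.sign(perturbed), np.sign(audio))
```

As with ReColorAdv's smoothness constraint, a family of allowed gain curves would need regularization to keep the perturbed audio perceptually close to the original.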
It will be crucial\nto develop defense methods against these attacks, which encompass a more complete threat model of\nwhich potential adversarial examples are imperceptible to humans.\n\nAcknowledgements\n\nThis work was supported in part by NSF award CDS&E:1854532 and award HR00111990077.\n\nReferences\n\n[1] Anand Bhattad, Min Jin Chong, Kaizhao Liang, Bo Li, and David A. Forsyth. Big but Impercep-\ntible Adversarial Perturbations via Semantic Manipulation. arXiv preprint arXiv:1904.06347,\n2019.\n\n[2] Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-Based Adversarial Attacks:\nReliable Attacks Against Black-Box Machine Learning Models. International Conference on\nLearning Representations, February 2018.\n\n[3] Nicholas Carlini and David Wagner. Towards Evaluating the Robustness of Neural Networks.\n\nIn 2017 IEEE Symposium on Security and Privacy (SP), pages 39\u201357. IEEE, 2017.\n\n[4] Logan Engstrom, Brandon Tran, Dimitris Tsipras, Ludwig Schmidt, and Aleksander Madry. A\nRotation and a Translation Suf\ufb01ce: Fooling CNNs with Simple Transformations. arXiv preprint\narXiv:1712.02779, 2017.\n\n[5] Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and Harnessing Adversar-\n\nial Examples. In International Conference on Learning Representations, 2015.\n\n[6] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for\nImage Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern\nRecognition, pages 770\u2013778, 2016.\n\n[7] Hossein Hosseini and Radha Poovendran. Semantic Adversarial Examples. In Proceedings of\nthe IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 1614\u2013\n1619, 2018.\n\n[8] Hossein Hosseini, Baicen Xiao, Mayoore Jaiswal, and Radha Poovendran. On the Limitation of\nConvolutional Neural Networks in Recognizing Negative Images. In 16th IEEE International\nConference on Machine Learning and Applications (ICMLA), pages 352\u2013358. 
IEEE, 2017.

[9] Yerlan Idelbayev. Proper ResNet Implementation for CIFAR10/CIFAR100 in PyTorch, 2018. URL https://github.com/akamaster/pytorch_resnet_cifar10.

[10] Matt Jordan, Naren Manoj, Surbhi Goel, and Alexandros G. Dimakis. Quantifying Perceptual Distortion of Adversarial Examples. arXiv preprint arXiv:1902.08265, 2019.

[11] Daniel Kang, Yi Sun, Tom Brown, Dan Hendrycks, and Jacob Steinhardt. Transfer of Adversarial Robustness Between Perturbation Types. arXiv preprint arXiv:1905.01034, 2019.

[12] Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980, 2014.

[13] Alex Krizhevsky and Geoffrey Hinton. Learning Multiple Layers of Features from Tiny Images. Technical report, Citeseer, 2009.

[14] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards Deep Learning Models Resistant to Adversarial Attacks. In International Conference on Learning Representations, 2018.

[15] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574–2582, 2016.

[16] Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples. arXiv preprint arXiv:1605.07277, 2016.

[17] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The Limitations of Deep Learning in Adversarial Settings. In IEEE European Symposium on Security and Privacy (EuroS&P), pages 372–387. IEEE, 2016.

[18] Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks. In IEEE Symposium on Security and Privacy (SP), pages 582–597.
IEEE, 2016.

[19] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic Differentiation in PyTorch. In NIPS-W, 2017.

[20] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, and Michael Bernstein. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, 115(3):211–252, 2015.

[21] Jim Schwiegerling. Field Guide to Visual and Ophthalmic Optics. SPIE Publications, Bellingham, Wash., 2004. ISBN 978-0-8194-5629-8.

[22] Yang Song, Rui Shu, Nate Kushman, and Stefano Ermon. Constructing Unrestricted Adversarial Examples with Generative Models. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pages 8322–8333. Curran Associates Inc., 2018.

[23] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing Properties of Neural Networks. In International Conference on Learning Representations, 2014.

[24] Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A. Alemi. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.

[25] Jonathan Uesato, Brendan O'Donoghue, Pushmeet Kohli, and Aaron Oord. Adversarial Risk and the Dangers of Evaluating Against Weak Attacks. In International Conference on Machine Learning, pages 5025–5034, 2018. URL http://proceedings.mlr.press/v80/uesato18a.html.

[26] Eric Wong, Frank R. Schmidt, and J. Zico Kolter. Wasserstein Adversarial Examples via Projected Sinkhorn Iterations. arXiv preprint arXiv:1902.07906, 2019.

[27] Chaowei Xiao, Jun-Yan Zhu, Bo Li, Warren He, Mingyan Liu, and Dawn Song.
Spatially Transformed Adversarial Examples. arXiv preprint arXiv:1801.02612, 2018.

[28] Sergey Zagoruyko and Nikos Komodakis. Wide Residual Networks. In British Machine Vision Conference. British Machine Vision Association, 2016.

[29] Xiaohui Zeng, Chenxi Liu, Yu-Siang Wang, Weichao Qiu, Lingxi Xie, Yu-Wing Tai, Chi Keung Tang, and Alan L. Yuille. Adversarial Attacks Beyond the Image Space. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019.

[30] Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P. Xing, Laurent El Ghaoui, and Michael I. Jordan. Theoretically Principled Trade-off Between Robustness and Accuracy. In International Conference on Machine Learning, 2019.

[31] Huan Zhang, Hongge Chen, Zhao Song, Duane S. Boning, Inderjit S. Dhillon, and Cho-Jui Hsieh. The Limitations of Adversarial Training and the Blind-Spot Attack. In International Conference on Learning Representations, 2019.

[32] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018.