{"title": "Cross-Domain Transferability of Adversarial Perturbations", "book": "Advances in Neural Information Processing Systems", "page_first": 12905, "page_last": 12915, "abstract": "Adversarial examples reveal the blind spots of deep neural networks (DNNs) and represent a major concern for security-critical applications. The transferability of adversarial examples makes real-world attacks possible in black-box settings, where the attacker is forbidden to access the internal parameters of the model. The underlying assumption in most adversary generation methods, whether learning an instance-specific or an instance-agnostic perturbation, is the direct or indirect reliance on the original domain-specific data distribution. In this work, for the first time, we demonstrate the existence of domain-invariant adversaries, thereby showing common adversarial space among different datasets and models. To this end, we propose a framework capable of launching highly transferable attacks that crafts adversarial patterns to mislead networks trained on wholly different domains. For instance, an adversarial function learned on Paintings, Cartoons or Medical images can successfully perturb ImageNet samples to fool the classifier, with success rates as high as $\\sim$99\\% ($\\ell_{\\infty} \\le 10$). The core of our proposed adversarial function is a generative network that is trained using a relativistic supervisory signal that enables domain-invariant perturbations. Our approach sets the new state-of-the-art for fooling rates, both under the white-box and black-box scenarios. 
Furthermore, despite being an instance-agnostic perturbation function, our attack outperforms the conventionally much stronger instance-specific attack methods.", "full_text": "Cross-Domain Transferability of Adversarial Perturbations

Muzammal Naseer1,2, Salman Khan2,1, Muhammad Haris Khan2, Fahad Shahbaz Khan2,3, Fatih Porikli1
1Australian National University, Canberra, Australia
2Inception Institute of Artificial Intelligence, Abu Dhabi, UAE
3CVL, Department of Electrical Engineering, Linköping University, Sweden
{muzammal.naseer,fatih.porikli}@anu.edu.au
{salman.khan,muhammad.haris,fahad.khan}@inceptioniai.org

Abstract

Adversarial examples reveal the blind spots of deep neural networks (DNNs) and represent a major concern for security-critical applications. The transferability of adversarial examples makes real-world attacks possible in black-box settings, where the attacker is forbidden to access the internal parameters of the model. The underlying assumption in most adversary generation methods, whether learning an instance-specific or an instance-agnostic perturbation, is the direct or indirect reliance on the original domain-specific data distribution. In this work, for the first time, we demonstrate the existence of domain-invariant adversaries, thereby showing common adversarial space among different datasets and models. To this end, we propose a framework capable of launching highly transferable attacks that crafts adversarial patterns to mislead networks trained on entirely different domains. For instance, an adversarial function learned on Paintings, Cartoons or Medical images can successfully perturb ImageNet samples to fool the classifier, with success rates as high as ~99% (l∞ ≤ 10). The core of our proposed adversarial function is a generative network that is trained using a relativistic supervisory signal that enables domain-invariant perturbations.
Our approach sets the new state-of-the-art for fooling rates, both under the white-box and black-box scenarios. Furthermore, despite being an instance-agnostic perturbation function, our attack outperforms the conventionally much stronger instance-specific attack methods. Code is available at: https://github.com/Muzammal-Naseer/Cross-domain-perturbations

1 Introduction

Albeit displaying remarkable performance across a range of tasks, Deep Neural Networks (DNNs) are highly vulnerable to adversarial examples, which are carefully crafted examples generated by adding a certain degree of noise (a.k.a. perturbations) to the corresponding original images, typically appearing quasi-imperceptible to humans [1]. Importantly, these adversarial examples are transferable from one network to another, even when the other network has a different architecture and is possibly trained on a different subset of the training data [2, 3]. Transferability permits an adversarial attack without knowing the internals of the target network, posing serious security concerns for the practical deployment of these models.

Adversarial perturbations are either instance-specific or instance-agnostic. Instance-specific attacks iteratively optimize a perturbation pattern specific to an input sample (e.g., [4, 5, 6, 7, 8, 9, 10, 11]). In comparison, instance-agnostic attacks learn a universal perturbation or a function that finds adversarial patterns on a data distribution instead of a single sample. For example, [12]

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Figure 1: Transferable Generative Adversarial Perturbation: We demonstrate that common adversaries exist across different image domains and introduce a highly transferable attack approach that carefully crafts adversarial patterns to fool classifiers trained on totally different domains.
Our generative scheme learns to reconstruct adversaries on paintings or comics (left) that can successfully fool natural image classifiers with high fooling rates at inference time (right).

proposed universal adversarial perturbations that can fool a model on the majority of the source dataset images. To reduce dependency on input data samples, [13] maximizes layer activations of the source network, while [14] extracts deluding perturbations using class impressions, relying on the source label space. To enhance the transferability of instance-agnostic approaches, recent generative models attempt to directly craft perturbations using an adversarially trained function [15, 16].

We observe that most prior works on crafting adversarial attacks suffer from two pivotal limitations that restrict their transferability to real-world scenarios. (a) Existing attacks rely directly or indirectly on the source (training) data, which hampers their transferability to other domains. From a practical standpoint, the source domain can be unknown, or the domain-specific data may be unavailable to the attacker. Therefore, a true "black-box" attack must be able to fool learned models across different target domains without ever being explicitly trained on those data domains. (b) Instance-agnostic attacks, compared with their counterparts, are far more scalable to large datasets as they avoid expensive per-instance iterative optimization. However, they demonstrate weaker transferability rates than instance-specific attacks. Consequently, the design of highly transferable instance-agnostic attacks that also generalize across unseen domains is largely an unsolved problem.

In this work, we introduce 'domain-agnostic' generation of adversarial examples, with the aim of relaxing the source data reliance assumption.
In particular, we propose a flexible framework capable of launching highly transferable adversarial attacks; e.g., perturbations found on paintings, comics or medical images are shown to trick natural image classifiers trained on the ImageNet dataset with high fooling rates. A distinguishing feature of our approach is the introduction of a relativistic loss that explicitly enforces the learning of domain-invariant adversarial patterns. Our attack algorithm is highly scalable to large-scale datasets since it learns a universal adversarial function, avoiding the expensive iterative optimization of instance-specific attacks. While enjoying the efficient inference time of instance-agnostic methods, our algorithm outperforms all existing attack methods (both instance-specific and agnostic) by a significant margin (~86.46% average increase in fooling rate from naturally trained Inception-v3 to adversarially trained models in comparison to the state-of-the-art [10]) and sets the new state-of-the-art under both white-box and black-box settings. Figure 1 provides an overview of our approach.

2 Related Work

Image-dependent Perturbations: Several approaches target the creation of image-dependent perturbations. [17] noticed that despite exhibiting impressive performance, neural networks can be fooled through maliciously crafted perturbations that appear quasi-imperceptible to humans. Following this finding, many approaches [4, 5, 6, 7, 8, 9] investigate the existence of these perturbations. They either apply gradient ascent in the pixel space or solve complex optimizations. Recently, a few methods [18, 10] propose input or gradient transformation modules to improve the transferability of adversarial examples. A common characteristic of the aforementioned approaches is their data-dependence; the perturbations are computed for each data-point separately in a mutually exclusive way.
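As a concrete illustration of the gradient-ascent family mentioned above, the sketch below performs one FGSM-style signed-gradient step on a toy linear classifier. The linear model and all names here are illustrative stand-ins, not the implementations of the cited attacks; for a linear scorer the input gradient of cross-entropy has a closed form, so no autodiff framework is needed.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def fgsm_step(W, x, y, eps):
    """One signed-gradient ascent step on the cross-entropy loss.

    For logits = W @ x, the input gradient of CE w.r.t. x is
    W^T (softmax(logits) - onehot(y)).
    """
    logits = [sum(wj * xj for wj, xj in zip(row, x)) for row in W]
    g_logits = softmax(logits)
    g_logits[y] -= 1.0  # dCE/dlogits = p - onehot(y)
    grad = [sum(W[c][j] * g_logits[c] for c in range(len(W)))
            for j in range(len(x))]
    sign = lambda v: (v > 0) - (v < 0)
    return [xj + eps * sign(gj) for xj, gj in zip(x, grad)]
```

On a two-class toy example, `fgsm_step([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], 0, 0.1)` moves the input to [0.9, 0.1], lowering the true-class logit and raising the other, which is exactly the per-sample iterative recipe these instance-specific attacks repeat.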
Further, these approaches are inefficient at inference time since they iterate on the input multiple times. In contrast, we resort to a data-independent approach based on a generator, demonstrating improved inference-time efficiency along with high transferability rates.

Universal Adversarial Perturbation: The seminal work of [12] introduced the existence of the Universal Adversarial Perturbation (UAP): a single noise vector which, when added to a data-point, can fool a pretrained model. [12] crafts the UAP in an iterative fashion, utilizing target data-points whose labels it is capable of flipping. Though it can generate an image-agnostic UAP, the success ratio of the attack is proportional to the number of training samples used for crafting the UAP. [13] proposes a so-called data-independent algorithm that maximizes the product of mean activations at multiple layers given a universal perturbation as input. This method crafts a so-called data-independent perturbation; however, its attack success ratio is not comparable to [12]. Instead, we propose a fully distribution-agnostic approach that crafts adversarial examples directly from a learned generator, as opposed to first generating perturbations and then adding them to images.

Generator-oriented Perturbations: Another branch of attacks leverages generative models to craft adversaries. [15] learns a generator network to perturb images; however, the unbounded perturbation magnitude in their case might render perceptible perturbations at test time. [19] trains conditional generators to learn the original data manifold and searches the latent space, conditioned on a human-recognizable target class, for samples that are misclassified by a target classifier. [20] applies generative adversarial networks to craft visually realistic perturbations and builds a distilled network to perform a black-box attack.
Similarly, [16, 14] train generators to create adversaries to launch attacks; the former uses target data directly while the latter relies on class impressions.

A common trait of prior work is that they either rely directly (or indirectly) upon the data distribution and/or entail access to its label space for creating adversarial examples (Table 1). In contrast, we propose a flexible, distribution-agnostic approach, incorporating a relativistic loss, to craft adversarial examples; it achieves state-of-the-art results under both white-box and black-box attack settings.

Method | Data Type | Transfer Strength | Label Agnostic | Cross-domain Attack
FFF [13] | Pretrained-net/data | Low | ✓ | ✗
AAA [14] | Class Impressions | Medium | ✗ | ✗
UAP [12] | ImageNet | Low | ✗ | ✗
GAP [16] | ImageNet | Medium | ✗ | ✗
RHP [11] | ImageNet | Medium | ✗ | ✗
Ours | Arbitrary (Paintings, Comics, Medical scans etc.) | High | ✓ | ✓

Table 1: A comparison of different attack methods based on their dependency on the data distribution and labels.

3 Cross-Domain Transferable Perturbations

Our proposed approach is based on a generative model that is trained using an adversarial mechanism. Assume we have an input image x_s belonging to a source domain X_s ⊂ R^s. We aim to train a universal function that learns to add a perturbation pattern δ on the source domain which can successfully fool a network trained on the source X_s, as well as on any target domain X_t ⊂ R^t when fed with perturbed inputs x'_t = x_t + δ. Importantly, our training is only performed on the unlabelled source domain dataset with n_s samples, {x_s^i}_{i=1}^{n_s}, and the target domain is not used at all during training.
For brevity, in the following discussion we will refer to the input and perturbed images simply as x and x' respectively; the domain will be clear from the context.

The proposed framework consists of a generator G_θ(x) and a discriminator D_ψ(x), parameterized by θ and ψ. In our case, we initialize the discriminator with a pretrained network and keep the parameters ψ fixed while G_θ is learned. The output of G_θ is scaled to have a fixed norm and to lie within a bound: x' = clip( min(x + ε, max(G_θ(x), x − ε)) ). The perturbed images x' as well as

Figure 2: The proposed generative framework seeks to maximize the 'fooling gap' that helps in achieving very high transferability rates across domains. The orange dashed line shows the flow of gradients; notably, only the generator is tuned in the whole pipeline to fool the pretrained discriminator.

the real images x are passed through the discriminator. The output of the discriminator denotes class probabilities D_ψ(x), D_ψ(x') ∈ [0, 1]^c, where c is the number of classes. This is different from the traditional GAN framework, where a discriminator only estimates whether an input is real or fake. For an adversarial attack, the goal is to fool the network on most examples by making minor changes to its inputs, i.e.,

‖δ‖_∞ ≤ ε,  s.t.  Σ_x [ argmax_j(D_ψ(x')_j) ≠ argmax_j(D_ψ(x)_j) ] > fr,   (1)

where fr is the fooling ratio, y is the ground-truth label for the example x, and the predictions on clean images x are given by y = argmax_j(D_ψ(x)_j). Note that we do not necessarily require the ground-truth labels of source domain images to craft a successful attack.
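The projection above, clamping the raw generator output into the l∞ ball of radius ε around x and then into the valid pixel range, can be sketched per element as follows. This is a minimal illustrative version assuming inputs normalized to [0, 1]; the function name and signature are ours, not the paper's code.

```python
def project_linf(x, g_out, eps, lo=0.0, hi=1.0):
    """Clamp raw generator outputs g_out into the l_inf ball of radius eps
    around the clean input x, then into the valid pixel range [lo, hi]."""
    adv = []
    for xi, gi in zip(x, g_out):
        v = max(gi, xi - eps)            # lower l_inf bound: x - eps
        v = min(v, xi + eps)             # upper l_inf bound: x + eps
        adv.append(min(max(v, lo), hi))  # keep a valid image
    return adv
```

For example, `project_linf([0.5, 0.2, 0.9], [0.9, 0.15, 0.2], 0.1)` returns approximately [0.6, 0.15, 0.8]: each output stays within 0.1 of its clean pixel, no matter how far the unbounded generator output strayed.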
In the case of adversarial attacks based on a traditional GAN framework, the following objective is maximized for the generator to achieve the maximal fooling rate:

θ* ← argmax_θ CrossEntropy(D_ψ(x'), 1_y),   (2)

where 1_y is the one-hot encoded label vector for an input example x. The above objective seeks to maximize the discriminator error on the perturbed images output by the generator network.

We argue that the objective given by Eq. 2 does not directly enforce transferability of the generated perturbations δ. This is primarily because the discriminator's response to clean examples is totally ignored in conventional generative attacks. Here, inspired by the relativistic generative adversarial network of [21], we propose a relativistic adversarial perturbation (RAP) generation approach that explicitly takes into account the discriminator's predictions on clean images. Alongside reducing the classifier's confidence on perturbed images, the attack algorithm also forces the discriminator to maintain high confidence scores for the clean samples. The proposed relativistic objective is given by:

θ* ← argmax_θ CrossEntropy(D_ψ(x') − D_ψ(x), 1_y).   (3)

The cross-entropy loss is higher when the perturbed image is scored significantly lower than the clean image response for the ground-truth class, i.e., D_ψ(x')_y ≪ D_ψ(x)_y. The generator thus seeks to widen the 'fooling gap', i.e., the margin D_ψ(x)_y − D_ψ(x')_y, between the clean and perturbed samples. Through such relative discrimination, we not only report better transferability rates across networks trained on the same domain, but, most importantly, show excellent cross-domain transfer rates for the instance-agnostic perturbations.
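For a single sample, the objectives in Eq. 2 and Eq. 3 differ only in what is fed to the softmax cross-entropy: the adversarial logits alone, or the adversarial-minus-clean logit difference. A minimal sketch on raw logit lists (function names are ours, purely illustrative):

```python
import math

def nll(logits, y):
    """-log softmax(logits)[y], computed stably via the log-sum-exp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return lse - logits[y]

def ce_loss(logits_adv, y):
    """Eq. 2: cross-entropy on the perturbed logits only."""
    return nll(logits_adv, y)

def rce_loss(logits_adv, logits_clean, y):
    """Eq. 3: relativistic cross-entropy on the logit difference a' - a."""
    diff = [a - c for a, c in zip(logits_adv, logits_clean)]
    return nll(diff, y)
```

When the generator leaves a confidently classified image unchanged (logits_adv equal to logits_clean), `ce_loss` is already near zero and provides little training signal, while `rce_loss` sits at log(c) and keeps pushing the generator; this is one way to read the stronger-gradient argument of Section 4.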
We attribute this behaviour to the fact that once a perturbation pattern is optimized using the proposed loss on a source distribution (e.g., paintings, cartoon images), the generator learns a "contrastive" signal that is agnostic to the underlying distribution. As a result, when the same perturbation pattern is applied to networks trained on a totally different domain (e.g., natural images), it still achieves state-of-the-art attack transferability rates. Table 2 shows the gain in transferability when using the relativistic cross-entropy (Eq. 3) in comparison to the simple cross-entropy loss (Eq. 2).

For an untargeted attack, the objectives in Eq. 2 and 3 suffice; however, for a targeted adversarial attack the prediction for the perturbed image must match a given target class y', i.e., argmax_j(D_ψ(x')_j) = y' ≠ y. For such a case, we employ the following loss function:

θ* ← argmin_θ CrossEntropy(D_ψ(x'), 1_{y'}) + CrossEntropy(D_ψ(x), 1_y).   (4)

The overall training scheme for the generative network is given in Algorithm 1.

Algorithm 1 Generator Training for Relativistic Adversarial Perturbations
1: Input: a pretrained classifier D_ψ, an arbitrary training data distribution X, a perturbation budget ε, and a loss criterion L.
2: Randomly initialize the generator network G_θ.
3: repeat
4:   Sample a mini-batch of data from the training set.
5:   Use the current state of the generator, G_θ, to generate unbounded adversaries.
6:   Project the adversaries, G_θ(x), within the valid perturbation budget to obtain x' such that ‖x' − x‖_∞ ≤ ε.
7:   Forward pass x' through D_ψ and compute the loss given in Eq. (3) for an untargeted attack or Eq. (4) for a targeted attack.
8:   Backward pass and update the generator parameters θ to optimize the loss (maximize Eq. (3), minimize Eq. (4)).
9: until model convergence.

Figure 3: Loss and gradient trends for the CE and RCE loss functions. Results are reported with a VGG-16 network on 100 random images for the MI-FGSM attack. Trends are shown in log scale.

4 Gradient Perspective of Relativistic Cross-Entropy

Adversarial perturbations are crafted via loss function gradients. An effective loss function helps in the generation of perturbations by back-propagating stronger gradients. Below, we show that Relativistic Cross-Entropy (RCE) ensures this requisite and thus leads to better performance than the regular Cross-Entropy (CE) loss.

Suppose the logit-space outputs of the discriminator (pretrained classifier) corresponding to a clean image x and a perturbed image x' are denoted by a and a', respectively. Then CE(a', y) = −log( e^{a'_y} / Σ_k e^{a'_k} ) is the cross-entropy loss for a perturbed input x'. For clarity, we define p'_y = e^{a'_y} / Σ_k e^{a'_k}. The derivative of p'_y w.r.t. a'_i is ∂p'_y/∂a'_i = p'_y([[i=y]] − p'_i). Using the chain rule, the derivative of the cross-entropy loss is given by:

∂CE/∂a'_i = p'_i − [[i=y]].   (5)

For the relativistic loss, formulated as RCE(a', a, y) = −log( e^{a'_y − a_y} / Σ_k e^{a'_k − a_k} ), we define r_y = e^{a'_y − a_y} / Σ_k e^{a'_k − a_k}. The derivative of r_y w.r.t. a'_i is ∂r_y/∂a'_i = r_y([[i=y]] − r_i).
From the chain rule, the RCE derivative w.r.t. a'_i is given by:

∂RCE/∂a'_i = r_i − [[i=y]].   (6)

In light of the above relations, RCE has three important properties:

1. Comparing Eq. (5) with Eq. (6) shows that the RCE gradient is a function of the 'difference' (a'_y − a_y), as opposed to only the scores a'_y in the CE loss. Thus, it measures the relative change in prediction as an explicit objective during optimization.

2. The RCE loss back-propagates larger gradients compared to CE, resulting in efficient training and stronger adversaries (see Figure 3 for empirical evidence). Sketch Proof: We can factorize the denominator in Eq. (6) as follows: ∂RCE/∂a'_i = ( e^{a'_y − a_y} / ( e^{a'_y − a_y} + Σ_{k≠y} e^{a'_k − a_k} ) ) − [[i=y]]. Consider the fact that maximization of RCE is only possible when e^{a'_y − a_y} decreases and Σ_{k≠y} e^{a'_k − a_k} increases. Generally, a_y ≫ a_{k≠y} for the scores generated by a pretrained model, and a'_y ≪ a'_{k≠y} (here k denotes an incorrectly predicted class). Thus, ∂RCE/∂a'_i > ∂CE/∂a'_i, since e^{a'_y − a_y} < e^{a'_y} and Σ_{k≠y} e^{a'_k − a_k} > Σ_{k≠y} e^{a'_k}. In simple words, the gradient strength of RCE is higher than that of CE.

3.
In case x is misclassified by F(·), the gradient strength of RCE is still higher than that of CE (here the noise update with the CE loss is weaker, since the adversary's goal is already achieved, i.e., x is misclassified).

Loss | VGG-16 | VGG-19 | Squeeze-v1.1 | Dense-121
Cross Entropy (CE) | 79.21 | 78.96 | 69.32 | 66.45
Relativistic CE | 86.95 | 85.88 | 77.81 | 75.21

Table 2: Effect of the relativistic loss on transferability in terms of fooling rate (%) on the ImageNet val-set. The generator is trained against ResNet-152 on the Paintings dataset.

5 Experiments

5.1 Rules of the Game

We report results using the following three attack settings in our experiments: (a) White-box. The attacker has access to the original model (both architecture and parameters) and the training data distribution. (b) Black-box. The attacker has access to a pretrained model on the same distribution, but without any knowledge of the target architecture and target data distribution. (c) Cross-domain Black-box. The attacker has access neither to (any) pretrained target model, nor to its label space or its training data distribution. It then has to seek a transferable adversarial function that is learned from a model pretrained on a possibly different distribution than the original.
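The three settings differ only in what the attacker is allowed to see. One way to summarize the assumed attacker knowledge (the dictionary and field names below are ours, purely illustrative):

```python
# What the attacker is assumed to know in each evaluation setting.
ATTACK_SETTINGS = {
    "white-box": {
        "target_architecture": True,
        "target_parameters": True,
        "target_data_distribution": True,
    },
    "black-box": {
        "target_architecture": False,
        "target_parameters": False,
        # a surrogate pretrained on the same distribution is available
        "target_data_distribution": True,
    },
    "cross-domain-black-box": {
        "target_architecture": False,
        "target_parameters": False,
        "target_data_distribution": False,  # neither data nor label space
    },
}

def knowledge(setting):
    """Count the pieces of target knowledge granted to the attacker."""
    return sum(ATTACK_SETTINGS[setting].values())
```

The cross-domain black-box setting grants strictly less knowledge than the other two, which is why it is the hardest of the three.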
Hence, this setting is far more challenging than the plain black-box setting.

Perturbation | Attack | VGG-19 Fool Rate (↑) | VGG-19 Top-1 (↓) | ResNet-50 Fool Rate (↑) | ResNet-50 Top-1 (↓) | Dense-121 Fool Rate (↑) | Dense-121 Top-1 (↓)
l∞ ≤ 10 | Gaussian Noise | 17.05 | 70.30 | 18.06 | 70.74 | 23.59 | 64.65
l∞ ≤ 10 | Ours-Paintings | 29.00 | 62.0 | 31.52 | 60.77 | 47.12 | 46.68
l∞ ≤ 10 | Ours-Comics | 31.81 | 60.40 | 33.69 | 59.26 | 48.47 | 45.78
l∞ ≤ 10 | Ours-ChestX | 20.53 | 67.63 | 22.00 | 67.72 | 40.81 | 50.11
l∞ ≤ 16 | Gaussian Noise | 23.30 | 66.70 | 25.76 | 66.07 | 33.80 | 57.92
l∞ ≤ 16 | Ours-Paintings | 44.50 | 49.76 | 47.51 | 47.62 | 66.52 | 30.21
l∞ ≤ 16 | Ours-Comics | 50.37 | 45.17 | 51.78 | 43.91 | 67.75 | 29.25
l∞ ≤ 16 | Ours-ChestX | 31.81 | 59.75 | 34.49 | 58.6 | 62.14 | 33.95
l∞ ≤ 32 | Gaussian Noise | 39.90 | 54.37 | 47.21 | 48.40 | 61.07 | 35.48
l∞ ≤ 32 | Ours-Paintings | 63.78 | 33.46 | 69.05 | 28.77 | 87.08 | 11.96
l∞ ≤ 32 | Ours-Comics | 71.85 | 26.18 | 71.91 | 26.12 | 87.90 | 11.17
l∞ ≤ 32 | Ours-ChestX | 59.49 | 36.98 | 62.17 | 34.85 | 88.12 | 10.92

Table 3: Cross-Domain Black-box: Untargeted attack success (%) in terms of fooling rate on the ImageNet val-set. Adversarial generators are trained against ChexNet on the Paintings, Comics and ChestX datasets. The perturbation budget, l∞ ≤ 10/16/32, is chosen as per standard practice. Even without knowledge of the targeted model, its label space and its training data distribution, the transferability rate is much higher than that of Gaussian noise.

5.2 Experimental Settings

Generator Architecture. We chose the ResNet architecture introduced in [22] as the generator network G_θ; it consists of downsampling, residual and upsampling blocks. For training, we used the Adam optimizer [23] with a learning rate of 1e-4 and exponential decay rates for the first and second moments set to 0.5 and 0.999, respectively.
Generators are learned against four pretrained ImageNet models, VGG-16, VGG-19 [24], Inception-v3 (Inc-v3) [25] and ResNet-152 [26], as well as ChexNet (a Dense-121 [27] network trained to diagnose pneumonia) [28].

Datasets. We consider the following datasets for generator training: Paintings [29], Comics [30], ImageNet and a subset of ChestX-ray (ChestX) [28]. There are approximately 80k samples in Paintings, 50k in Comics, 1.2 million in the ImageNet training set and 10k in ChestX.

Figure 4: Untargeted adversaries produced by a generator trained against Inception-v3 on the Paintings dataset. The 1st row shows original images (Bee Eater, Cardoon, Impala, Anemone Fish, Crane), the 2nd row shows unrestricted outputs of the adversarial generator, and the 3rd row shows adversaries after valid projection (all predicted as Jigsaw Puzzle). The perturbation budget is set to l∞ ≤ 10.

Figure 5: Illustration of attention shift. We use [32] to visualize attention maps of clean (1st row) and adversarial (2nd row) images. Adversarial images are obtained by training a generator against VGG-16 on the Paintings dataset.

Inference: Inference is performed on the ImageNet validation set (val-set, 50k samples), a subset (5k samples) of ImageNet proposed by [11], and the ImageNet-NeurIPS [31] (1k samples) dataset.

Evaluation Metrics: We use the fooling rate (the percentage of input samples for which the predicted label is flipped after adding adversarial perturbations), top-1 accuracy, and the increase in error rate (the difference between the error rates on adversarial and clean images) to evaluate our proposed approach.

5.2.1 Results

Table 3 shows the cross-domain black-box setting results, where the attacker has no access to the model architecture, its parameters, its training distribution or its label space.
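The three metrics just defined can be computed directly from predicted and ground-truth labels. A minimal sketch (function names are ours):

```python
def fooling_rate(preds_clean, preds_adv):
    """% of samples whose predicted label flips after perturbation."""
    flips = sum(c != a for c, a in zip(preds_clean, preds_adv))
    return 100.0 * flips / len(preds_clean)

def top1_accuracy(preds, labels):
    """% of samples predicted correctly."""
    hits = sum(p == y for p, y in zip(preds, labels))
    return 100.0 * hits / len(labels)

def error_rate_increase(preds_clean, preds_adv, labels):
    """Error rate on adversarial images minus error rate on clean images."""
    return top1_accuracy(preds_clean, labels) - top1_accuracy(preds_adv, labels)
```

Note that the fooling rate needs no ground-truth labels at all, only the clean and adversarial predictions, which matches the label-agnostic nature of the attack.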
Note that ChestX [28] does not have much texture, an important feature for deceiving ImageNet models [33]; yet the transferability rate of perturbations learned against ChexNet is much better than that of Gaussian noise.

Tables 4 and 5 show the comparison of our method against different universal methods on both naturally and adversarially trained models [34] (Inc-v3, Inc-v4 and IncRes-v2). Our attack success rate is much higher in both the white-box and black-box settings. Notably, for the case of adversarially trained models, Gaussian smoothing on top of our approach leads to a significant increase in transferability. We provide further comparison with GAP [16] in the supplementary material. Figures 4 and 5 show the model's output and attention shift on example adversaries.

Model | Attack | VGG-16 | VGG-19 | ResNet-152
VGG-16 | FFF | 47.10* | 41.98 | 27.82
VGG-16 | AAA | 71.59* | 65.64 | 45.33
VGG-16 | UAP | 78.30* | 73.10 | 63.40
VGG-16 | Ours-Paintings | 99.58* | 98.97 | 47.90
VGG-16 | Ours-Comics | 99.83* | 99.56 | 58.18
VGG-16 | Ours-ImageNet | 99.75* | 99.44 | 52.64
VGG-19 | FFF | 38.19 | 43.60* | 26.34
VGG-19 | AAA | 69.45 | 72.84* | 51.74
VGG-19 | UAP | 73.50 | 77.80* | 58.00
VGG-19 | Ours-Paintings | 98.90 | 99.61* | 40.98
VGG-19 | Ours-Comics | 99.29 | 99.76* | 42.61
VGG-19 | Ours-ImageNet | 99.19 | 99.80* | 53.02
ResNet-152 | FFF | 19.23 | 17.15 | 29.78*
ResNet-152 | AAA | 47.21 | 48.78 | 60.72*
ResNet-152 | UAP | 47.00 | 45.5 | 84.0*
ResNet-152 | Ours-Paintings | 86.95 | 85.88 | 98.03*
ResNet-152 | Ours-Comics | 88.94 | 88.84 | 94.18*
ResNet-152 | Ours-ImageNet | 95.40 | 93.26 | 99.02*

Table 4: White & Black-box Setting: Fooling rate (%) of untargeted attacks on the ImageNet val-set. The perturbation budget is l∞ ≤ 10. * indicates a white-box attack. Our attack's transferability from ResNet-152 to VGG-16/19 is even higher than that of the other white-box attacks.

Model | Attack | Inc-v3ens3 | Inc-v3ens4 | IncRes-v2ens
Inc-v3 | UAP | 1.00/7.82 | 1.80/5.60 | 1.88/5.60
Inc-v3 | GAP | 5.48/33.3 | 4.14/29.4 | 3.76/22.5
Inc-v3 | RHP | 32.5/60.8 | 31.6/58.7 | 24.6/57.0
Inc-v4 | UAP | 2.08/7.68 | 1.94/6.92 | 2.34/6.78
Inc-v4 | RHP | 27.5/60.3 | 26.7/62.5 | 21.2/58.5
IncRes-v2 | UAP | 1.88/8.28 | 1.74/7.22 | 1.96/8.18
IncRes-v2 | RHP | 29.7/62.3 | 29.8/63.3 | 26.8/62.8
— | Ours-Paintings | 33.92/72.46 | 38.94/71.4 | 33.24/69.66
— | Ours-gs-Paintings | 47.78/73.06 | 48.18/72.68 | 42.86/73.3
— | Ours-Comics | 21.06/67.5 | 24.1/68.72 | 12.82/54.72
— | Ours-gs-Comics | 34.52/70.3 | 56.54/69.9 | 23.58/68.02
— | Ours-ImageNet | 28.34/71.3 | 29.9/66.72 | 19.84/60.88
— | Ours-gs-ImageNet | 41.06/71.96 | 42.68/71.58 | 37.4/72.86

Table 5: Black-box Setting: Transferability comparison in terms of the % increase in error rate after the attack. Results are reported on a subset of ImageNet (5k) with perturbation budgets of l∞ ≤ 16/32. Our generators are trained against the naturally trained Inc-v3 only. 'gs' denotes Gaussian smoothing applied to the generator output before projection, which enhances our attack strength.

5.2.2 Comparison with State-of-the-Art

Finally, we compare our method with the recently proposed instance-specific attack method [10] that exhibits high transferability to adversarially trained models. For the first time in the literature, we show that a universal function like ours can attain a much higher transferability rate, outperforming the state-of-the-art instance-specific translation-invariant method [10] by large average absolute gains of 46.6% and 86.5% (in fooling rates) on naturally and adversarially trained models, respectively, as reported in Table 6. The naturally trained models are Inception-v3 (Inc-v3) [25], Inception-v4 (Inc-v4), Inception-ResNet-v2 (IncRes-v2) [35] and ResNet-v2-152 (Res-152) [36].
The adversarially trained models are from [34].

Attack | Inc-v3 | Inc-v4 | IncRes-v2 | Res-152 | Inc-v3ens3 | Inc-v3ens4 | IncRes-v2ens
(attacks crafted on Inc-v3)
FGSM | 79.6* | 35.9 | 30.6 | 30.2 | 14.7 | 7.0 | 15.6
TI-FGSM | 75.5* | 37.3 | 32.1 | 34.1 | 28.9 | 22.3 | 28.2
MI-FGSM | 97.8* | 47.1 | 46.4 | 38.7 | 17.4 | 9.5 | 20.5
TI-MI-FGSM | 97.9* | 52.4 | 47.9 | 41.1 | 35.1 | 25.8 | 35.8
DIM | 98.3* | 73.8 | 67.8 | 58.4 | 24.3 | 13.0 | 24.2
TI-DIM | 98.5* | 75.2 | 69.2 | 59.2 | 47.1 | 37.4 | 46.9
(attacks crafted on IncRes-v2)
FGSM | 36.1 | 44.3 | 64.3* | 31.9 | 17.2 | 10.2 | 18.0
TI-FGSM | 41.5 | 49.7 | 63.7* | 40.1 | 34.5 | 27.8 | 34.6
MI-FGSM | 64.8 | 74.8 | 100.0* | 54.5 | 23.7 | 13.3 | 25.1
TI-MI-FGSM | 69.5 | 76.1 | 100.0* | 59.6 | 51.7 | 49.3 | 50.7
DIM | 83.5 | 86.1 | 99.1* | 73.5 | 40.0 | 27.9 | 41.2
TI-DIM | 85.5 | 86.4 | 98.8* | 76.3 | 60.1 | 59.5 | 61.3
(attacks crafted on Res-152)
FGSM | 34.0 | 40.1 | 30.3 | 81.3* | 17.7 | 9.9 | 20.2
TI-FGSM | 39.3 | 46.4 | 33.4 | 78.9* | 34.5 | 27.8 | 34.6
MI-FGSM | 48.1 | 54.2 | 44.3 | 97.5* | 23.7 | 13.3 | 25.1
TI-MI-FGSM | 50.9 | 55.6 | 45.1 | 97.4* | 37.7 | 32.8 | 39.9
DIM | 77.8 | 77.0 | 73.5 | 97.4* | 36.0 | 24.1 | 40.5
TI-DIM | 77.0 | 73.9 | 73.2 | 97.2* | 58.8 | 42.8 | 60.3
(ours, trained against Inc-v3)
Ours-Paintings | 100.0* | 99.7 | 99.8 | 98.9 | 74.6 | 64.8 | 69.3
Ours-gs-Paintings | 99.9* | 98.5 | 97.6 | 93.6 | 83.9 | 75.9 | 85.2
Ours-Comics | 99.9* | 99.8 | 99.8 | 98.7 | 46.8 | 23.3 | 39.3
Ours-gs-Comics | 99.9* | 97.0 | 93.4 | 87.7 | 58.8 | 42.8 | 60.3
Ours-ImageNet | 99.8* | 99.1 | 97.5 | 98.1 | 60.5 | 36.4 | 55.4
Ours-gs-ImageNet | 98.9* | 95.4 | 90.5 | 91.8 | 78.4 | 68.9 | 78.6

Table 6: White-box and Black-box: Transferability comparisons. Success rate on the ImageNet-NeurIPS validation set (1k images) is reported by creating adversaries within the perturbation budget of l∞ ≤ 16, as per standard practice [10].
Our generators are learned against naturally trained Inception-v3 only. * indicates white-box attack. 'gs' is Gaussian smoothing applied to the generator output before projection. Smoothing leads to a slight decrease in transferability against naturally trained models but a significant increase against adversarially trained models.

[Figure 6 plots omitted: panels (a) naturally trained IncRes-v2 and (b) adversarially trained IncRes-v2 show fool rate (%) vs. training epochs; panels (c) naturally trained IncRes-v2 and (d) adversarially trained IncRes-v2 show fool rate (%) vs. Gaussian kernel size.]

Figure 6: Effect of Gaussian kernel size and number of training epochs on the transferability (% fool rate) of adversarial examples. The generator is trained against Inception-v3 on Paintings, while inference is performed on ImageNet-NeurIPS. Firstly, as the number of epochs increases, transferability against naturally trained IncRes-v2 increases while it decreases against its adversarially trained version. Secondly, as the size of the Gaussian kernel increases, transferability against both naturally and adversarially trained IncRes-v2 decreases. Applying a kernel of size 3 leads to optimal results against the adversarially trained model. Perturbation is set to l∞ ≤ 16.

5.3 Transferability: Naturally Trained vs. Adversarially Trained

Furthermore, we study the impact of training iterations and Gaussian smoothing [10] on the transferability of our generative adversarial examples. We report results using the naturally and adversarially trained IncRes-v2 models [35], as other models exhibit similar behaviour. Figure 6 displays the transferability (% fool rate) as a function of the number of training epochs (a-b) and various kernel sizes for Gaussian smoothing (c-d).
Firstly, we observe a gradual increase in the transferability of the generator against the naturally trained model as the training epochs advance. In contrast, the transferability deteriorates against the adversarially trained model.
Therefore, when targeting naturally trained models, we train for ten epochs on the Paintings, Comics and ChestX datasets (although we anticipate better performance with more epochs). When targeting adversarially trained models, we deploy an early-stopping criterion to obtain the best-trained generator, since performance on such models drops as epochs increase. This fundamentally shows the reliance of naturally and adversarially trained models on different sets of features. Our results clearly demonstrate that the adversarial solution space is shared across different architectures and even across distinct data domains. Since we train our generator against naturally trained models only, it converges to a solution space on which an adversarially trained model has already been trained. As a result, our perturbations gradually become weaker against adversarially trained models as training progresses. A visual demonstration is provided in the supplementary material.
Secondly, the application of Gaussian smoothing yields different results on naturally trained and adversarially trained models. After applying smoothing, adversaries become stronger against adversarially trained models and weaker against naturally trained ones. We achieve optimal results with a kernel size of 3 and σ = 1 for adversarially trained models, and use these settings consistently in our experiments. We apply the Gaussian kernel to the unrestricted generator's output; as the kernel size increases, the generator's output becomes very smooth, and after projection within the valid l∞ range, the adversaries become weaker.

6 Conclusion

Adversarial examples have been shown to be transferable across different models trained on the same domain. For the first time in the literature, we show that cross-domain transferable adversaries exist that can fool target-domain networks with high success rates.
We propose a novel generative framework that learns to generate strong adversaries using a relativistic discriminator. Surprisingly, our proposed universal adversarial function can beat instance-specific attack methods that were previously found to be much stronger than universal perturbations. Our generative attack model, trained on Chest X-ray and Comics images, can fool VGG-16, ResNet50 and Dense-121 models with success rates of ∼88% and ∼72%, respectively, without any knowledge of the target data distribution or label space.

References

[1] Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277, 2016.

[2] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. In Proceedings of the 5th International Conference on Learning Representations, 2017.

[3] Florian Tramèr, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. The space of transferable adversarial examples. arXiv preprint arXiv:1704.03453, 2017.

[4] Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Analysis of classifiers' robustness to adversarial perturbations. arXiv preprint arXiv:1502.02590, 2015.

[5] Alhussein Fawzi, Seyed-Mohsen Moosavi Dezfooli, and Pascal Frossard. Robustness of classifiers: from adversarial to random noise. In Advances in Neural Information Processing Systems, pages 1632–1640, 2016.

[6] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy.
Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

[7] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236, 2016.

[8] Seyed-Mohsen Moosavi Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574–2582, 2016.

[9] Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 427–436, 2015.

[10] Yinpeng Dong, Tianyu Pang, Hang Su, and Jun Zhu. Evading defenses to transferable adversarial examples by translation-invariant attacks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019.

[11] Yingwei Li, Song Bai, Cihang Xie, Zhenyu Liao, Xiaohui Shen, and Alan L Yuille. Regional homogeneity: Towards learning transferable universal adversarial perturbations against defenses. arXiv preprint arXiv:1904.00979, 2019.

[12] Seyed-Mohsen Moosavi Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 86–94, 2017.

[13] Konda Reddy Mopuri, Utsav Garg, and R Venkatesh Babu. Fast feature fool: A data independent approach to universal adversarial perturbations. In Proceedings of the British Machine Vision Conference (BMVC), 2017.

[14] Konda Reddy Mopuri, Phani Krishna Uppala, and R. Venkatesh Babu. Ask, acquire, and attack: Data-free UAP generation using class impressions. In ECCV, 2018.

[15] Shumeet Baluja and Ian Fischer. Adversarial transformation networks: Learning to generate adversarial examples.
arXiv preprint arXiv:1703.09387, 2017.

[16] Omid Poursaeed, Isay Katsman, Bicheng Gao, and Serge J. Belongie. Generative adversarial perturbations. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4422–4431, 2018.

[17] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014.

[18] Cihang Xie, Zhishuai Zhang, Jianyu Wang, Yuyin Zhou, Zhou Ren, and Alan Loddon Yuille. Improving transferability of adversarial examples with input diversity. CoRR, abs/1803.06978, 2018.

[19] Yang Song, Rui Shu, Nate Kushman, and Stefano Ermon. Constructing unrestricted adversarial examples with generative models. In Advances in Neural Information Processing Systems, pages 8312–8323, 2018.

[20] Chaowei Xiao, Bo Li, Jun-Yan Zhu, Warren He, Mingyan Liu, and Dawn Song. Generating adversarial examples with adversarial networks. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, pages 3905–3911. AAAI Press, 2018.

[21] Alexia Jolicoeur-Martineau. The relativistic discriminator: a key element missing from standard GAN. arXiv preprint arXiv:1807.00734, 2018.

[22] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In ECCV, 2016.

[23] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[24] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[25] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2818–2826, 2016.

[26] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

[27] Gao Huang, Zhuang Liu, and Kilian Q. Weinberger. Densely connected convolutional networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2261–2269, 2017.

[28] Pranav Rajpurkar, Jeremy Irvin, Kaylie Zhu, Brandon Yang, Hershel Mehta, Tony Duan, Daisy Ding, Aarti Bagul, Curtis P. Langlotz, Katie Shpanskaya, Matthew P. Lungren, and Andrew Y. Ng. CheXNet: Radiologist-level pneumonia detection on chest X-rays with deep learning. CoRR, abs/1711.05225, 2017.

[29] Painter by Numbers. https://www.kaggle.com/c/painter-by-numbers/data. Kaggle, 2017.

[30] Cenk Bircanoğlu. https://www.kaggle.com/cenkbircanoglu/comic-books-classification. Kaggle, 2017.

[31] NeurIPS. https://www.kaggle.com/c/nips-2017-defense-against-adversarial-attack/data. Kaggle, 2017.

[32] Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 618–626, 2017.

[33] Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A. Wichmann, and Wieland Brendel. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In International Conference on Learning Representations, 2019.

[34] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses.
In International Conference on Learning Representations (ICLR), 2018.

[35] Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A Alemi. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In AAAI, volume 4, page 12, 2017.

[36] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In European Conference on Computer Vision, pages 630–645. Springer, 2016.