{"title": "Subspace Attack: Exploiting Promising Subspaces for Query-Efficient Black-box Attacks", "book": "Advances in Neural Information Processing Systems", "page_first": 3825, "page_last": 3834, "abstract": "Unlike the white-box counterparts that are widely studied and readily accessible, adversarial examples in black-box settings are generally more Herculean on account of the difficulty of estimating gradients. Many methods achieve the task by issuing numerous queries to target classification systems, which makes the whole procedure costly and suspicious to the systems. In this paper, we aim at reducing the query complexity of black-box attacks in this category. We propose to exploit gradients of a few reference models which arguably span some promising search subspaces. Experimental results show that, in comparison with the state-of-the-arts, our method can gain up to 2x and 4x reductions in the requisite mean and medium numbers of queries with much lower failure rates even if the reference models are trained on a small and inadequate dataset disjoint to the one for training the victim model. Code and models for reproducing our results will be made publicly available.", "full_text": "Subspace Attack: Exploiting Promising Subspaces for\n\nQuery-Ef\ufb01cient Black-box Attacks\n\nZiang Yan1,3* Yiwen Guo2,3* Changshui Zhang1\n\n1Institute for Arti\ufb01cial Intelligence, Tsinghua University (THUAI),\n\nState Key Lab of Intelligent Technologies and Systems,\n\nBeijing National Research Center for Information Science and Technology (BNRist),\n\nDepartment of Automation,Tsinghua University, Beijing, China\n\n2 Bytedance AI Lab 3 Intel Labs China\n\nyza18@mails.tsinghua.edu.cn guoyiwen.ai@bytedance.com zcs@mail.tsinghua.edu.cn\n\nAbstract\n\nUnlike the white-box counterparts that are widely studied and readily accessible,\nadversarial examples in black-box settings are generally more Herculean on account\nof the dif\ufb01culty of estimating gradients. 
Many methods achieve the task by issuing\nnumerous queries to target classi\ufb01cation systems, which makes the whole procedure\ncostly and suspicious to the systems. In this paper, we aim at reducing the query\ncomplexity of black-box attacks in this category. We propose to exploit gradients\nof a few reference models which arguably span some promising search subspaces.\nExperimental results show that, in comparison with the state-of-the-arts, our method\ncan gain up to 2\u00d7 and 4\u00d7 reductions in the requisite mean and median numbers\nof queries with much lower failure rates even if the reference models are trained\non a small and inadequate dataset disjoint from the one for training the victim model.\nCode and models for reproducing our results are available at https://github.com/ZiangYan/subspace-attack.pytorch.\n\n1\n\nIntroduction\n\nDeep neural networks (DNNs) have been demonstrated to be vulnerable to adversarial examples [37]\nthat are typically formed by perturbing benign examples with an intention to cause misclassi\ufb01cations.\nDepending on the amount of information that is exposed and can be leveraged, an intelligent\nadversary shall adopt different categories of attacks. Given access to critical information (e.g., the\narchitecture and learned parameters) about a target DNN, the adversaries generally prefer white-box\nattacks [37, 7, 24, 2, 23]. After a few rounds of forward and backward passes, such attacks are capable\nof generating images that are perceptually indistinguishable from the benign ones but would successfully\ntrick the target DNN into making incorrect classi\ufb01cations. When little information is exposed, however,\nthe adversaries will have to adopt black-box attacks [28, 22, 3, 25, 13, 26, 38, 14, 8] instead.\nIn general, black-box attacks require no more information than the con\ufb01dence score from a target\nand thus the threat model is more realistic in practice. 
Over the past few years, remarkable progress\nhas been made in this regard. While initial efforts reveal the transferability of adversarial examples\nand are devoted to learning substitute models [28, 22], recent methods focus more on gradient estimation\naccomplished via zeroth-order optimizations [3, 25, 13, 26, 38, 14]. By issuing classi\ufb01cation queries\nto the target (a.k.a., victim model), these methods learn to approach its actual gradient w.r.t. any input,\nso as to perform adversarial attacks just like in the white-box setting. Despite many practical merits,\nhigh query complexity is virtually inevitable for computing sensible estimations of input-gradients in\nsome methods, making their procedures costly and probably suspicious to the classi\ufb01cation system.\n*The \ufb01rst two authors contributed equally to the work. Work was done when YG was with Intel Labs China.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n\fFollowing this line of research, we aim at reducing the query complexity of the black-box attacks. We\ndiscover in this paper that gradient estimation and zeroth-order optimization\ncan be performed in subspaces with much lower dimensions than one may suspect, and we consider a principled\nway of spanning such subspaces by utilizing \u201cprior gradients\u201d of a few reference models\nas heuristic search directions. Our method, for the \ufb01rst time, bridges the gap between transfer-based\nattacks and the query-based ones. Powered by the developed mechanism, we are capable of trading\nthe attack failure rate in favor of the query ef\ufb01ciency reasonably well. Experimental results show\nthat our method can gain signi\ufb01cant reductions in the requisite numbers of queries with much lower\nfailure rates, in comparison with the previous state of the art. 
We show that it is possible to obtain the\nreference models with a small training set disjoint from the one for training CIFAR-10/ImageNet targets.\n\n2 Related Work\n\nOne common and crucial ingredient utilized in most white-box attacks is the model gradient w.r.t. the\ninput. In practical scenarios, however, the adversaries may not be able to acquire detailed architecture\nor learned parameters of a model, preventing them from adopting gradient-based algorithms directly.\nOne initial way to overcome this challenge is to exploit transferability [37]. Ever since the adversarial\nphenomenon was discovered [37, 7], it has been shown that adversarial examples crafted on one\nDNN model can probably fool another, even if they have different architectures. Taking advantage of\nthe transferability, Papernot et al. [27, 28] propose to construct a dataset which is labeled by querying\nthe victim model, and train a substitute model as surrogate to mount black-box attacks. Thereafter,\nLiu et al. [22] study such transfer-based attacks over large networks on ImageNet [32], and propose\nto attack an ensemble of models for improved performance. Despite their simplicity, attacks functioning\nsolely on transferability suffer from high failure rates.\nAn alternative way of mounting black-box attacks is to perform gradient estimation. Supposing that the\nprediction probabilities (i.e., the con\ufb01dence scores) of the victim model are available, methods in this\ncategory resort to zeroth-order optimizations. For example, Chen et al. [3] propose to accomplish this\ntask using pixel-by-pixel \ufb01nite differences, while Ilyas et al. [13] suggest applying a variant of natural\nevolution strategies (NES) [33]. With the input-gradients appropriately estimated, they proceed as if\nin a white-box setting. In practice, the two are combined with the C&W white-box attack [2] and\nPGD [23], respectively. 
Though effective, owing to the high dimensionality of natural images, these\ninitial efforts based on accurate gradient estimation generally require (tens of) thousands of queries to\nsucceed on the victim model, which is very costly in both money and time. Towards reducing the\nquery complexity, Tu et al. [38] and Ilyas et al. [14] further introduce an auto-encoding mechanism and a bandit\nmechanism, respectively, that incorporate spatial and temporal priors. Similarly, Bhagoji et al. [26]\nshow the effectiveness of random grouping and principal components analysis in achieving the goal.\nIn extreme scenarios where only \ufb01nal decisions of the victim model are exposed, adversarial attacks\ncan still be performed [1, 4]. Such black-box attacks are in general distinct from the score-based\nattacks, and we restrict our attention to the latter in this paper. As brie\ufb02y reviewed, methods\nin this threat model can be divided into two categories, i.e., the transfer-based attacks (which are\nalso known as the oracle-based attacks) and query-based attacks. Our method, probably for the \ufb01rst\ntime, bridges the gap between them and therefore inherits the advantages from both sides. It differs\nfrom existing transfer-based attacks in the sense that it takes gradients of reference models as heuristic\nsearch directions for \ufb01nite difference gradient estimation, and, bene\ufb01ting from the heuristics, it is far\nmore (query-)ef\ufb01cient than the latest query-based attacks.\n\n3 Motivations\n\nLet us consider attacks on an image classi\ufb01cation system. Formally, the black-box attacks of our\ninterest attempt to perturb an input x \u2208 Rn and trick a victim model f : Rn \u2192 Rk to give an incorrect\nprediction arg maxi f (x)i \u2260 y about its label y. 
On account of the high dimensionality of\ninput images, it is dif\ufb01cult to estimate gradients and perform black-box attacks within a few queries; we\necho a recent claim that this limitation can be reasonably ameliorated by exploiting prior knowledge\nproperly [14]. In this section, we will shed light on the motivations of our method.\n\nAttack in Linear Subspaces? Natural images are high-dimensional and spatially over-redundant,\nwhich means not all the pixels (or combinations of pixels) are predictive of the image-level labels.\n\n2\n\n\f(a) (b) (c)\nFigure 1: Black-box attack in low-dimensional random subspaces.\n\nA classi\ufb01cation model typically offers its predictions through mining discriminative components and\nsuppressing irrelevant variations from raw images [19]. One reasonable hypothesis worth exploring\nin this spirit is that it is probably less effective to perturb an image on some speci\ufb01c pixels (or along\ncertain directions) when attacking a black-box model. From a geometric point of view, that is, the\nproblem probably has a lower intrinsic dimension than n, just like many other ones [20].\nTo verify this, we try estimating gradients and mounting attacks on low-dimensional subspaces for\nimages, which is bootstrapped by generating m < n random basis vectors u0, . . . , um\u22121 sequentially,\nwith each being orthogonal to the prior ones. 
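The random-subspace construction just described can be sketched in a few lines of numpy (an illustrative sketch, not the authors' released code; QR decomposition of a Gaussian matrix stands in for the sequential orthogonalization, and the dimensions below follow the CIFAR-10 experiment):

```python
import numpy as np

def random_orthonormal_basis(n, m, seed=0):
    """Generate m < n orthonormal basis vectors in R^n.

    The reduced QR decomposition of a random Gaussian matrix yields the same
    distribution of subspaces as sequentially orthogonalizing random vectors.
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, m))
    Q, _ = np.linalg.qr(A)  # columns of Q are orthonormal
    return Q                # shape (n, m)

def sample_search_direction(Q, rng):
    """Combine the basis vectors with Gaussian coefficients: u' = sum_i alpha_i u_i."""
    alpha = rng.standard_normal(Q.shape[1])
    return Q @ alpha

n, m = 3072, 500  # CIFAR-10 input dimensionality and a subspace size from Figure 1
Q = random_orthonormal_basis(n, m)
rng = np.random.default_rng(1)
u = sample_search_direction(Q, rng)  # a search direction inside the m-dim subspace
```

Once `Q` is fixed for a given image, every search direction drawn this way lies in the same m-dimensional subspace, so the zeroth-order optimization never leaves it.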
We utilize the bandit optimization advocated\nin a recent paper [14] for gradient estimation, and adopt the same iterative attack (i.e., PGD) as in it.\nRecall that the bandit mechanism updates its estimation gt at each step by a scaled search direction:\n\n\u2206t = [l(gt + \u03b4u\u2032t) \u2212 l(gt \u2212 \u03b4u\u2032t)] / \u03b4 \u00b7 u\u2032t,\n\n(1)\n\nin which u\u2032t is the search direction sampled from a Gaussian distribution, \u03b4 > 0 is a step size that\nregulates the directional estimation, and l(\u00b7) calculates the inner product between its normalized input\nand the precise model gradient. The mechanism queries a victim model twice at each step of the\noptimization procedure for calculating \u2206t, after which a PGD step based on the current estimation is\napplied. Interested readers can check the insightful paper [14] for more details.\nIn this experiment, once the basis {u0, . . . , um\u22121} is established for a given image, it is \ufb01xed over\nthe whole optimization procedure, which occurs in the m-dimensional subspace instead of the original\nn-dimensional one. More speci\ufb01cally, the search direction u\u2032t is yielded by combining the generated\nbasis vectors with Gaussian coef\ufb01cients, i.e., u\u2032t = \u2211i \u03b1iui with \u03b1i \u223c N (0, 1). We are interested in\nhow the value of m affects the failure rate and the requisite number of queries of successful attacks.\nBy sampling 1,000 images from the CIFAR-10 test set, we craft untargeted adversarial examples for\na black-box wide residual network (WRN) [41] with an upper limit of 2,000 queries for ef\ufb01ciency\nreasons. As depicted in Figure 1, after m > 500, all three concerned metrics (i.e., failure rate, mean\nand median query counts) barely change. Moreover, at m = 2000, the failure rate already approaches\n\u223c10%, which is comparable to the result gained when the same optimization is applied in the original\nimage space, which has n = 3072 dimensions. 
See the red dotted line in Figure 1 for this baseline.\nA similar phenomenon can be observed on other models using other attacks as well, which evidences\nthat the problem may indeed have a lower dimension than one may suspect; this complements the\nstudy of the intrinsic dimensionality of the training landscape of DNNs in a prior work [20].\n\nPrior Gradients as Basis Vectors?\nSince the requisite number of queries at m = 2000 is already\nhigh in Figure 1, we know that the random basis vectors boost the state-of-the-art only to some\nlimited extent. Yet, it inspires us to explore more principled subspace bases for query-ef\ufb01cient attacks.\nTo achieve this goal, we start by revisiting and analyzing the transfer-based attacks. We know from\nprior works that even adversarial examples crafted using some single-step attacks like the fast gradient\n(sign) method [18] can transfer [28, 22], hence one can hypothesize that the gradients of some \u201csubstitute\u201d\nmodels are more helpful in spanning the search subspaces with reduced dimensionalities. A simple\nyet plausible way of getting these gradients involved is to use them directly as basis vectors. Note that\nunlike the transfer-based attacks in which these models totally substitute for the victim when crafting\nadversarial examples, our study merely considers their gradients as priors. 
We refer to such models\nand gradients as reference models and prior gradients, respectively, throughout this paper for clarity.\n\n3\n\n\f(a) (b)\nFigure 2: Comparison of (a) the failure rates when attacking WRN, and (b) mean squared residuals of\nprojecting the precise gradient onto subspaces spanned by random directions or prior gradients. We\ncollect nine models as candidates to obtain the prior gradients: AlexNet [17], VGG-11/13/16/19 [34],\nand ResNet-20/32/44/56 [10]. We add prior gradients corresponding to models from deep to shallow\none by one to the basis set.\n\nThe simplest solution to utilize such prior gradients might be to set these basis vectors to be \ufb01xed over\nthe entire optimization procedure, i.e., only the input-gradients of the reference models with respect to\nthe clean image x are utilized. 
We further let these basis vectors be adaptive when applying an\niterative attack (e.g., the basic iterative method [18] and PGD [23]), simply by recalculating the\nprior gradients (w.r.t. the current inputs, which may be candidate adversarial examples) at each step.\nDifferent zeroth-order optimization algorithms can be readily applied in the established subspaces.\nFor simplicity, we will stick with the described bandit optimization in the sequel of this paper and we\nleave the exploration of other algorithms, like coordinate-wise \ufb01nite differences [3] and NES [13],\nto future work.\nAn experiment is similarly conducted to compare attacks in the gradient-spanned subspaces1 and\nthe random ones, in which the WRN is still regarded as the victim model. We compare mounting\nblack-box attacks on different subspaces spanned by the adaptive and \ufb01xed prior gradients, as well\nas randomly generated vectors as described before. Figure 2 summarizes our main results. As\nin Figure 1(a), we illustrate the attack failure rates in Figure 2(a). Apparently, the adaptive prior\ngradients are much more promising than their \ufb01xed and random counterparts when spanning search\nsubspaces. We will use the adaptive version of prior gradients in the rest of this paper. For more\ninsights, we project normalized WRN gradients (calculated on clean images) onto the two sorts of\nsubspaces and further compare the mean squared residuals of projection under different circumstances\nin Figure 2(b). It can be seen that the gradient-spanned subspaces indeed align better with the precise\nWRN gradients, and severe misalignment between the search subspaces and the precise model gradients\nleads to high failure rates.
First, it can be computationally and memory-intensive to load all\nthe reference models and calculate their input-gradients as basis vectors. Second, it is likely that a\n\u201cuniversal\u201d adversarial example for a victim model is still far away from such subspaces, which means\nmounting attacks solely on them may lead to high failure rates as encountered in the transfer-based\nattacks. We will discuss these issues and present our solutions in this section. We codename our method\nsubspace attack and summarize it in Algorithm 1, in which the involved hyper-parameters will be\ncarefully explained in Section 5.\n\n4.1 Coordinate Descent for Ef\ufb01ciency\n\nIf one of the prior gradients happens to be well-aligned with the gradient of the victim model, then\n\u201can adaptive\u201d one-dimensional subspace suf\ufb01ces to mount the attack. Nevertheless, we found that it is\nnormally not the case, and increasing the number of reference models and prior gradients facilitates\n\n1Granted, the prior gradients are almost surely linearly independent and thus can be regarded as basis vectors.\n\n4\n\n\fAlgorithm 1 Subspace Attack Based on Bandit Optimization [14]\n1: Input: a benign example x \u2208 Rn, its label y, a set of m reference models {f0, . . .
, fm\u22121}, a chosen attack objective function L(\u00b7,\u00b7), and the victim model from which the output of f can be inferred.\n2: Output: an adversarial example xadv that ful\ufb01lls ||xadv \u2212 x||\u221e \u2264 \u03b5.\n3: Initialize the adversarial example to be crafted xadv \u2190 x.\n4: Initialize the gradient to be estimated g \u2190 0.\n5: Initialize the drop-out/layer ratio p.\n6: while not successful do\n7:    Choose a reference model whose index is i uniformly at random\n8:    Calculate a prior gradient with drop-out/layer ratio p as u \u2190 \u2202L(fi(xadv; p), y) / \u2202xadv\n9:    g+ \u2190 g + \u03c4u,  g\u2212 \u2190 g \u2212 \u03c4u\n10:   g\u2032+ \u2190 g+/||g+||2,  g\u2032\u2212 \u2190 g\u2212/||g\u2212||2\n11:   \u2206t \u2190 [L(f(xadv + \u03b4g\u2032+), y) \u2212 L(f(xadv + \u03b4g\u2032\u2212), y)] / (\u03c4\u03b4) \u00b7 u\n12:   g \u2190 g + \u03b7g\u2206t\n13:   xadv \u2190 xadv + \u03b7 \u00b7 sign(g)\n14:   xadv \u2190 Clip(xadv, x \u2212 \u03b5, x + \u03b5)\n15:   xadv \u2190 Clip(xadv, 0, 1)\n16:   Update the drop-out/layer ratio p following our policy\n17: end while\n18: return xadv\n\nthe attack, which can be partially explained by the fact that they are nearly orthogonal to each other\nin high-dimensional spaces [22]. Indeed, it is computationally and memory intensive to calculate\nthe input-gradients of a collection of reference models at each step of the optimization.\nGiven a set of basis vectors, off-the-shelf optimization procedures for black-box attacks either estimate\nthe optimal coef\ufb01cients for all vectors before update [3] or give one optimal scaling factor overall [14].\nFor any of them, the whole procedure is somewhat analogous to a gradient descent whose update\ndirections do not necessarily align with single basis vectors. 
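For concreteness, one iteration of the bandit-style update in Algorithm 1 can be sketched in Python with toy linear models standing in for the victim and reference networks (a minimal illustration under stated assumptions, not the authors' released implementation; the model matrices, dimensions, and hyper-parameter values below are hypothetical, and the drop-out/layer schedule is omitted):

```python
import numpy as np

def attack_loss(logits, y):
    """Untargeted logit-diff style objective: larger means closer to misclassification."""
    mask = np.ones(len(logits), dtype=bool)
    mask[y] = False
    return np.max(logits[mask]) - logits[y]

def prior_gradient(W_ref, x_adv, y):
    """Input-gradient of the reference model's loss; analytic for a linear model."""
    logits = W_ref @ x_adv
    mask = np.ones(len(logits), dtype=bool)
    mask[y] = False
    j = np.flatnonzero(mask)[np.argmax(logits[mask])]  # current runner-up class
    return W_ref[j] - W_ref[y]

def subspace_attack_step(x_adv, x, y, g, W_victim, W_ref,
                         tau=0.1, delta=0.1, eta_g=1.0, eta=1/255, eps=8/255):
    """One iteration (lines 7-15 of Algorithm 1); exactly two victim queries per step."""
    u = prior_gradient(W_ref, x_adv, y)                      # line 8
    g_plus, g_minus = g + tau * u, g - tau * u               # line 9
    gp = g_plus / np.linalg.norm(g_plus)                     # line 10
    gm = g_minus / np.linalg.norm(g_minus)
    dt = (attack_loss(W_victim @ (x_adv + delta * gp), y)    # line 11
          - attack_loss(W_victim @ (x_adv + delta * gm), y)) / (tau * delta) * u
    g = g + eta_g * dt                                       # line 12
    x_adv = x_adv + eta * np.sign(g)                         # line 13
    x_adv = np.clip(x_adv, x - eps, x + eps)                 # line 14
    return np.clip(x_adv, 0.0, 1.0), g                       # line 15

rng = np.random.default_rng(0)
n, k = 12, 3                       # hypothetical toy dimensions
x = rng.uniform(0.2, 0.8, n)       # benign example in [0, 1]
W_victim = rng.standard_normal((k, n))
W_ref = rng.standard_normal((k, n))
y = int(np.argmax(W_victim @ x))   # the label the victim initially predicts
x_adv, g = x.copy(), np.zeros(n)
for _ in range(20):
    x_adv, g = subspace_attack_step(x_adv, x, y, g, W_victim, W_ref)
```

Note that the victim model only contributes the two loss evaluations on line 11; the reference model is differentiated locally, which is the sense in which the prior gradient spans the search direction.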
It is thus natural to make an effort based\non coordinate descent [39], which operates along coordinate directions (i.e., basis vectors) to seek the\noptimum of an objective, for better ef\ufb01ciency. In general, the algorithm selects a single coordinate\ndirection or a block of coordinate directions to proceed iteratively. This way, we may only need to\ncalculate one or several prior gradients at each step before the update, and the complexity of our method is\nsigni\ufb01cantly reduced. Experimental results in Section 5 show that one single prior gradient suf\ufb01ces.\n\n4.2 Drop-out/layer for Exploration\n\nAs suggested in Figure 2(b), one way of guaranteeing a low failure rate in our method is to collect\nadequate reference models. However, it is usually troublesome in practice, if not infeasible. Suppose\nthat we have collected a few reference models which might not be adequate, and we aim to reduce the\nfailure rate regardless. Recall that the main reason for high failure rates is the imperfect alignment\nbetween our search subspaces and the precise gradients (cf., Figure 2(b)); however, it seems unclear\nhow to explore other possible search directions without training more reference models. One may\nsimply try adding some random vectors to the basis set for better alignment and higher subspace\ndimensions, although they bear the ineffectiveness discussed in Section 3, and we also found in\nexperiments that this strategy does not help much.\nOur solution to this issue is inspired by the dropout [35] and \u201cdroplayer\u201d (a.k.a., stochastic\ndepth) [12] techniques. Drop-out/layer, which originally serve as regularization techniques, randomly drop a\nsubset of hidden units or residual blocks (if they exist) from DNNs during training. 
Their successes indicate\nthat a portion of the features can provide reasonable predictions and thus meaningful input-gradients,\nwhich implies the possibility of using drop-out/layer-invoked gradients to enrich our search priors 2.\nBy temporarily removing hidden units or residual blocks, we can acquire a spectrum of prior gradients\nfrom each reference model. In experiments, we append dropout to all convolutional/fully-connected\nlayers (except the \ufb01nal one), and we further drop residual blocks out in ResNet reference models.\n\n2We examine the input-gradients generated in this manner and find that most of them are still independent.\n\n5\n\n\f5 Experiments\n\nIn this section, we will verify the effectiveness of our subspace attack by comparing it with the state-\nof-the-arts in terms of the failure rate and the number of queries (of successful attacks). We consider\nboth untargeted and targeted \u2113\u221e attacks on CIFAR-10 [16] and ImageNet [32]. All our experiments\nare conducted on a GTX 1080 Ti GPU with PyTorch [29]. Our main results for untargeted attacks are\nsummarized in Table 1, and the results for targeted attacks are reported in the supplementary material.\n\nTable 1: Performance of different black-box attacks with \u2113\u221e constraint under the untargeted setting. The\nmaximum perturbation is \u03b5 = 8/255 for CIFAR-10, and \u03b5 = 0.05 for ImageNet. A recent paper [26]\nalso reports its result on WRN similarly, which achieves a failure rate of 1.0% with 7680 queries.\nPyramidNet* in the table indicates PyramidNet+ShakeDrop+AutoAugment [5].\n\nDataset | Victim Model | Method | Ref. Models | Mean Queries | Median Queries | Failure Rate\nCIFAR-10 | WRN | NES [13] | - | 1882 | 1300 | 3.5%\nCIFAR-10 | WRN | Bandits-TD [14] | - | 713 | 266 | 1.2%\nCIFAR-10 | WRN | Ours | AlexNet+VGGNets | 392 | 60 | 0.3%\nCIFAR-10 | GDAS | NES [13] | - | 1032 | 800 | 0.0%\nCIFAR-10 | GDAS | Bandits-TD [14] | - | 373 | 128 | 0.0%\nCIFAR-10 | GDAS | Ours | AlexNet+VGGNets | 250 | 58 | 0.0%\nCIFAR-10 | PyramidNet* | NES [13] | - | 1571 | 1300 | 5.1%\nCIFAR-10 | PyramidNet* | Bandits-TD [14] | - | 1160 | 610 | 1.2%\nCIFAR-10 | PyramidNet* | Ours | AlexNet+VGGNets | 555 | 184 | 0.7%\nImageNet | Inception-v3 | NES [13] | - | 1427 | 800 | 19.3%\nImageNet | Inception-v3 | Bandits-TD [14] | - | 887 | 222 | 4.2%\nImageNet | Inception-v3 | Ours | Original ResNets | 462 | 96 | 1.1%\nImageNet | PNAS-Net | NES [13] | - | 2182 | 1300 | 38.5%\nImageNet | PNAS-Net | Bandits-TD [14] | - | 1437 | 552 | 12.1%\nImageNet | PNAS-Net | Ours | Original ResNets | 680 | 160 | 4.2%\nImageNet | SENet | NES [13] | - | 1759 | 900 | 17.9%\nImageNet | SENet | Bandits-TD [14] | - | 1055 | 300 | 6.4%\nImageNet | SENet | Ours | Original ResNets | 456 | 66 | 1.9%\n\n5.1 Experimental Setup\n\nEvaluation Metrics and Settings. As in prior works [13, 26, 14], we adopt the failure rate and the\nnumber of queries to evaluate the performance of attacks using originally correctly classi\ufb01ed images.\nFor untargeted settings, an attack is considered successful if the model prediction is different from\nthe ground-truth, while for the targeted settings, it is considered successful only if the victim model\nis tricked into predicting the target class. We observe that the number of queries changes dramatically\nacross different images; thus we report both the mean and median numbers of queries of successful\nattacks to gain a clearer understanding of the query complexity.\nFollowing prior works, we scale the input images to [0, 1], and set the maximum \u2113\u221e perturbation to\n\u03b5 = 8/255 for CIFAR-10 and \u03b5 = 0.05 for ImageNet. We query victim models at most\n10,000 times in the untargeted experiments and 50,000 times in the targeted experiments, as the latter\ntask is more dif\ufb01cult and requires more queries. In all experiments, we invoke PGD [23] to maximize\nthe hinge logit-diff adversarial loss from Carlini and Wagner [2]. The PGD step size is set to 1/255\nfor CIFAR-10 and 0.01 for ImageNet. At the end of each iteration, we clip the candidate adversarial\nexamples back to [0, 1] to make sure they are still valid images. 
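The PGD update with the clipping just described can be sketched as follows (an illustrative fragment using the CIFAR-10 settings stated above; `grad_estimate` is a hypothetical placeholder for whatever gradient estimate the zeroth-order optimizer currently holds):

```python
import numpy as np

def pgd_step(x_adv, x, grad_estimate, step=1/255, eps=8/255):
    """One PGD ascent step followed by the two clipping operations described above:
    back into the l-inf ball of radius eps around x, then into the valid range [0, 1]."""
    x_adv = x_adv + step * np.sign(grad_estimate)
    x_adv = np.clip(x_adv, x - eps, x + eps)   # l-inf perturbation constraint
    return np.clip(x_adv, 0.0, 1.0)           # keep a valid image

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, (3, 32, 32))             # a CIFAR-10-sized input
x_adv = x.copy()
for _ in range(40):                            # more steps than eps/step, to exercise the clip
    x_adv = pgd_step(x_adv, x, rng.standard_normal(x.shape))
```

Because both clips are applied at every iteration, the candidate can never drift outside the ε-ball or the valid pixel range, regardless of how noisy the gradient estimate is.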
We initialize the drop-out/layer\nratio as 0.05 and increase it by 0.01 at the end of each iteration until it reaches 0.5 throughout our\nexperiments. Other hyper-parameters like the OCO learning rate \u03b7g and the \ufb01nite-difference step\nsizes (i.e., \u03b4, \u03c4) are set following the paper [14]. We mostly compare our method with NES [13] and\nBandits-TD [14], and their of\ufb01cial implementations are directly used. We apply all the attacks on the\nsame set of clean images and victim models for fair comparison. For Bandits-TD on ImageNet, we\ncraft adversarial examples on a resolution of 50 \u00d7 50 and upscale them according to speci\ufb01c requests\nfrom the victim models (i.e., 299 \u00d7 299 for Inception-v3, 331 \u00d7 331 for PNAS-Net, and 224 \u00d7 224\nfor SENet) before query, just as described in the paper [14]. We do not perform such rescaling on\nCIFAR-10 since no performance gain is observed.\n\n6\n\n\fVictim and Reference Models. On CIFAR-10, we consider three victim models: (a) a WRN [41]\nwith 28 layers and 10 times width expansion 3, which yields 4.03% error rate on the test set; (b) a\nmodel obtained via neural architecture search named GDAS [6] 4, which has a signi\ufb01cantly different\narchitecture than our AlexNet and VGGNet reference models and shows 2.81% test error rate; (c) a\n272-layer PyramidNet+Shakedrop model [9, 40] trained using AutoAugment [5] with only 1.56%\ntest error rate, 5 which is the published state-of-the-art on CIFAR-10 to the best of our knowledge. As\nfor reference models, we simply adopt the AlexNet and VGG-11/13/16/19 architectures with batch\nnormalizations [15]. 
To evaluate in a more data-independent scenario, we choose an auxiliary dataset\n(containing only 2,000 images) called CIFAR-10.1 [30] to train the reference models from scratch.\nWe also consider three victim models on ImageNet: (a) an Inception-v3 [36] which is commonly\nchosen [13, 14, 4, 38] with 22.7% top-1 error rate on the of\ufb01cial validation set; (b) a PNAS-Net-5-\nLarge model [21] whose architecture is obtained through neural architecture search, with a top-1\nerror rate of 17.26%; (c) an SENet-154 model [11] with a top-1 error rate of 18.68% 6. We adopt\nResNet-18/34/50 as reference architectures, and we gather 30,000+45,000 images from an auxiliary\ndataset [31] and the ImageNet validation set to train them from scratch. The clean images for attacks\nare sampled from the remaining 5,000 ImageNet of\ufb01cial validation images and hence unseen\nby both the victim and reference models.\n\n5.2 Comparison with The State-of-the-arts\n\nIn this section we compare the performance of our subspace attack with previous state-of-the-art\nmethods on CIFAR-10 and ImageNet under untargeted settings.\nOn CIFAR-10, we randomly select 1,000 images from its of\ufb01cial test set, and mount all attacks on\nthese images. Table 1 summarizes our main results, in which the \ufb01fth to seventh columns compare the\nmean query counts, median query counts and failure rates. On all three victim models, our method\nsigni\ufb01cantly outperforms NES and Bandits-TD in both query ef\ufb01ciency and success rates. By using\nour method, we are able to reduce the mean query counts by a factor of 1.5 to 2.1 and the median\nquery counts by a factor of 2.1 to 4.4 compared with Bandits-TD, which incorporates both time and spatial\npriors [14]. The PyramidNet+ShakeDrop+AutoAugment [5] model, which shows the lowest test error\nrate on CIFAR-10, also exhibits the best robustness under all considered black-box attacks. 
More\ninterestingly, even if the victim model is GDAS, whose architecture is designed by running neural\narchitecture search and thus drastically different from that of the reference models, our prior\ngradients can still span promising subspaces for attacks. To the best of our knowledge, we are the \ufb01rst\nto attack PyramidNet+ShakeDrop+AutoAugment, which is a published state-of-the-art, and GDAS,\nwhich has a searched architecture, in the black-box setting.\nFor ImageNet, we also randomly sample 1,000 images from the ImageNet validation set for evalu-\nation. Similar to the results on CIFAR-10, the results on ImageNet also evidence that our method\noutperforms the state-of-the-arts by large margins. Moreover, since the applied reference models are\ngenerally more \u201cold-fashioned\u201d and computationally ef\ufb01cient than the victim models that were\ninvented more recently, our method introduces little overhead to the baseline optimization algorithm.\n\n5.3 Dropout Ratios and Training Scales\n\nWe are interested in how the dropout ratio would affect our attack performance. To \ufb01gure it out, we\nset an upper limit of the common dropout ratio p to 0.0, 0.2, and 0.5, respectively, to observe how the\nquery complexity and the failure rate vary when attacking the WRN victim model. With the AlexNet\nand VGGNet reference models trained on CIFAR-10.1 [30], we see from the bottom of Table 2 that\nmore dropout leads to a lower failure rate, verifying that exploration via dropout well compensates for the\nmisalignment between our subspaces and the victim model gradients.\nIt might also be intriguing to evaluate how the performance of our method varies with the scale\nof the training set used to obtain the reference models. We attempt to evaluate it empirically by training\nAlexNet and VGGNets from scratch using different numbers of training images. 
More specifically, we enlarge our training set by further using the CIFAR-10 official training and test images, excluding of course the 1,000 images reserved for mounting attacks. In addition to the CIFAR-10.1 dataset as used above, we try two larger sets: (a) the official CIFAR-10 training set, which consists of 50,000 images;7 (b) a set built by augmenting CIFAR-10.1 with 8,000 CIFAR-10 test images, whose overall size is 2,000 + 8,000 = 10,000. It can be seen from Table 2 that by training reference models with 8,000 more images, the query counts can be cut by over 2× without dropout, and the failure rate decreases as well. We believe that this performance gain is powered by the better generalization ability of the reference models. In the special scenario where the reference and the victim models share the same training set, our method requires only 59 queries on average to succeed on 98.6% of the testing images without dropout.

Table 2: Impact of the dropout ratio and training scale on CIFAR-10. The victim model is WRN.

Ref. Training Set                   #Images  Maximum p  Mean Queries  Median Queries  Failure Rate
CIFAR-10 Training                   50k      0.0        59            12              1.4%
                                             0.2        77            14              0.2%
                                             0.5        111           14              0.2%
CIFAR-10.1 + CIFAR-10 Test (Part)   2k+8k    0.0        239           16              3.2%
                                             0.2        174           20              0.7%
                                             0.5        212           22              0.3%
CIFAR-10.1                          2k       0.0        519           48              9.6%
                                             0.2        380           62              0.9%
                                             0.5        392           60              0.3%

3 Pre-trained model: https://github.com/bearpaw/pytorch-classification
4 Pre-trained model: https://github.com/D-X-Y/GDAS
5 Unlike the other two models that are available online, this one is trained using scripts from: https://github.com/tensorflow/models/tree/master/research/autoaugment
6 Pre-trained models: https://github.com/Cadene/pretrained-models.pytorch
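The dropout-driven exploration studied in this subsection can be sketched as follows. This is a minimal NumPy mock-up under a simplifying assumption of our own: dropout is applied directly to the prior gradient, whereas in the actual method dropout layers act inside the reference networks. The function name, the per-query sampling of the ratio from [0, p_max], and the inverted-dropout rescaling are illustrative choices, not the paper's exact implementation.

```python
import numpy as np

def prior_gradient_with_dropout(grad_fn, x, p_max, rng):
    """Draw a prior gradient from a reference model with dropout-style
    exploration: a dropout ratio p is sampled uniformly from [0, p_max],
    a random fraction p of entries is zeroed, and the survivors are
    rescaled, so repeated draws explore varied directions within the
    same subspace. (Schematic stand-in; see the lead-in caveat.)"""
    g = grad_fn(x)
    p = rng.uniform(0.0, p_max)           # per-query dropout ratio <= p_max
    mask = rng.random(g.shape) >= p       # keep each entry with prob. 1 - p
    kept = mask.mean()                    # realized keep fraction
    return g * mask / max(kept, 1e-12)    # inverted-dropout rescaling

# Toy usage with a fixed "reference gradient" field of all-ones.
rng = np.random.default_rng(0)
grad_fn = lambda x: np.ones_like(x)
g0 = prior_gradient_with_dropout(grad_fn, np.zeros(1000), 0.0, rng)  # no dropout
g5 = prior_gradient_with_dropout(grad_fn, np.zeros(1000), 0.5, rng)  # p_max = 0.5
```

With p_max = 0.0 the prior gradient comes back unchanged; larger p_max yields sparser, rescaled variants, mirroring the p ∈ {0.0, 0.2, 0.5} settings compared in Table 2.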
The performance of our method with dropout is also evaluated on the basis of these reference models, and we can see that dropout reduces the failure rates significantly regardless of the reference training set. As for query complexity, we observe that more powerful reference models generally require less dropout-governed exploration to achieve efficient queries.

5.4 Choice of Reference Models and Prior Gradients

Table 3: Subspace attack using different reference models with ℓ∞ constraint under the untargeted setting on CIFAR-10. The maximum perturbation is ε = 8/255, and the victim model is WRN.

Ref. Models                Mean Queries  Median Queries  Failure Rate
VGG-19                     400           78              0.6%
VGG-19/16/13               395           71              0.4%
VGG-19/16/13/11+AlexNet    392           60              0.3%

We investigate the impact of the number and architecture of reference models by evaluating our attack using different reference model sets, and report the performance in Table 3. As in previous experiments, the reference models are trained on CIFAR-10.1, and the maximum dropout ratio is set to 0.5. We see that increasing the number of reference models indeed facilitates the attack in both query efficiency and success rates, just as in the exploratory experiment where dropout is absent.
We also compare using "gradient descent" and "coordinate descent" empirically. On CIFAR-10 we choose the same five reference models as previously reported, and at each iteration we compute all five prior gradients and search in the complete subspace. We combine all the prior gradients with Gaussian coefficients to provide a search direction in it.
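The two search-direction policies compared here can be sketched in a few lines of NumPy. This is our own illustrative mock-up, not the paper's code: the function names and the unit-normalization are assumptions, and the actual attack then estimates the victim's gradient by querying along such directions.

```python
import numpy as np

def coordinate_prior(prior_grads, rng):
    """Coordinate-descent-flavored policy: at each iteration, back-propagate
    only ONE randomly chosen reference model and use its gradient as the
    search direction (one prior gradient per iteration)."""
    i = rng.integers(len(prior_grads))
    g = prior_grads[i]
    return g / (np.linalg.norm(g) + 1e-12)

def subspace_prior(prior_grads, rng):
    """'Gradient-descent' alternative: compute ALL prior gradients and mix
    them with Gaussian coefficients, searching the complete subspace they
    span -- costlier, since every reference model runs each iteration."""
    coeffs = rng.standard_normal(len(prior_grads))
    g = sum(c * p for c, p in zip(coeffs, prior_grads))
    return g / (np.linalg.norm(g) + 1e-12)

# Toy usage: five stand-in "prior gradients" for a 32x32x3 input.
rng = np.random.default_rng(0)
priors = [rng.standard_normal(3072) for _ in range(5)]
d_cd = coordinate_prior(priors, rng)
d_full = subspace_prior(priors, rng)
```

By construction, `subspace_prior` returns a direction inside the span of the five prior gradients, while `coordinate_prior` touches only one of them; the empirical finding reported next is that the cheaper policy loses almost nothing.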
Experimental results demonstrate that, at the cost of significantly increased run-time, both the query counts and failure rates barely change (mean/median queries: 389/62, failure rate: 0.3%), verifying that our coordinate-descent-flavored policy achieves a sensible trade-off between efficiency and effectiveness.

7 In this special setting the reference models and the victim model share the same training data.

6 Conclusion

While impressive results have been obtained, state-of-the-art black-box attacks usually require a large number of queries to trick a victim classification system, making the process costly and suspicious to the system. In this paper, we propose the subspace attack method, which reduces query complexity by restricting the search directions of gradient estimation to promising subspaces spanned by the input-gradients of a few reference models. We adopt a coordinate-descent-flavored optimization and dropout layers to address some potential issues of our method and to trade off query complexity against failure rate. Extensive experimental results on CIFAR-10 and ImageNet evidence that our method outperforms the state-of-the-arts by large margins, even if the reference models are trained on a small and inadequate dataset disjoint from the one used to train the victim models. We also evaluate the effectiveness of our method on winning models on these datasets (e.g., PyramidNet+ShakeDrop+AutoAugment [5] and SENet [11]) and on models whose architectures are designed by running neural architecture search (e.g., GDAS [6] and PNAS [21]).

Acknowledgments

This work is funded by NSFC (Grant No. 61876095) and Beijing Academy of Artificial Intelligence (BAAI).

References

[1] Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In ICLR, 2018.
[2] Nicholas Carlini and David Wagner.
Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy (SP), 2017.
[3] Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 15–26. ACM, 2017.
[4] Minhao Cheng, Thong Le, Pin-Yu Chen, Jinfeng Yi, Huan Zhang, and Cho-Jui Hsieh. Query-efficient hard-label black-box attack: An optimization-based approach. In ICLR, 2019.
[5] Ekin D. Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V. Le. AutoAugment: Learning augmentation policies from data. In CVPR, 2019.
[6] Xuanyi Dong and Yi Yang. Searching for a robust neural architecture in four GPU hours. In CVPR, 2019.
[7] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.
[8] Chuan Guo, Jacob R. Gardner, Yurong You, Andrew G. Wilson, and Kilian Q. Weinberger. Simple black-box adversarial attacks. In ICML, 2019.
[9] Dongyoon Han, Jiwhan Kim, and Junmo Kim. Deep pyramidal residual networks. In CVPR, pages 5927–5935, 2017.
[10] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
[11] Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In CVPR, 2018.
[12] Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Q. Weinberger. Deep networks with stochastic depth. In ECCV, 2016.
[13] Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. Black-box adversarial attacks with limited queries and information. In ICML, 2018.
[14] Andrew Ilyas, Logan Engstrom, and Aleksander Madry. Prior convictions: Black-box adversarial attacks with bandits and priors. In ICLR, 2019.
[15] Sergey Ioffe and Christian Szegedy.
Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
[16] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, 2009.
[17] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In NeurIPS, 2012.
[18] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. In ICLR, 2017.
[19] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
[20] Chunyuan Li, Heerad Farkhoor, Rosanne Liu, and Jason Yosinski. Measuring the intrinsic dimension of objective landscapes. In ICLR, 2018.
[21] Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, and Kevin Murphy. Progressive neural architecture search. In ECCV, 2018.
[22] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. In ICLR, 2017.
[23] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In ICLR, 2018.
[24] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: A simple and accurate method to fool deep neural networks. In CVPR, 2016.
[25] Nina Narodytska and Shiva Kasiviswanathan. Simple black-box adversarial attacks on deep neural networks. In CVPR Workshop, 2017.
[26] Arjun Nitin Bhagoji, Warren He, Bo Li, and Dawn Song. Practical black-box attacks on deep neural networks using efficient query mechanisms. In ECCV, 2018.
[27] Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. Transferability in machine learning: From phenomena to black-box attacks using adversarial samples.
arXiv preprint arXiv:1605.07277, 2016.
[28] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Asia Conference on Computer and Communications Security, 2017.
[29] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In NeurIPS Workshop, 2017.
[30] Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. Do CIFAR-10 classifiers generalize to CIFAR-10? arXiv preprint arXiv:1806.00451, 2018.
[31] Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. Do ImageNet classifiers generalize to ImageNet? In ICML, 2019.
[32] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet large scale visual recognition challenge. IJCV, 2015.
[33] Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, and Ilya Sutskever. Evolution strategies as a scalable alternative to reinforcement learning. arXiv preprint arXiv:1703.03864, 2017.
[34] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
[35] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.
[36] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the Inception architecture for computer vision. In CVPR, 2016.
[37] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks.
In ICLR, 2014.
[38] Chun-Chen Tu, Paishun Ting, Pin-Yu Chen, Sijia Liu, Huan Zhang, Jinfeng Yi, Cho-Jui Hsieh, and Shin-Ming Cheng. AutoZOOM: Autoencoder-based zeroth order optimization method for attacking black-box neural networks. In AAAI, 2019.
[39] Stephen J. Wright. Coordinate descent algorithms. Mathematical Programming, 151(1):3–34, 2015.
[40] Yoshihiro Yamada, Masakazu Iwamura, Takuya Akiba, and Koichi Kise. ShakeDrop regularization for deep residual learning. arXiv preprint arXiv:1802.02375, 2018.
[41] Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. In BMVC, 2016.