{"title": "Defense Against Adversarial Attacks Using Feature Scattering-based Adversarial Training", "book": "Advances in Neural Information Processing Systems", "page_first": 1831, "page_last": 1841, "abstract": "We introduce a feature scattering-based adversarial training approach for improving model robustness against adversarial attacks.
Conventional adversarial training approaches leverage a supervised scheme (either targeted or non-targeted) in generating attacks for training, which typically suffer from issues such as label leaking as noted in recent works.
Differently, the proposed approach generates adversarial images for training through feature scattering in the latent space, which is unsupervised in nature and avoids label leaking. More importantly, this new approach generates perturbed images in a collaborative fashion, taking the inter-sample relationships into consideration. We conduct analysis on model robustness and demonstrate the effectiveness of the proposed approach through extensive experiments on different datasets compared with state-of-the-art approaches.", "full_text": "Defense Against Adversarial Attacks Using Feature Scattering-based Adversarial Training

Haichao Zhang∗ (Horizon Robotics, hczhang1@gmail.com)    Jianyu Wang (Baidu Research, wjyouch@gmail.com)

Abstract

We introduce a feature scattering-based adversarial training approach for improving model robustness against adversarial attacks. Conventional adversarial training approaches leverage a supervised scheme (either targeted or non-targeted) in generating attacks for training, which typically suffer from issues such as label leaking as noted in recent works. Differently, the proposed approach generates adversarial images for training through feature scattering in the latent space, which is unsupervised in nature and avoids label leaking. 
More importantly, this new approach generates perturbed images in a collaborative fashion, taking the inter-sample relationships into consideration. We conduct analysis on model robustness and demonstrate the effectiveness of the proposed approach through extensive experiments on different datasets compared with state-of-the-art approaches. Code is available: https://github.com/Haichao-Zhang/FeatureScatter.

1 Introduction

While breakthroughs have been made in many fields such as image classification leveraging deep neural networks, these models can be easily fooled by so-called adversarial examples [55, 4]. In the context of image classification, an adversarial example for a natural image is a modified version which is visually indistinguishable from the original but causes the classifier to produce a different label prediction [4, 55, 24]. Adversarial examples have been shown to be ubiquitous beyond classification, ranging from object detection [64, 18] to speech recognition [11, 9].
Much encouraging progress has been made towards improving model robustness against adversarial examples under different scenarios [58, 36, 33, 67, 72, 16, 71]. Among them, adversarial training [24, 36] is one of the most popular techniques [2]; it conducts model training using the adversarially perturbed images in place of the original ones. However, several challenges remain to be addressed. Firstly, adverse effects such as label leaking are still an issue hindering adversarial training [32]. Currently available remedies either increase the number of iterations for generating the attacks [36] or use classes other than the ground-truth for attack generation [32, 65, 61]. Increasing the attack iterations will increase the training time proportionally, while using a non-ground-truth-targeted approach cannot fully eliminate label leaking. 
Secondly, previous approaches for both standard and adversarial training treat each training sample individually and in isolation w.r.t. other samples. Manipulating each sample individually this way neglects the inter-sample relationships and does not fully leverage the potential for attacking and defending, thus limiting the performance.
Manifold and neighborhood structure have been proven to be effective in capturing inter-sample relationships [51, 22]. Natural images live on a low-dimensional manifold, with the training and testing images as samples from it [26, 51, 44, 56]. Modern classifiers are over-complete in terms of parameterization, and different local minima have been shown to be equally effective under the clean image setting [14]. However, different solution points might leverage different sets of features for prediction.

∗Work done while with Baidu Research.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

For learning a well-performing classifier on natural images, it suffices to simply adjust the classification boundary to intersect with this manifold at locations with good separation between classes on training data, as the test data will largely reside on the same manifold [28]. However, the part of the classification boundary that extends beyond the manifold is less constrained, contributing to the existence of adversarial examples [56, 59]. For example, it has been pointed out that some cleanly trained models focus on discriminative but less robust features, and are thus vulnerable to adversarial attacks [28, 29]. 
Therefore, the conventional supervised attack that tries to move feature points towards the decision boundary is likely to disregard the original data manifold structure. When the decision boundary lies close to the manifold in its out-of-manifold part, adversarial perturbations lead to a tilting effect on the data manifold [56]; at places where the classification boundary is far from the manifold in its out-of-manifold part, the adversarial perturbations will move the points towards the decision boundary, effectively shrinking the data manifold. As adversarial examples reside in large, contiguous regions and a significant portion of the adversarial subspaces is shared [24, 19, 59, 40], purely label-guided adversarial examples will cluster, at least in the shared adversarial subspace. In summary, while these effects encourage the model to focus more around the current decision boundary, they also make the effective data manifold for training deviate from the original one, potentially hindering the performance.
Motivated by these observations, we propose to shift the previous focus on the decision boundary to the inter-sample structure. The proposed approach can be intuitively understood as generating adversarial examples by perturbing the local neighborhood structure in an unsupervised fashion and then performing model training with the generated adversarial images. The overall framework is shown in Figure 1. 
The contributions of this work are summarized as follows:
• we propose a novel feature-scattering approach for generating adversarial images for adversarial training in a collaborative and unsupervised fashion;
• we present an adversarial training formulation which deviates from the conventional minimax formulation and falls into a broader category of bilevel optimization;
• we analyze the proposed approach and compare it with several state-of-the-art techniques, with extensive experiments on a number of standard benchmarks, verifying its effectiveness.

2 Background

2.1 Adversarial Attack, Defense and Adversarial Training

Adversarial examples, initially demonstrated in [55, 4], have attracted great attention recently [4, 24, 58, 36, 2, 5]. Szegedy et al. pointed out that CNNs are vulnerable to adversarial examples and proposed an L-BFGS-based algorithm for generating them [55]. A fast gradient sign method (FGSM) for adversarial attack generation was developed and used in adversarial training in [24]. Many variants of attacks have been developed later [41, 8, 54, 62, 7, 6]. In the meantime, many efforts have been devoted to defending against adversarial examples [38, 37, 63, 25, 33, 50, 53, 46, 35]. Recently, [2] showed that many existing defense methods suffer from a false sense of robustness against adversarial attacks due to gradient masking, while adversarial training [24, 32, 58, 36] is one of the most effective defense methods against adversarial attacks. It improves model robustness by solving a minimax problem [24, 36]:

    \min_\theta \big[ \max_{x' \in S_x} L(x', y; \theta) \big]    (1)

where the inner maximization essentially generates attacks while the outer minimization corresponds to minimizing the "adversarial loss" induced by the inner attacks [36]. The inner maximization can be solved approximately, using for example a one-step approach such as FGSM [24], or a multi-step projected gradient descent (PGD) method [36]:

    x^{t+1} = P_{S_x}\big( x^t + \alpha \cdot \mathrm{sign}(\nabla_x L(x^t, y; \theta)) \big)    (2)

where P_{S_x}(·) is a projection operator projecting the input into the feasible region S_x. In the PGD approach, the original image x is randomly perturbed to some point x^0 within B(x, ε), the ε-cube around x, and then goes through several PGD steps with a step size of α as shown in Eqn.(2).
Label leaking [32] and gradient masking [43, 58, 2] are some well-known issues that hinder adversarial training [32]. Label leaking occurs when the additive perturbation is highly correlated with the ground-truth label. Therefore, when it is added to the image, the network can directly tell the class label by decoding the additive perturbation, without relying on the real content of the image, leading to higher adversarial accuracy than on the clean images during training. Gradient masking [43, 58, 2] refers to the effect that the adversarially trained model learns to "improve" robustness by generating less useful gradients for adversarial attacks, which could be bypassed with a substitute model for generating attacks, thus giving a false sense of robustness [2].

Figure 1: Feature Scattering-based Adversarial Training Pipeline. The adversarial perturbations are generated collectively by feature scattering, i.e., maximizing the feature matching distance between the clean samples {x_i} and the perturbed samples {x'_j}. The model parameters are updated by minimizing the cross-entropy loss using the perturbed images {x'_j} as the training samples.

2.2 Different Distances for Feature and Distribution Matching

Euclidean distance is arguably one of the most commonly used metrics for measuring the distance between a pair of points. 
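As a concrete illustration of the PGD attack in Eqn.(2) above, here is a minimal NumPy sketch on a toy loss; the toy gradient function, the specific step sizes, and the [0, 1] pixel range are illustrative assumptions rather than details from the paper (which uses a PyTorch implementation):

```python
import numpy as np

np.random.seed(0)

def pgd_attack(x, grad_fn, eps=8 / 255, alpha=2 / 255, steps=10,
               clip_min=0.0, clip_max=1.0):
    """Multi-step PGD within the L-inf ball B(x, eps), as in Eqn.(2)."""
    # random start x^0 inside the eps-cube around x
    x_adv = np.clip(x + np.random.uniform(-eps, eps, size=x.shape),
                    clip_min, clip_max)
    for _ in range(steps):
        g = grad_fn(x_adv)                          # stand-in for grad_x L(x^t, y; theta)
        x_adv = x_adv + alpha * np.sign(g)          # ascent step on the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)    # project back into B(x, eps)
        x_adv = np.clip(x_adv, clip_min, clip_max)  # keep a valid pixel range
    return x_adv

# toy "loss" 0.5 * ||z||^2, whose gradient is simply z
x = np.full(4, 0.5)
adv = pgd_attack(x, grad_fn=lambda z: z)
```

Adversarial training in the minimax sense of Eqn.(1) then simply performs the parameter update on `adv` in place of `x`.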
When it comes to two sets of points, it is natural to accumulate the individual pairwise distances as a measure of the distance between the two sets, given a proper correspondence. Alternatively, we can view each set as an empirical distribution and measure the distance between them using the Kullback-Leibler (KL) or Jensen-Shannon (JS) divergence. The challenge for learning with the KL or JS divergence is that no useful gradient is provided when the two empirical distributions have disjoint supports or have a non-empty intersection contained in a set of measure zero [1, 49].
The optimal transport (OT) distance is an alternative measure of the distance between distributions, with advantages over KL and JS in the scenarios mentioned earlier. The OT distance between two probability measures μ and ν is defined as:

    D(\mu, \nu) = \inf_{\gamma \in \Pi(\mu, \nu)} \mathbb{E}_{(x,y) \sim \gamma}\, c(x, y)    (3)

where Π(μ, ν) denotes the set of all joint distributions γ(x, y) with marginals μ(x) and ν(y), and c(x, y) is the cost function (Euclidean or cosine distance). Intuitively, D(μ, ν) is the minimum cost that γ has to pay to transport μ to ν. It provides a weaker topology than many other measures, which is important for applications where the data typically resides on a low-dimensional manifold of the input embedding space [1, 49], which is the case for natural images. It has been widely applied to many tasks, such as generative modeling [21, 1, 49, 20, 10], auto-encoding [57] and dictionary learning [47]. For a comprehensive historical and computational perspective on OT, we refer to [60, 45].

3 Feature Scattering-based Adversarial Training

3.1 Feature Matching and Feature Scattering

Feature Matching. Conventional training treats training data as i.i.d. samples from a data distribution, overlooking the connections between samples. 
The same assumption is used when generating adversarial examples for training, with the direction for perturbing a sample based purely on the direction from the current data point to the decision boundary, regardless of other samples. While effective, this disregards the inter-relationship between different feature points, as the adversarial perturbation is computed individually for each sample, neglecting any collective distributional property. Furthermore, the supervised generation of the attacks makes the generated perturbations highly biased towards the decision boundary, as shown in Figure 2. This is less desirable as it might neglect other directions that are crucial for learning robust models [28, 17], and it leads to label leaking due to the high correlation between the perturbation and the decision boundary.

Figure 2: Illustrative Example of Different Perturbation Schemes. (a) Original data. Perturbed data using (b) the supervised adversarial generation method and (c) the proposed feature scattering, which is an unsupervised method. The overlaid boundary is from the model trained on clean data.

The idea of leveraging inter-sample relationships for learning dates back to the seminal works of [51, 22, 48]. This type of local structure is also exploited in this work, but for adversarial perturbation. The quest for local structure utilization and seamless integration with the end-to-end training framework naturally motivates an OT-based soft matching scheme, using the OT distance as in Eqn.(3). We consider OT between discrete distributions hereafter, as we mainly focus on applying the OT distance to image features. Specifically, consider two discrete distributions μ, ν ∈ P(X), which can be written as μ = \sum_{i=1}^n u_i \delta_{x_i} and ν = \sum_{i=1}^n v_i \delta_{x'_i}, with δ_x the Dirac function centered on x.² The weight vectors u = {u_i}_{i=1}^n ∈ Δ_n and v = {v_i}_{i=1}^n ∈ Δ_n belong to the n-dimensional simplex, i.e., \sum_i u_i = \sum_i v_i = 1, as both μ and ν are probability distributions. Under such a setting, computing the OT distance as defined in Eqn.(3) is equivalent to solving the following network-flow problem:

    D(\mu, \nu) = \min_{T \in \Pi(u, v)} \sum_{i=1}^n \sum_{j=1}^n T_{ij} \cdot c(x_i, x'_j) = \min_{T \in \Pi(u, v)} \langle T, C \rangle    (4)

where Π(u, v) = {T ∈ R_+^{n×n} | T 1_n = u, T^⊤ 1_n = v}, 1_n is an n-dimensional all-one vector, ⟨·, ·⟩ represents the Frobenius dot-product, and C is the transport cost matrix such that C_{ij} = c(x_i, x'_j). In this work, the transport cost is defined as the cosine distance between image features:

    c(x_i, x'_j) = 1 - \frac{f_\theta(x_i)^\top f_\theta(x'_j)}{\|f_\theta(x_i)\|_2 \, \|f_\theta(x'_j)\|_2} = 1 - \frac{f_i^\top f'_j}{\|f_i\|_2 \, \|f'_j\|_2}    (5)

where f_θ(·) denotes the feature extractor with parameter θ. We implement f_θ(·) as the deep neural network up to the softmax layer. We can now formally define the feature matching distance as follows.
Definition 1. (Feature Matching Distance) The feature matching distance between two sets of images is defined as D(μ, ν), the OT distance between the empirical distributions μ and ν of the two sets.
Note that the feature-matching distance is also a function of θ (i.e. 
D_θ) when f_θ(·) is used for extracting the features in the computation of the ground distance as in Eqn.(5). We will simply use the notation D in the following when there is no danger of confusion, to minimize notational clutter.
Feature Scattering. Based on the feature matching distance defined above, we can formulate the proposed feature scattering method as follows:

    \hat{\nu} = \arg\max_{\nu \in S_\mu} D(\mu, \nu), \quad \mu = \sum_{i=1}^n u_i \delta_{x_i}, \quad \nu = \sum_{i=1}^n v_i \delta_{x'_i}.    (6)

This can be intuitively interpreted as maximizing the feature matching distance between the original and perturbed empirical distributions with respect to the inputs, subject to the domain constraints

    S_\mu = \{ \textstyle\sum_i v_i \delta_{z_i} \,|\, z_i \in B(x_i, \epsilon) \cap [0, 255]^d \},

where B(x, ε) = {z | ‖z − x‖_∞ ≤ ε} denotes the ℓ∞-cube with center x and radius ε. Formally, we present the notion of feature scattering as follows.
Definition 2. (Feature Scattering) Given a set of clean data {x_i}, which can be represented as an empirical distribution μ = \sum_i u_i \delta_{x_i} with \sum_i u_i = 1, the feature scattering procedure produces a perturbed empirical distribution ν = \sum_i v_i \delta_{x'_i} with \sum_i v_i = 1 by maximizing D(μ, ν), the feature matching distance between μ and ν, subject to domain and budget constraints.
²The two discrete distributions could be of different dimensions; here we present the exposition assuming the same dimensionality to avoid notational clutter.
Remark 1. As feature scattering is performed on a batch of samples leveraging the inter-sample structure, it is more effective as an adversarial attack than structure-agnostic random perturbation, while being less constrained than supervisedly generated perturbations, which are decision-boundary oriented and suffer from label leaking. Empirical comparisons will be provided in Section 5.

3.2 Adversarial Training with Feature Scattering

We leverage feature scattering for adversarial training, with the mathematical formulation as follows:

    \min_\theta \frac{1}{n} \sum_{i=1}^n L_\theta(x'_i, y_i) \quad \text{s.t.} \quad \nu^* \triangleq \sum_{i=1}^n v_i \delta_{x'_i} = \arg\max_{\nu \in S_\mu} D(\mu, \nu).    (7)

The proposed formulation deviates from the conventional minimax formulation for adversarial training [24, 36]. More specifically, it can be regarded as an instance of the more general bilevel optimization problem [13, 3]. Feature scattering is effective for the adversarial training scenario, as there is a requirement for more data [52]. Feature scattering promotes data diversity without drastically altering the structure of the data manifold, as the conventional supervised approach does, with label leaking as one manifesting phenomenon. Secondly, the feature matching distance couples the samples within the batch together; therefore the generated adversarial attacks are produced collaboratively, taking the inter-sample relationship into consideration. Thirdly, feature scattering implicitly induces a coupled regularization (detailed below) on model training, leveraging the inter-sample structure for joint regularization.
The proposed approach is equivalent to the minimization of a loss of the form \frac{1}{n} \sum_{i=1}^n L_\theta(x_i, y_i) + \lambda R_\theta(x_1, \cdots, x_n), consisting of the conventional loss L_θ(x_i, y_i) on the original data and a regularization term R_θ coupled over the inputs, i.e., R_\theta(x_1, \cdots, x_n) \neq \sum_i R'_\theta(x_i). This first highlights the unique property of the proposed feature scattering approach that it induces an effective regularization term coupled over all inputs. It implies that the model leverages information from all inputs in a joint fashion for learning, offering the opportunity of collaborative regularization leveraging inter-sample relationships. Second, the usage of a function (D_θ) different from L_θ for inducing R_θ offers more flexibility in the effective regularization; moreover, no label information is incorporated in D_θ, thus avoiding potential label leaking as in the conventional case when ∂L_θ(x_i, y_i)/∂x_i is highly correlated with y_i. Finally, in the case when D_θ is separable over inputs and takes the form of a supervised loss, e.g., D_θ ≡ \sum_i L_\theta(x_i, y_i), the proposed approach reduces to the conventional adversarial training setup [24, 36]. The overall procedure for the proposed approach is given in Algorithm 1.

Algorithm 1 Feature Scattering-based Adversarial Training
Input: dataset S, training epochs K, batch size n, learning rate γ, budget ε, attack iterations T
for k = 1 to K do
  for random batch {x_i, y_i}_{i=1}^n ∼ S do
    initialization: μ = \sum_i u_i \delta_{x_i}, ν = \sum_i v_i \delta_{x'_i}, x'_i ∼ B(x_i, ε), ∀i = 1, ···, n
    feature scattering (maximizing the feature matching distance D w.r.t. ν):
    for t = 1 to T do
      · x'_i ← P_{S_x}( x'_i + ε · sign(∇_{x'_i} D(μ, ν)) ),  ν = \sum_i v_i \delta_{x'_i}
    end for
    adversarial training (updating model parameters):
      · θ ← θ − γ · (1/n) \sum_{i=1}^n ∇_θ L(x'_i, y_i; θ)
  end for
end for
Output: model parameter θ.

4 Discussions
Manifold-based Defense [34, 37, 15, 27]. [34, 37, 27] proposed to defend by projecting the perturbed image onto a proper manifold. [15] used a similar idea of manifold projection but approximated this step with a nearest-neighbor search against a web-scale database. Differently, we leverage the manifold in the form of inter-sample relationships for the generation of the perturbations, which induces an implicit regularization of the model when used in the adversarial training framework. While defense in [34, 37, 15, 27] is achieved by shrinking the perturbed inputs towards the manifold, we expand the manifold using feature scattering to generate perturbed inputs for adversarial training.
Inter-sample Regularization [70, 30, 39]. Mixup [70] generates training examples by linear interpolation between pairs of natural examples, thus introducing a linear inductive bias in the vicinity of training samples. Therefore, the model is expected to reduce the amount of undesirable oscillations for off-manifold samples. Logit pairing [30] augments the original training loss with a "pairing" loss, which measures the difference between the logits of clean and adversarial images. The idea is to suppress spurious logit responses using the natural logits as a reference. 
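To make the cosine transport cost of Eqn.(5) and the feature matching distance of Definition 1 concrete, the following minimal NumPy sketch handles the special case of two samples with uniform weights, where the feasible transport plans are convex combinations of the two permutation matrices and the linear objective of Eqn.(4) is therefore minimized at one of them. The feature vectors are illustrative; the actual implementation extracts features with the network and uses an approximate OT solver:

```python
import numpy as np

def cosine_cost(F, G):
    """C_ij = 1 - <f_i, g_j> / (||f_i|| * ||g_j||), as in Eqn.(5)."""
    Fn = F / np.linalg.norm(F, axis=1, keepdims=True)
    Gn = G / np.linalg.norm(G, axis=1, keepdims=True)
    return 1.0 - Fn @ Gn.T

def feature_matching_distance_2(F, G):
    """Exact OT distance between two 2-point uniform empirical distributions.

    For n = 2 with uniform marginals, the vertices of the transport polytope
    are the two permutation matrices scaled by 1/2, so the minimum of the
    linear objective <T, C> is attained at one of them.
    """
    C = cosine_cost(F, G)
    return 0.5 * min(C[0, 0] + C[1, 1], C[0, 1] + C[1, 0])

# "clean" features vs. slightly perturbed features (illustrative vectors)
F = np.array([[1.0, 0.0], [0.0, 1.0]])
G = np.array([[1.0, 0.1], [0.1, 1.0]])
d = feature_matching_distance_2(F, G)
```

Feature scattering (Eqn.(6)) then perturbs the inputs so as to increase this distance, rather than a label-based loss.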
Similarly, virtual adversarial training [39] proposed a regularization term based on the KL divergence between the prediction probabilities of original and adversarially perturbed images. In our model, the inter-sample relationship is leveraged for generating the adversarial perturbations, which induces an implicit regularization term in the objective function that is coupled over all input samples.
Wasserstein GAN and OT-GAN [1, 49, 10]. Generative Adversarial Networks (GANs) are a family of techniques that learn to capture the data distribution implicitly by generating samples directly [23]. They originally suffered from training instability and mode collapse [23, 1]. OT-related distances [1, 12] have been used for overcoming the difficulties encountered in the original GAN training [1, 49]. This technique has been further extended to generating discrete data such as text [10]. Different from GANs, which maximize a discrimination criterion w.r.t. the parameters of the discriminator for better capturing the data distribution, we maximize a feature matching distance w.r.t. the perturbed inputs for generating proper training data to improve model robustness.

5 Experiments
Baselines and Implementation Details. Our implementation is based on PyTorch, and the code as well as other related resources are available on the project page.³ We conduct extensive experiments across several benchmark datasets, including CIFAR10 [31], CIFAR100 [31] and SVHN [42]. We use Wide ResNet (WRN-28-10) [68] as the network structure following [36]. We compare the performance of the proposed method with a number of baseline methods, including: i) the model trained with the standard approach using clean images (Standard) [31]; ii) the PGD-based approach from Madry et al. (Madry) [36], which is one of the most effective defense methods [2]; iii) another recent method that performs adversarial training with both image and label adversarial perturbations (Bilateral) [61]. 
For training, the initial learning rate γ is 0.1 for CIFAR and 0.01 for SVHN. We set the number of epochs for the Standard and Madry methods to 100, with transition epochs {60, 90}, as we empirically observed that the performance of the trained models stabilized before 100 epochs. A training schedule of 200 epochs similar to [61], with the same transition scheme, is used for the proposed method, as we empirically observed that it helps with model performance, possibly due to the increased variation of data via feature scattering. We performed standard data augmentation, including random crops with 4 pixels of padding and random horizontal flips [31], during training. The perturbation budget of ε = 8 is used in training, following the literature [36]. Label smoothing of 0.5, attack iteration T = 1 and the Sinkhorn algorithm [12] with a regularization of 0.01 are used. For testing, model robustness is evaluated by approximately computing an upper bound of robustness on the test set, i.e., by measuring the accuracy of the model under different adversarial attacks, including white-box FGSM [24], PGD [36] and CW [8] (the CW-loss [8] within the PGD framework) attacks, and variants of black-box attacks.
5.1 Visual Classification Performance Under White-box Attacks
CIFAR10. We conduct experiments on CIFAR10 [31], a popular dataset widely used in the adversarial training literature [36, 61], with 10 classes, 5K training images per class and 10K test images. We report the accuracy on the original test images (Clean) and under PGD and CW attacks with T iterations (PGD^T and CW^T) [36, 8]. The evaluation results are summarized in Table 1. It is observed that the Standard model fails drastically under different white-box attacks. The Madry method improves the model robustness significantly over the Standard model; under the standard PGD20 attack, it achieves 44.9% accuracy. The Bilateral approach further boosts the performance to 57.5%. 
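The piecewise training schedule described above (an initial rate γ decayed at the transition epochs) can be sketched as follows; the decay factor of 0.1 is a common default and an assumption here, not a value stated in the text:

```python
def lr_at_epoch(epoch, base_lr=0.1, transitions=(60, 90), decay=0.1):
    """Step-decay schedule: multiply the rate by `decay` at each transition epoch."""
    lr = base_lr
    for t in transitions:
        if epoch >= t:
            lr *= decay
    return lr

print([round(lr_at_epoch(e), 6) for e in (0, 60, 90)])  # [0.1, 0.01, 0.001]
```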
The proposed approach outperforms both methods by a large margin, improving over Madry by 25.6% and Bilateral by 13.0%, achieving 70.5% accuracy under the standard 20-step PGD attack. A similar pattern is observed for the CW metric.
We further evaluate model robustness against a PGD attacker under different attack budgets with a fixed attack step of 20, with the results shown in Figure 3 (a). It is observed that the performance of the Standard model drops quickly as the attack budget increases. The Madry model [36] improves the model robustness significantly across a wide range of attack budgets. The Proposed approach further boosts the performance over the Madry model [36] by a large margin under different attack budgets. We also conduct experiments using a PGD attacker with different attack iterations and a fixed attack budget of 8, with the results shown in Figure 3 (b-c) and also Table 1. It is observed that both Madry [36] and Proposed can maintain a fairly stable performance when the number of attack iterations is increased.

³https://sites.google.com/site/hczhang1/projects/feature_scattering

Figure 3: Model performance under PGD attack with different (a) attack budgets and (b-c) attack iterations. The Madry and Proposed models are trained with attack iterations of 7 and 1, respectively.

Table 1: Accuracy comparison of the Proposed approach with the Standard, Madry [36] and Bilateral [61] methods on CIFAR10 under different threat models (white-box attacks, ε = 8).

Models    | Clean | FGSM | PGD10 | PGD20 | PGD40 | PGD100 | CW10 | CW20 | CW40 | CW100
Standard  | 95.6  | 36.9 |  0.0  |  0.0  |  0.0  |  0.0   |  0.0 |  0.0 |  0.0 |  0.0
Madry     | 85.7  | 54.9 | 45.1  | 44.9  | 44.8  | 44.8   | 45.9 | 45.7 | 45.6 | 45.4
Bilateral | 91.2  | 70.7 |  –    | 57.5  |  –    | 55.2   |  –   | 56.2 |  –   | 53.8
Proposed  | 90.0  | 78.4 | 70.9  | 70.5  | 70.3  | 68.6   | 62.6 | 62.4 | 62.1 | 60.6
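The qualitative trend in Figure 3 (a), accuracy falling as the attack budget grows, can be illustrated with a toy worst-case analysis; the 1-D threshold classifier and its data below are purely illustrative and unrelated to the actual models evaluated:

```python
import numpy as np

def robust_accuracy(x, y, threshold, eps):
    """Worst-case accuracy of a 1-D threshold classifier under an L-inf budget.

    A point is robustly correct only if its entire interval [x-eps, x+eps]
    stays on the correct side of the threshold.
    """
    pred_lo = (x - eps) > threshold
    pred_hi = (x + eps) > threshold
    robust_correct = (pred_lo == y) & (pred_hi == y)
    return robust_correct.mean()

x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([False, False, True, True])  # label: x > 0
accs = [robust_accuracy(x, y, threshold=0.0, eps=e) for e in (0.0, 0.5, 1.5)]
print(accs)  # [1.0, 1.0, 0.5]: accuracy degrades once the budget reaches the margin
```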
Notably, the proposed approach consistently outperforms the Madry [36] model across a wide range of attack iterations. From Table 1, it is also observed that the Proposed approach outperforms Bilateral [61] under all variants of the PGD and CW attacks. We will use PGD/CW attackers with ε = 8 and attack steps 20 and 100 in the sequel as part of the threat models.

Table 2: Accuracy comparison on (a) SVHN and (b) CIFAR100 (white-box attacks, ε = 8).

(a) SVHN
Models    | Clean | FGSM | PGD20 | PGD100 | CW20 | CW100
Standard  | 97.2  | 53.0 |  0.3  |  0.1   |  0.3 |  0.1
Madry     | 93.9  | 68.4 | 47.9  | 46.0   | 48.7 | 47.3
Bilateral | 94.1  | 69.8 | 53.9  | 50.3   |  –   | 48.9
Proposed  | 96.2  | 83.5 | 62.9  | 52.0   | 61.3 | 50.8

(b) CIFAR100
Models    | Clean | FGSM | PGD20 | PGD100 | CW20 | CW100
Standard  | 79.0  | 10.0 |  0.0  |  0.0   |  0.0 |  0.0
Madry     | 59.9  | 28.5 | 22.6  | 22.3   | 23.2 | 23.0
Bilateral | 68.2  | 60.8 | 26.7  | 25.3   |  –   | 22.1
Proposed  | 73.9  | 61.0 | 47.2  | 46.2   | 34.6 | 30.6

SVHN. We further report results on the SVHN dataset [42]. SVHN is a 10-way house number classification dataset, with 73257 training images and 26032 test images. The additional training images are not used in our experiments. The results are summarized in Table 2(a). Experimental results show that the proposed method achieves the best clean accuracy among all three robust models and outperforms the other methods by a clear margin under both PGD and CW attacks with different numbers of attack iterations, demonstrating the effectiveness of the proposed approach.
CIFAR100. We also conduct experiments on the CIFAR100 dataset, with 100 classes, 50K training and 10K test images [31]. Note that this dataset is more challenging than CIFAR10, as the number of training images per class is ten times smaller than that of CIFAR10. 
As shown by the results in Table 2(b), the proposed approach outperforms all baseline methods significantly, being about 20% better than Madry [36] and Bilateral [61] under PGD attacks and about 10% better under CW attacks. The superior performance of the proposed approach on this dataset further demonstrates the importance of leveraging inter-sample structure for learning [69].
5.2 Ablation Studies
We investigate the impacts of algorithmic components; more results are in the supplementary file.
The Importance of Feature Scattering. We empirically verify the effectiveness of feature scattering by comparing the performance of models trained using different perturbation schemes: i) Random: a natural baseline approach that randomly perturbs each sample within the epsilon neighborhood; ii) Supervised: perturbation generated using the ground-truth label in a supervised fashion; iii) FeaScatter: perturbation generated using the proposed feature scattering method. All other hyper-parameters are kept exactly the same apart from the perturbation scheme used. The results are summarized in Table 3(a). It is evident that the proposed feature scattering (FeaScatter) approach outperforms both the Random and Supervised methods, demonstrating its effectiveness.

Figure 4: Loss surface visualization in the vicinity of a natural image along the adversarial direction (da) and the direction of a Rademacher vector (dr) for the (a) Standard, (b) Madry and (c) Proposed models.
Furthermore, since it is the major component that differs from the conventional adversarial training pipeline, this result suggests that feature scattering is the main contributor to the improved adversarial robustness.

(a)                       White-box Attack (ε = 8)
Perturb      Clean   FGSM   PGD20   PGD100   CW20   CW100
Random        95.3   75.7    29.9     18.3   34.7    26.2
Supervised    86.9   64.4    56.0     54.5   51.2    50.3
FeaScatter    90.0   78.4    70.5     68.6   62.4    60.6

(b)                       White-box Attack (ε = 8)
Match        Clean   FGSM   PGD20   PGD100   CW20   CW100
Uniform       90.0   71.0    57.1     54.7   53.2    51.4
Identity      87.4   66.3    57.5     56.0   52.4    50.6
OT            90.0   78.4    70.5     68.6   62.4    60.6

Table 3: (a) Importance of feature scattering. (b) Impacts of different matching schemes.

The Role of Matching. We further investigate the role of the matching scheme within the feature scattering component by comparing several alternatives: i) Uniform matching, which matches each clean sample uniformly with all perturbed samples in the batch; ii) Identity matching, which matches each clean sample to its own perturbed sample only; iii) OT-matching: the proposed approach, which assigns soft matches between the clean and perturbed samples according to the optimization criterion. The results are summarized in Table 3(b). It is observed that all variants of the matching scheme lead to performance on par with or better than state-of-the-art methods, implying that the proposed framework is effective in general. Notably, OT-matching leads to the best results, suggesting the importance of proper matching for feature scattering.
The Impact of OT-Solvers. Exact minimization of Eqn. (4) over T is intractable in general [1, 49, 21, 12]. Here we compare two practical solvers, the Sinkhorn algorithm [12] and the Inexact Proximal point method for Optimal Transport (IPOT) algorithm [66]. More details on them can be found in the supplementary file and in [12, 66, 45].
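As a concrete illustration of the first solver, the Sinkhorn iteration [12] for an entropy-regularized transport plan between two discrete distributions can be sketched as below; the regularization weight and iteration count here are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def sinkhorn(mu, nu, C, reg=0.1, iters=500):
    """Entropy-regularized OT via Sinkhorn scaling: returns a transport
    plan T whose row marginals approach mu and column marginals nu."""
    K = np.exp(-C / reg)                  # Gibbs kernel from the cost matrix
    u = np.ones_like(mu)
    for _ in range(iters):
        v = nu / (K.T @ u)                # alternate the two scaling updates
        u = mu / (K @ v)
    return u[:, None] * K * v[None, :]    # T = diag(u) K diag(v)
```

In this framework, Uniform matching corresponds to a constant plan and Identity matching to a (scaled) identity matrix; the OT plan T provides the soft matches in between.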
The results are summarized in Table 4. Different instantiations of the proposed approach with different OT-solvers lead to comparable performance, implying that the proposed approach is effective regardless of the choice of OT-solver.

SVHN
OT-solver   Clean   FGSM   PGD20   PGD100   CW20   CW100
Sinkhorn     96.2   83.5    62.9     52.0   61.3    50.8
IPOT         96.0   82.6    60.0     49.3   57.8    48.4

CIFAR100
OT-solver   Clean   FGSM   PGD20   PGD100   CW20   CW100
Sinkhorn     73.9   61.0    47.2     46.2   34.6    30.6
IPOT         74.2   67.3    47.5     46.3   32.0    29.3

CIFAR10
OT-solver   Clean   FGSM   PGD20   PGD100   CW20   CW100
Sinkhorn     90.0   78.4    70.5     68.6   62.4    60.6
IPOT         89.9   77.9    69.9     67.3   59.6    56.9

Table 4: Impacts of OT-solvers. The proposed approach performs well with different OT-solvers.
5.3 Performance under Black-box Attack
To further verify whether a degenerate minimum is obtained, we evaluate the robustness of the model trained with the proposed approach w.r.t. black-box attacks (B-Attack) following [58]. Two different models are used for generating test-time attacks: i) Undefended: an undefended model trained using the Standard approach; ii) Siamese: a robust model from another training session using the proposed approach. As demonstrated by the results in the accompanying table, the model trained with the proposed approach is robust against different types of black-box attacks, verifying that a non-degenerate solution is learned [58].
Finally, we visualize in Figure 4 the loss surfaces of different models as another level of comparison.
6 Conclusion
We present a feature scattering-based adversarial training method in this paper. The proposed approach distinguishes itself from others by using an unsupervised feature-scattering approach for generating adversarial training images, which leverages the inter-sample relationships for collaborative perturbation generation.
We show that a coupled regularization term is induced from feature scattering for adversarial training and empirically demonstrate the effectiveness of the proposed approach through extensive experiments on benchmark datasets.

(Black-box attack results referenced in Section 5.3.)
B-Attack      PGD20   PGD100   CW20   CW100
Undefended     89.0     88.8   88.9    88.7
Siamese        81.6     79.8   80.3    81.0

References
[1] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning, 2017.
[2] A. Athalye, N. Carlini, and D. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International Conference on Machine Learning, 2018.
[3] J. F. Bard. Practical Bilevel Optimization: Algorithms and Applications. Springer Publishing Company, Incorporated, 1st edition, 2010.
[4] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Srndic, P. Laskov, G. Giacinto, and F. Roli. Evasion attacks against machine learning at test time. In European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2013.
[5] B. Biggio and F. Roli. Wild patterns: Ten years after the rise of adversarial machine learning. CoRR, abs/1712.03141, 2017.
[6] W. Brendel, J. Rauber, and M. Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In International Conference on Learning Representations, 2018.
[7] T. B. Brown, D. Mané, A. Roy, M. Abadi, and J. Gilmer. Adversarial patch. CoRR, abs/1712.09665, 2017.
[8] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy, 2017.
[9] N. Carlini and D. A. Wagner. Audio adversarial examples: Targeted attacks on speech-to-text. In IEEE Symposium on Security and Privacy Workshops, 2018.
[10] L. Chen, S. Dai, C. Tao, H. Zhang, Z. Gan, D. Shen, Y. Zhang, G. Wang, R. Zhang, and L.
Carin. Adversarial text generation via feature-mover's distance. In Advances in Neural Information Processing Systems, 2018.
[11] M. Cisse, Y. Adi, N. Neverova, and J. Keshet. Houdini: Fooling deep structured prediction models. In Advances in Neural Information Processing Systems, 2017.
[12] M. Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, 2013.
[13] S. Dempe, V. Kalashnikov, G. A. Pérez-Valdés, and N. Kalashnykova. Bilevel Programming Problems: Theory, Algorithms and Applications to Energy Networks. Springer Publishing Company, Incorporated, 2015.
[14] F. Draxler, K. Veschgini, M. Salmhofer, and F. Hamprecht. Essentially no barriers in neural network energy landscape. In International Conference on Machine Learning, 2018.
[15] A. Dubey, L. van der Maaten, Z. Yalniz, Y. Li, and D. Mahajan. Defense against adversarial images using web-scale nearest-neighbor search. CoRR, abs/1903.01612, 2019.
[16] L. Engstrom, B. Tran, D. Tsipras, L. Schmidt, and A. Madry. Exploring the landscape of spatial robustness. In International Conference on Machine Learning, 2019.
[17] C. Etmann, S. Lunz, P. Maass, and C.-B. Schönlieb. On the connection between adversarial robustness and saliency map interpretability. In International Conference on Machine Learning, 2019.
[18] K. Eykholt, I. Evtimov, E. Fernandes, B. Li, A. Rahmati, F. Tramèr, A. Prakash, T. Kohno, and D. Song. Physical adversarial examples for object detectors. CoRR, abs/1807.07769, 2018.
[19] A. Fawzi, S. Moosavi-Dezfooli, P. Frossard, and S. Soatto. Empirical study of the topology and geometry of deep networks. In IEEE Conference on Computer Vision and Pattern Recognition, 2018.
[20] A. Genevay, G. Peyre, and M. Cuturi. GAN and VAE from an optimal transport point of view. arXiv:1706.01807, 2017.
[21] A. Genevay, G. Peyre, and M. Cuturi.
Learning generative models with Sinkhorn divergences. In International Conference on Artificial Intelligence and Statistics, 2018.
[22] J. Goldberger, G. E. Hinton, S. T. Roweis, and R. R. Salakhutdinov. Neighbourhood components analysis. In Advances in Neural Information Processing Systems, 2005.
[23] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, 2014.
[24] I. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015.
[25] C. Guo, M. Rana, M. Cissé, and L. van der Maaten. Countering adversarial images using input transformations. In International Conference on Learning Representations, 2018.
[26] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.
[27] A. Ilyas, A. Jalal, E. Asteri, C. Daskalakis, and A. G. Dimakis. The robust manifold defense: Adversarial training using generative models. CoRR, abs/1712.09196, 2017.
[28] A. Ilyas, S. Santurkar, D. Tsipras, L. Engstrom, B. Tran, and A. Madry. Adversarial examples are not bugs, they are features. In International Conference on Learning Representations, 2019.
[29] J.-H. Jacobsen, J. Behrmann, R. Zemel, and M. Bethge. Excessive invariance causes adversarial vulnerability. In International Conference on Learning Representations, 2019.
[30] H. Kannan, A. Kurakin, and I. J. Goodfellow. Adversarial logit pairing. CoRR, abs/1803.06373, 2018.
[31] A. Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009.
[32] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial machine learning at scale. In International Conference on Learning Representations, 2017.
[33] F. Liao, M. Liang, Y.
Dong, and T. Pang. Defense against adversarial attacks using high-level representation guided denoiser. In Computer Vision and Pattern Recognition, 2018.
[34] B. Lindqvist, S. Sugrim, and R. Izmailov. AutoGAN: Robust classifier against adversarial attacks. CoRR, abs/1812.03405, 2018.
[35] X. Liu, M. Cheng, H. Zhang, and C.-J. Hsieh. Towards robust neural networks via random self-ensemble. In European Conference on Computer Vision, 2018.
[36] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.
[37] D. Meng and H. Chen. MagNet: A two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017.
[38] J. H. Metzen, T. Genewein, V. Fischer, and B. Bischoff. On detecting adversarial perturbations. In International Conference on Learning Representations, 2017.
[39] T. Miyato, S. Maeda, M. Koyama, and S. Ishii. Virtual adversarial training: A regularization method for supervised and semi-supervised learning. CoRR, abs/1704.03976, 2017.
[40] S. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard. Universal adversarial perturbations. In IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[41] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard. DeepFool: A simple and accurate method to fool deep neural networks. In IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[42] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
[43] N. Papernot, P. D. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami. The limitations of deep learning in adversarial settings. CoRR, abs/1511.07528, 2015.
[44] S. Park and M. Thorpe.
Representing and learning high dimensional data with the optimal transport map from a probabilistic viewpoint. In IEEE Conference on Computer Vision and Pattern Recognition, 2018.
[45] G. Peyré and M. Cuturi. Computational optimal transport. To appear in Foundations and Trends in Machine Learning, 2018.
[46] A. Prakash, N. Moran, S. Garber, A. DiLillo, and J. Storer. Deflecting adversarial attacks with pixel deflection. In IEEE Conference on Computer Vision and Pattern Recognition, 2018.
[47] A. Rolet, M. Cuturi, and G. Peyré. Fast dictionary learning with a smoothed Wasserstein loss. In International Conference on Artificial Intelligence and Statistics, 2016.
[48] R. Salakhutdinov and G. Hinton. Learning a nonlinear embedding by preserving class neighbourhood structure. In International Conference on Artificial Intelligence and Statistics, 2007.
[49] T. Salimans, H. Zhang, A. Radford, and D. Metaxas. Improving GANs using optimal transport. In International Conference on Learning Representations, 2018.
[50] P. Samangouei, M. Kabkab, and R. Chellappa. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. In International Conference on Learning Representations, 2018.
[51] L. K. Saul, S. T. Roweis, and Y. Singer. Think globally, fit locally: Unsupervised learning of low dimensional manifolds. Journal of Machine Learning Research, 4:119–155, 2003.
[52] L. Schmidt, S. Santurkar, D. Tsipras, K. Talwar, and A. Madry. Adversarially robust generalization requires more data. arXiv preprint arXiv:1804.11285, 2018.
[53] Y. Song, T. Kim, S. Nowozin, S. Ermon, and N. Kushman. PixelDefend: Leveraging generative models to understand and defend against adversarial examples. arXiv preprint arXiv:1710.10766, 2017.
[54] J. Su, D. V. Vargas, and K. Sakurai. One pixel attack for fooling deep neural networks.
CoRR, abs/1710.08864, 2017.
[55] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations, 2014.
[56] T. Tanay and L. D. Griffin. A boundary tilting perspective on the phenomenon of adversarial examples. CoRR, abs/1608.07690, 2016.
[57] I. Tolstikhin, O. Bousquet, S. Gelly, and B. Scholkopf. Wasserstein auto-encoders. In International Conference on Learning Representations, 2018.
[58] F. Tramèr, A. Kurakin, N. Papernot, D. Boneh, and P. McDaniel. Ensemble adversarial training: Attacks and defenses. In International Conference on Learning Representations, 2018.
[59] F. Tramèr, N. Papernot, I. J. Goodfellow, D. Boneh, and P. D. McDaniel. The space of transferable adversarial examples. CoRR, abs/1704.03453, 2017.
[60] C. Villani. Optimal Transport: Old and New. Springer, 2008.
[61] J. Wang and H. Zhang. Bilateral adversarial training: Towards fast training of more robust models against adversarial attacks. In IEEE International Conference on Computer Vision, 2019.
[62] C. Xiao, B. Li, J.-Y. Zhu, W. He, M. Liu, and D. Song. Generating adversarial examples with adversarial networks. In International Joint Conference on Artificial Intelligence.
[63] C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. Yuille. Mitigating adversarial effects through randomization. In International Conference on Learning Representations, 2018.
[64] C. Xie, J. Wang, Z. Zhang, Y. Zhou, L. Xie, and A. Yuille. Adversarial examples for semantic segmentation and object detection. In International Conference on Computer Vision, 2017.
[65] C. Xie, Y. Wu, L. van der Maaten, A. Yuille, and K. He. Feature denoising for improving adversarial robustness. arXiv preprint arXiv:1812.03411, 2018.
[66] Y. Xie, X. Wang, R. Wang, and H. Zha. A fast proximal point method for Wasserstein distance.
arXiv:1802.04307, 2018.
[67] Z. Yan, Y. Guo, and C. Zhang. Deep Defense: Training DNNs with improved adversarial robustness. In Advances in Neural Information Processing Systems, 2018.
[68] S. Zagoruyko and N. Komodakis. Wide residual networks. In British Machine Vision Conference, 2016.
[69] H. Zhang, H. Chen, Z. Song, D. Boning, I. Dhillon, and C.-J. Hsieh. The limitations of adversarial training and the blind-spot attack. In International Conference on Learning Representations, 2019.
[70] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz. mixup: Beyond empirical risk minimization. In International Conference on Learning Representations, 2018.
[71] H. Zhang and J. Wang. Joint adversarial training: Incorporating both spatial and pixel attacks. CoRR, abs/1907.10737, 2019.
[72] H. Zhang and J. Wang. Towards adversarially robust object detection. In IEEE International Conference on Computer Vision, 2019.