{"title": "A Spectral View of Adversarially Robust Features", "book": "Advances in Neural Information Processing Systems", "page_first": 10138, "page_last": 10148, "abstract": "Given the apparent difficulty of learning models that are robust to adversarial perturbations, we propose tackling the simpler problem of developing adversarially robust features. Specifically, given a dataset and metric of interest, the goal is to return a function (or multiple functions) that 1) is robust to adversarial perturbations, and 2) has significant variation across the datapoints. We establish strong connections between adversarially robust features and a natural spectral property of the geometry of the dataset and metric of interest. This connection can be leveraged to provide both robust features, and a lower bound on the robustness of any function that has significant variance across the dataset. Finally, we provide empirical evidence that the adversarially robust features given by this spectral approach can be fruitfully leveraged to learn a robust (and accurate) model.", "full_text": "A Spectral View of Adversarially Robust Features\n\nShivam Garg\n\nVatsal Sharan\u2217\n\nBrian Hu Zhang\u2217\n\nGregory Valiant\n\nStanford University\nStanford, CA 94305\n\n{shivamgarg, vsharan, bhz, gvaliant}@stanford.edu\n\nAbstract\n\nGiven the apparent dif\ufb01culty of learning models that are robust to adversarial per-\nturbations, we propose tackling the simpler problem of developing adversarially\nrobust features. Speci\ufb01cally, given a dataset and metric of interest, the goal is to\nreturn a function (or multiple functions) that 1) is robust to adversarial perturba-\ntions, and 2) has signi\ufb01cant variation across the datapoints. We establish strong\nconnections between adversarially robust features and a natural spectral property\nof the geometry of the dataset and metric of interest. 
This connection can be\nleveraged to provide both robust features, and a lower bound on the robustness of\nany function that has signi\ufb01cant variance across the dataset. Finally, we provide\nempirical evidence that the adversarially robust features given by this spectral\napproach can be fruitfully leveraged to learn a robust (and accurate) model.\n\n1\n\nIntroduction\n\nWhile machine learning models have achieved spectacular performance in many settings, including\nhuman-level accuracy for a variety of image recognition tasks, these models exhibit a striking\nvulnerability to adversarial examples. For nearly every input datapoint\u2014including training data\u2014a\nsmall perturbation can be carefully chosen to make the model misclassify this perturbed point. Often,\nthese perturbations are so minute that they are not discernible to the human eye.\nSince the initial work of Szegedy et al. [2013] and Goodfellow et al. [2014] identi\ufb01ed this surprising\nbrittleness of many models trained over high-dimensional data, there has been a growing appreciation\nfor the importance of understanding this vulnerability. From a conceptual standpoint, this lack of\nrobustness seems to be one of the most signi\ufb01cant differences between humans\u2019 classi\ufb01cation abilities\n(particularly for image recognition tasks), and computer models. Indeed this vulnerability is touted\nas evidence that computer models are not really learning, and are simply assembling a number of\ncheap and effective, but easily fooled, tricks. Fueled by a recent line of work demonstrating that\nadversarial examples can actually be created in the real world (as opposed to requiring the ability\nto edit the individual pixels in an input image) [Evtimov et al., 2017, Brown et al., 2017, Kurakin\net al., 2016, Athalye and Sutskever, 2017], there has been a signi\ufb01cant effort to examine adversarial\nexamples from a security perspective. 
In certain settings where trained machine learning systems\nmake critically important decisions, developing models that are robust to adversarial examples might\nbe a requisite for deployment.\nDespite the intense recent interest in both computing adversarial examples and on developing learning\nalgorithms that yield robust models, we seem to have more questions than answers. In general,\nensuring that models trained on high-dimensional data are robust to adversarial examples seems to be\nextremely dif\ufb01cult: for example, Athalye et al. [2018] claims to have broken six attempted defenses\n\n\u2217Equal contribution\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\n\fsubmitted to ICLR 2018 before the conference even happened. Additionally, we currently lack\nanswers to many of the most basic questions concerning why adversarial examples are so dif\ufb01cult to\navoid. What are the tradeoffs between the amount of data available, accuracy of the trained model,\nand vulnerability to adversarial examples? What properties of the geometry of a dataset determine\nwhether a robust and accurate model exists?\nThe goal of this work is to provide a new perspective on robustness to adversarial examples, by\ninvestigating the simpler objective of \ufb01nding adversarially robust features. Rather than trying to\nlearn a robust function that also achieves high classi\ufb01cation accuracy, we consider the problem\nof learning any function that is robust to adversarial perturbations with respect to any speci\ufb01ed\nmetric. 
Speci\ufb01cally, given a dataset of d-dimensional points and a metric of interest, can we learn\nfeatures\u2014namely functions from Rd \u2192 R\u2014which 1) are robust to adversarial perturbations of a\nbounded magnitude with respect to the speci\ufb01ed metric, and 2) have signi\ufb01cant variation across the\ndatapoints (which precludes the trivially robust constant function).\nThere are several motivations for considering this problem of \ufb01nding robust features: First, given the\napparent dif\ufb01culty of learning adversarially robust models, this is a natural \ufb01rst step that might help\ndisentangle the confounding challenges of achieving robustness and achieving good classi\ufb01cation\nperformance. Second, given robust features, one can hope to get a robust model if the classi\ufb01er\nused on top of these features is reasonably Lipschitz. While there are no a priori guarantees that\nthe features contain any information about the labels, as we empirically demonstrate, these features\nseem to contain suf\ufb01cient information about the geometry of the dataset to yield accurate models.\nIn this sense, computing robust features can be viewed as a possible intermediate step in learning\nrobust models, which might also signi\ufb01cantly reduce the computational expense of training robust\nmodels directly. Finally, considering this simpler question of understanding robust features might\nyield important insights into the geometry of datasets, and the speci\ufb01c metrics under which the\nrobustness is being considered (e.g. the geometry of the data under the (cid:96)\u221e, or (cid:96)2, metric.) 
For example, by providing a lower bound on the robustness of any function (that has variance one across the datapoints), we trivially obtain a lower bound on the robustness of any classification model.

1.1 Robustness to Adversarial Perturbations

Before proceeding, it will be useful to formalize the notion of robustness (or lack thereof) to adversarial examples. The following provides one natural such definition, and is given in terms of a distribution D from which examples are drawn, and a specific metric, dist(·,·), in terms of which the magnitude of perturbations will be measured.

Definition 1. A function f : R^d → R is said to be (ε, δ, γ) robust to adversarial perturbations for a distribution D over R^d with respect to a distance metric dist : R^d × R^d → R if, for a point x drawn according to D, the probability that there exists x′ such that dist(x, x′) ≤ ε and |f(x) − f(x′)| ≥ δ is bounded by γ. Formally,

Pr_{x∼D}[∃x′ s.t. dist(x, x′) ≤ ε and |f(x) − f(x′)| ≥ δ] ≤ γ.

In the case that the function f is a binary classifier, if f is (ε, 1, γ) robust with respect to the distribution D of examples and a distance metric dist, then even if adversarial perturbations of magnitude ε are allowed, the classification accuracy of f can suffer by at most γ.

Our approach will be easier to describe, and more intuitive, when viewed as a method for assigning feature values to an entire dataset. Here the goal is to map each datapoint to a feature value (or set of values) which is robust to perturbations of the points in the dataset.
Given a dataset X consisting of n points in R^d, we desire a function F that takes as input X and outputs a vector F(X) ∈ R^n; such a function F is robust for a dataset X if, for all X′ obtained by perturbing points in X, F(X) and F(X′) are close.

Formally, let 𝒳 be the set of all datasets consisting of n points in R^d, and ‖·‖ denote the ℓ_2 norm. For notational convenience, we will use F_X and F(X) interchangeably, and use F_X(x) to denote the feature value F associates with a point x ∈ X. We overload dist(·,·) to define the distance between two ordered sets X = (x_1, x_2, ..., x_n) and X′ = (x′_1, x′_2, ..., x′_n) as dist(X, X′) = max_{i∈[n]} dist(x_i, x′_i). With these notations in place, we define a robust function as follows:

Definition 2. A function F : 𝒳 → R^n is said to be (ε, δ) robust to adversarial perturbations for a dataset X with respect to a distance metric dist(·,·) as defined above if, for all datasets X′ such that dist(X, X′) ≤ ε, ‖F(X) − F(X′)‖ ≤ δ.

If a function F is (ε, δ) robust for a dataset X, this implies that the feature values of 99% of the points in X will not vary by more than 10δ/√n if we were to perturb all points in X by at most ε.

As in the case of robust functions of single datapoints, to preclude the possibility of some trivial functions, we require F to satisfy certain conditions: 1) F_X should have significant variance across points, say, Σ_i (F_X(x_i) − E_{x∼Unif(X)}[F_X(x)])² = 1. 2) Changing the order of points in dataset X should not change F_X; that is, for any permutation σ, F_{σ(X)} = σ(F_X).
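For concreteness, the quantities in Definition 2 are straightforward to compute; a minimal sketch under the ℓ_2 metric (the function names are ours, and the check inspects only one candidate perturbation X′ rather than searching over all of them):

```python
import numpy as np

def dataset_dist(X, X_prime):
    """dist(X, X') = max_i dist(x_i, x'_i), here with the l2 metric."""
    return np.max(np.linalg.norm(X - X_prime, axis=1))

def violates_robustness(F, X, X_prime, eps, delta):
    """One candidate perturbation X' witnesses a failure of (eps, delta)
    robustness if it stays within the budget but moves F(X) far."""
    return dataset_dist(X, X_prime) <= eps and \
        np.linalg.norm(F(X) - F(X_prime)) > delta

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
X_prime = X + 0.01 * rng.standard_normal((5, 3))
# the constant feature is trivially robust (but excluded by the variance condition)
constant_F = lambda Z: np.zeros(len(Z))
assert not violates_robustness(constant_F, X, X_prime, eps=1.0, delta=1e-9)
```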
Given a data\ndistribution D, and a threshold \u0001, the goal will be to \ufb01nd a function F that is as robust as possible, in\nexpectation, for a dataset X drawn from D.\nWe mainly follow De\ufb01nition 2 throughout the paper as the ideas behind our proposed features follow\nmore naturally under that de\ufb01nition. However, we brie\ufb02y discuss how to extend these ideas to come\nup with robust features of single datapoints (De\ufb01nition 1) in section 2.1.\n\n1.2 Summary of Results\n\nIn Section 2, we describe an approach to constructing features using spectral properties of an\nappropriately de\ufb01ned graph associated with a dataset in question. We show provable bounds for\nthe adversarial robustness of these features. We also show a synthetic setting in which some of the\nexisting models such as neural networks, and nearest-neighbor classi\ufb01er are known to be vulnerable\nto adversarial perturbations, while our approach provably works well. In Section 3, we show a lower\nbound which, in certain parameter regimes, implies that if our spectral features are not robust, then\nno robust features exist. The lower bound suggests a fundamental connection between the spectral\nproperties of the graph obtained from the dataset, and the inherent extent to which the data supports\nadversarial robustness. To explore this connection further, in Section 5, we show empirically that\nspectral properties do correlate with adversarial robustness. In Section 5, we also test our adversarial\nfeatures on the downstream task of classi\ufb01cation on adversarial images, and obtain positive results.\nDue to space constraints, we have deferred all the proofs to the supplementary material.\n\n1.3 Shortcomings and Future Work\n\nOur theory and empirics indicate that there may be fundamental connections between spectral\nproperties of graphs associated with data and the inherent robustness to adversarial examples. 
A worthwhile future direction is to further clarify this connection, as it may prove illuminating and fruitful. Looking at the easier problem of finding adversarially robust features also presents the opportunity of developing interesting sample-complexity results for security against adversarial attacks. Such results may be much more difficult to prove for the problem of adversarially robust classification, since generalization is not well understood (even in the non-adversarial setting) for classification models such as neural networks.

Our current approach involves computing distances between all pairs of points, and performing an eigenvector computation on a Laplacian matrix of a graph generated using these distances. Both of these steps are computationally expensive, and future work could address improving the efficiency of our approach. In particular, it seems likely that similar spectral features can be approximated without computing all the pairwise distances, which would result in a significant speed-up. We also note that our experiments testing our features on downstream classification tasks on adversarial data are based on transfer attacks, and it may be possible to degrade this performance using stronger attacks. The main takeaway from this experiment is that our conceptually simple features, along with a linear classifier, are able to give competitive results against reasonably strong attacks.
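On the efficiency point above: when the threshold graph is sparse, the bottom eigenvectors can at least be extracted without densifying the Laplacian. A minimal sketch (not the paper's implementation) using SciPy's sparse symmetric eigensolver, with a small negative shift so that the singular Laplacian remains factorizable in shift-invert mode:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def bottom_eigenpairs(A, k):
    """Bottom-k eigenpairs of L = D - A for a sparse symmetric adjacency A."""
    deg = np.asarray(A.sum(axis=1)).ravel()
    L = (sp.diags(deg) - A).tocsc()
    # shift-invert around a small negative sigma targets the smallest
    # eigenvalues; sigma != 0 because L itself is singular
    vals, vecs = eigsh(L, k=k, sigma=-1e-6, which='LM')
    order = np.argsort(vals)
    return vals[order], vecs[:, order]

# Path graph on 4 nodes: Laplacian spectrum {0, 2 - sqrt(2), 2, 2 + sqrt(2)}
A = sp.csr_matrix(np.array([[0, 1, 0, 0], [1, 0, 1, 0],
                            [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float))
vals, vecs = bottom_eigenpairs(A, k=2)
```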
Future work could explore using robustly trained models on top of these spectral features, or using a spectral approach to distill the middle layers of neural networks.

1.4 Related Work

One of the very first methods proposed to defend against adversarial examples was adversarial training using the fast gradient sign method (FGSM) [Goodfellow et al., 2014], which involves taking a step in the direction of the gradient of the loss with respect to the data to generate adversarial examples, and training models on these examples. Later, Madry et al. [2017] proposed a stronger projected gradient descent (PGD) training, which essentially involves taking multiple steps in the direction of the gradient to generate adversarial examples, followed by training on these examples. More recently, Kolter and Wong [2017], Raghunathan et al. [2018], and Sinha et al. [2017] have also made progress towards training provably adversarially robust models. There have also been efforts towards proving lower bounds on the adversarial accuracy of neural networks, and using these lower bounds to train robust models [Hein and Andriushchenko, 2017, Peck et al., 2017]. Most prior work addresses the question of how to fix the adversarial examples problem; there is less work on identifying why this problem occurs in the first place, or highlighting which geometric properties of datasets make them vulnerable to adversarial attacks. Some recent works specifically address the "why" question: Fawzi et al. [2018] give lower bounds on robustness given a specific generative model of the data, and Schmidt et al. [2018] and Bubeck et al. [2018] describe settings in which limited computation or data are the primary bottleneck to finding a robust classifier.
In this work, by considering the simpler task of coming up with robust features, we provide a different perspective on both the question of "why" adversarial perturbations are effective, and "how" to ensure robustness to such attacks.

1.5 Background: Spectral Graph Theory

Let G = (V(G), E(G)) be an undirected, possibly weighted graph, where for notational simplicity V(G) = {1, ..., n}. Let A = (a_ij) be the adjacency matrix of G, and D be the diagonal matrix whose ith diagonal entry is the sum of edge weights incident to vertex i. The matrix L = D − A is called the Laplacian matrix of the graph G. The quadratic form, and hence the eigenvalues and eigenvectors, of L carry a great deal of information about G. For example, for any v ∈ R^n, we have

v^T L v = Σ_{(i,j)∈E(G)} a_ij (v_i − v_j)².

It is immediately apparent that L has at least one eigenvalue of 0: the vector v_1 = (1, 1, ..., 1) satisfies v_1^T L v_1 = 0. Further, the second (unit) eigenvector is the solution to the minimization problem

min_v Σ_{(i,j)∈E} a_ij (v_i − v_j)²   s.t.   Σ_i v_i = 0;  Σ_i v_i² = 1.

In other words, the second eigenvector assigns values to the vertices such that the average value is 0, the variance of the values across the vertices is 1, and, among such assignments, minimizes the sum of the squares of the discrepancies between neighbors. For example, in the case that the graph has two (or more) connected components, this second eigenvalue is 0, and the resulting eigenvector is constant on each connected component.

Our original motivation for this work is the observation that, at least superficially, this characterization of the second eigenvector sounds similar in spirit to a characterization of a robust feature: here, neighboring vertices should have similar values, and for robust features, close points should be mapped to similar values.
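These facts are easy to verify numerically; a minimal sketch on a small weighted graph of our own choosing:

```python
import numpy as np

# Weighted graph on 4 vertices (symmetric adjacency); Laplacian L = D - A
A = np.array([[0, 2, 1, 0],
              [2, 0, 0, 1],
              [1, 0, 0, 3],
              [0, 1, 3, 0]], dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A

# Quadratic form: v^T L v = sum over edges (i, j) of a_ij (v_i - v_j)^2
v = np.array([0.5, -1.0, 2.0, 0.25])
quad = sum(A[i, j] * (v[i] - v[j]) ** 2
           for i in range(4) for j in range(i + 1, 4))
assert np.isclose(v @ L @ v, quad)

# The all-ones vector has eigenvalue 0; since this graph is connected,
# the second eigenvector is orthogonal to it (zero mean) with unit norm.
vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
v2 = vecs[:, 1]
assert np.isclose(vals[0], 0)
assert np.isclose(v2.sum(), 0, atol=1e-8) and np.isclose(np.linalg.norm(v2), 1)
```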
The crucial question then is how to formalize this connection. Specifically, is there a way to construct a graph such that the neighborhood structure of the graph captures the neighborhood of datapoints with respect to the metric in question? We outline one such construction in Section 2.

We will also consider the normalized or scaled Laplacian, which is defined by

L_norm = D^{-1/2}(D − A)D^{-1/2} = I − D^{-1/2}AD^{-1/2}.

The scaled Laplacian normalizes the entries of L by the total edge weights incident to each vertex, so that highly irregular graphs do not have peculiar behavior. For more background on spectral graph theory, we refer the reader to Spielman [2007] and Chung [1997].

2 Robust Features

In this section, we describe a construction of robust features, and prove bounds on their robustness. Let X = (x_1, ..., x_n) be our dataset, and let ε > 0 be a threshold for attacks. We construct a robust feature F_X using the second eigenvector of the Laplacian of a graph corresponding to X, defined in terms of the metric in question. Formally, given the dataset X and a distance threshold parameter T > 0, which possibly depends on ε, we define F_X as follows:

Define G(X) to be the graph whose nodes correspond to points in X, i.e., {x_1, ..., x_n}, and for which there is an edge between nodes x_i and x_j if dist(x_i, x_j) ≤ T. Let L(X) be the (un-normalized) Laplacian of G(X), and let λ_k(X) and v_k(X) be its kth smallest eigenvalue and a corresponding unit eigenvector. In all our constructions, we assume that the first eigenvector v_1(X) is set to be the unit vector proportional to the all-ones vector. Now define F_X(x_i) = v_2(X)_i, i.e., the component of v_2(X) corresponding to x_i. Note that F_X defined this way satisfies the requirement of sufficient variance across points, namely Σ_i (F_X(x_i) − E_{x∼Unif(X)}[F_X(x)])² = 1, since Σ_i v_2(X)_i = 0 and ‖v_2(X)‖ = 1.

We now give robustness bounds for this choice of feature F_X. To do this, we will need slightly more notation. For a fixed ε > 0, define the graph G⁺(X) to be the graph with the same nodes as G(X), except that the threshold for an edge is T + 2ε instead of T. Formally, in G⁺(X), there is an edge between x_i and x_j if dist(x_i, x_j) ≤ T + 2ε. Similarly, define G⁻(X) to be the graph with the same set of nodes, with the threshold for an edge being T − 2ε. Define L⁺(X), λ⁺_k(X), v⁺_k(X), L⁻(X), λ⁻_k(X), v⁻_k(X) analogously to the earlier definitions. In the following theorem, we give robustness bounds on the function F as defined above.

Theorem 1. For any pair of datasets X and X′ such that dist(X, X′) ≤ ε, the function F : 𝒳 → R^n obtained using the second eigenvector of the Laplacian as defined above satisfies

min(‖F(X) − F(X′)‖, ‖(−F(X)) − F(X′)‖) ≤ 2√2 · √( (λ⁺_2(X) − λ⁻_2(X)) / (λ⁻_3(X) − λ⁻_2(X)) ).

Theorem 1 essentially guarantees that the features, as defined above, are robust up to sign-flip, as long as the eigengap between the second and third eigenvalues is large, and the second eigenvalue does not change significantly if we slightly perturb the distance threshold used to determine whether an edge exists in the graph in question. Note that flipping the signs of the feature values of all points in a dataset (including training data) does not change the classification problem for most common classifiers. For instance, if there exists a linear classifier that fits points with features F_X well, then a linear classifier can fit points with features −F_X equally well. So, up to sign flip, the function F is (ε, δ_X) robust for dataset X, where δ_X corresponds to the bound given in Theorem 1.

To understand this bound better, we discuss a toy example. Consider a dataset X that consists of two clusters with the property that the distance between any two points in the same cluster is at most 4ε, and the distance between any two points in different clusters is at least 10ε. The graph G(X) with threshold T = 6ε will have exactly two connected components. Note that v_2(X) will perfectly separate the two connected components, with v_2(X)_i being 1/√n if i belongs to component 1, and −1/√n otherwise. In this simple case, we conclude immediately that F_X is perfectly robust: perturbing points by ε cannot change the connected component any point is identified with. Indeed, this agrees with Theorem 1: λ⁺_2 = λ⁻_2 = 0, since the two clusters are at a distance > 10ε.

Next, we briefly sketch the idea behind the proof of Theorem 1. Consider the second eigenvector v_2(X′) of the Laplacian of the graph G(X′), where dataset X′ is obtained by perturbing points in X. We argue that this eigenvector cannot be too far from v⁻_2(X). For the sake of contradiction, consider the extreme case where v_2(X′) is orthogonal to v⁻_2(X). If the gap between the second and third eigenvalues of G⁻(X) is large, and the difference between λ_2(X′) and λ⁻_2(X) is small, then by replacing v⁻_3(X) with v_2(X′) as the third eigenvector of G⁻(X), we get a much smaller value for λ⁻_3(X), which is not possible. Hence, we show that the two eigenvectors in consideration cannot be orthogonal. The proof of the theorem extends this argument to show that v_2(X′) and v⁻_2(X) need to be close if we have a large eigengap for G⁻(X) and a small gap between λ_2(X′) and λ⁻_2(X). Using a similar argument, one can show that v_2(X) and v⁻_2(X) also need to be close. Applying the triangle inequality, we get that v_2(X) and v_2(X′) are close. Also, since we do not have any control over λ_2(X′), we use an upper bound on it given by λ⁺_2(X), and state our result in terms of the gap between λ⁺_2(X) and λ⁻_2(X).

The approach described above also naturally yields a construction of a set of robust features by considering the higher eigenvectors of the Laplacian. We define the ith feature vector for a dataset X as F^i_X = v_{i+1}(X). As the eigenvectors of a symmetric matrix are orthogonal, this gives us a set of k diverse feature vectors {F^1_X, F^2_X, ..., F^k_X}. Let F_X(x) = (F^1_X(x), F^2_X(x), ..., F^k_X(x))^T be a k-dimensional column vector denoting the feature values for point x ∈ X. In the following theorem, we give robustness bounds on these feature vectors.

Theorem 2. For any pair of datasets X and X′ such that dist(X, X′) ≤ ε, there exists a k × k invertible matrix M such that the features F_X and F_{X′} as defined above satisfy

√( Σ_{i∈[n]} ‖M F_X(x_i) − F_{X′}(x′_i)‖² ) ≤ 2√(2k) · √( (λ⁺_{k+1}(X) − λ⁻_2(X)) / (λ⁻_{k+2}(X) − λ⁻_2(X)) ).

Theorem 2 is a generalization of Theorem 1, and gives a bound on the robustness of the feature vectors F_X up to linear transformations.
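As a concrete illustration, the construction behind Theorems 1 and 2 (threshold graph, unnormalized Laplacian, eigenvectors v_2 through v_{k+1}) fits in a few lines; a minimal sketch with the ℓ_2 metric, where the helper name `spectral_features` is ours:

```python
import numpy as np

def spectral_features(X, T, k=1):
    """Features from the T-threshold graph on the rows of X:
    unnormalized Laplacian L = D - A, eigenvectors v_2, ..., v_{k+1}."""
    n = len(X)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    A = ((dists <= T) & ~np.eye(n, dtype=bool)).astype(float)  # edge iff dist <= T
    L = np.diag(A.sum(axis=1)) - A
    _, vecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    return vecs[:, 1:k + 1]          # i-th feature vector is v_{i+1}(X)

# Toy example from the text: two well-separated clusters. The graph has two
# connected components, so the returned feature is constant on each cluster.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (10, 5)), rng.normal(10, 0.1, (10, 5))])
F = spectral_features(X, T=3.0).ravel()
```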
Note that applying an invertible linear transformation to all the\npoints in a dataset (including training data) does not alter the classi\ufb01cation problem for models\ninvariant under linear transformations. For instance, if there exists a binary linear classi\ufb01er given\nby vector w, such that sign(wT FX (x)) corresponds to the true label for point x, then the classi\ufb01er\ngiven by (M\u22121)T w assigns the correct label to linearly transformed feature vector M FX (x).\n\n2.1 Extending a Feature to New Points\n\nIn the previous section, we discussed how to get robust features for points in a dataset. In this section,\nwe brie\ufb02y describe an extension of that approach to get robust features for points outside the dataset,\nas in De\ufb01nition 1.\nLet X = {x1, . . . , xn} \u2282 Rd be the training dataset drawn from some underlying distribution D\nover Rd. We use X as a reference to construct a robust function fX : Rd \u2192 R. For the sake of\nconvenience, we drop the subscript X from fX in the case where the dataset in question is clear.\nGiven a point x \u2208 Rd, and a distance threshold parameter T > 0, we de\ufb01ne f (x) as follows:\nDe\ufb01ne G(X) and G(x) to be graphs whose nodes are points in dataset X, and {x} \u222a X = {x0 =\nx, x1, . . . , xn} respectively, and for which there is an edge between nodes xi and xj, if dist(xi, xj) \u2264\nT . Let L(x) be the Laplacian of G(x), and let \u03bbk(x) and vk(x) be its kth smallest eigenvalue and\na corresponding unit eigenvector. Similarly, de\ufb01ne L(X), \u03bbk(X) and vk(X) for G(X). In all\nour constructions, we assume that the \ufb01rst eigenvectors v1(X) and v1(x) are set to be the unit\nvector proportional to the all-ones vector. Now de\ufb01ne f (x) = v2(x)0; i.e. the component of v2(x)\ncorresponding to x0 = x. Note that the eigenvector v2(x) has to be picked \u201cconsistently\u201d to avoid\nsign\ufb02ips in f as \u2212v2(x) is also a valid eigenvector. 
To resolve this, we select the eigenvector v_2(x) to be the eigenvector (with eigenvalue λ_2(x)) whose last |X| entries have the maximum inner product with v_2(X).

We now state a robustness bound for this feature f as per Definition 1. For a fixed ε > 0, define the graph G⁺(x) to be the graph with the same nodes and edges as G(x), except that the threshold for x_0 = x is T + ε instead of T. Formally, in G⁺(x), there is an edge between x_i and x_j if:

(a) i = 0 or j = 0, and dist(x_i, x_j) ≤ T + ε; or
(b) i > 0 and j > 0, and dist(x_i, x_j) ≤ T.

Similarly, define G⁻(x) to be the same graph with T + ε replaced by T − ε. Define L⁺, λ⁺_k, v⁺_k, L⁻, λ⁻_k, v⁻_k analogously to the earlier definitions. In the following theorem, we give a robustness bound on the function f as defined above.

Theorem 3. For a sufficiently large training set size n, if E_{X∼D}[(λ_3(X) − λ_2(X))^{-1}] ≤ c for some small enough constant c, then with probability 0.95 over the choice of X, the function f_X : R^d → R as defined above satisfies Pr_{x∼D}[∃x′ s.t. dist(x, x′) ≤ ε and |f_X(x) − f_X(x′)| ≥ δ_x] ≤ 0.05, for

δ_x = 6√2 · √( (λ⁺_2(x) − λ⁻_2(x)) / (λ⁻_3(x) − λ⁻_2(x)) ).

This also implies that with probability 0.95 over the choice of X, f_X is (ε, 20 E_{x∼D}[δ_x], 0.1) robust as per Definition 1.

This bound is very similar to the bound obtained in Theorem 1, and says that the function f is robust as long as the eigengap between the second and third eigenvalues is sufficiently large for G(X) and G⁻(x), and the second eigenvalue does not change significantly if we slightly perturb the distance threshold used to determine whether an edge exists in the graph in question. Similarly, one can also obtain a set of k features by taking the first k eigenvectors of G(X), prepended with zero, and projecting them onto the bottom-k eigenspace of G(x).

3 A Lower Bound on Adversarial Robustness

In this section, we show that spectral properties yield a lower bound on the robustness of any function on a dataset. We show that if there exists an (ε, δ) robust function F′ on dataset X, then the spectral approach (with an appropriately chosen threshold) will yield an (ε′, δ′) robust function, where the relationship between ε, δ and ε′, δ′ is governed by easily computable properties of the dataset X. This immediately provides a way of establishing a bound on the best possible robustness that dataset X could permit for perturbations of magnitude ε. Furthermore, it suggests that the spectral properties of the neighborhood graphs we consider may be inherently related to the robustness that a dataset allows. We now formally state our lower bound:

Theorem 4.
Assume that there exists some (\u03b5, \u03b4) robust function F \u2217 for the dataset X (not necessarily\nconstructed via the spectral approach). For any threshold T , let GT be the graph obtained on X by\nthresholding at T . Let dT be the maximum degree of GT . Then the feature F returned by the spectral\napproach on the graph G2\u03b5/3 is at least (\u03b5/6, \u03b4(cid:48)) robust (up to sign), for\n\n(cid:115)\n\n\u03b4(cid:48) = \u03b4\n\n8(d\u03b5 + 1)\n\n\u03bb3(G\u03b5/3) \u2212 \u03bb2(G\u03b5/3)\n\n.\n\nThe bound gives reasonable guarantees when the degree is small and the spectral gap is large. To\nproduce meaningful bounds, the neighborhood graph must have some structure at the threshold\nin question; in many practical settings, this would require an extremely large dataset, and hence\nthis bound is mainly of theoretical interest at this point. Still, our experimental results in Section 5\nempirically validate the hypothesis that spectral properties have implications for the robustness of\nany model: we show that the robustness of an adversarially trained neural network on different data\ndistributions correlates with the spectral properties of the distribution.\n\n4 Synthetic Setting: Adversarial Spheres\n\n\u221a\n\n2. In particular, the median distance between two such points is exactly r\n\nGilmer et al. [2018] devise a situation in which they are able to show in theory that training adversari-\nally robust models is dif\ufb01cult. The authors describe the \u201cconcentric spheres dataset\u201d, which consists\nof\u2014as the name suggests\u2014two concentric d-dimensional spheres, one of radius 1 and one of radius\nR > 1. 
The authors then argue that any classifier that misclassifies even a small fraction of the inner sphere will have a significant drop in adversarial robustness.

We argue that our method, in fact, yields a near-perfect classifier, one that makes almost no errors on natural or adversarial examples, even when trained on a modest amount of data. To see this, consider a sample of 2N training points from the dataset, N from the inner sphere and N from the outer sphere. Observe that the distance between two uniformly chosen points on a sphere of radius r is close to r√2, and with high probability for large d, the distance will be within some small radius ε of r√2. Thus, for distance threshold √2 + 2ε, after adding a new test point to the training data, we will get a graph with a large clique corresponding to the inner sphere, and isolated points on the outer sphere, with high probability. This structure does not change when the test point is perturbed by ε, resulting in a robust classifier. We now formalize this intuition.

Let the inner sphere be of radius one, and the outer sphere be of some constant radius R > 1. Let ε = (R − 1)/8 be the radius of possible perturbations. Then we can state the following:

Theorem 5. Pick initial distance threshold T = √2 + 2ε in the ℓ₂ norm, and use the first N + 1 eigenvectors as proposed in Section 2.1 to construct an (N + 1)-dimensional feature map f : R^d → R^{N+1}. Then with probability at least 1 − N²e^{−Ω(d)} over the random choice of training set, f maps the entire inner sphere to the same point, and the entire outer sphere to some other point, except for a γ-fraction of both spheres, where γ = N·e^{−Ω(d)}.
In particular, f is (ε, 0, γ)-robust.

Figure 1: Comparison of performance on adversarially perturbed MNIST data.

Figure 2: Performance on adversarial data vs our upper bound.

The extremely nice form of the constructed feature f in this case means that, if we use half of the training set to get the feature map f, and the other half to train a linear classifier (or, indeed, any nontrivial model at all) on top of this feature, this will yield a near-perfect classifier even against adversarial attacks. The adversarial spheres example is a case in which our method allows us to make a robust classifier, but other common methods do not. For example, nearest-neighbors will fail at classifying the outer sphere (since points on the outer sphere are generally closer to points on the inner sphere than to other points on the outer sphere), and Gilmer et al. [2018] demonstrate in practice that training adversarially robust models on the concentric spheres dataset using standard neural network architectures is extremely difficult when the dimension d grows large.

5 Experiments

5.1 Image Classification: The MNIST Dataset

While the main focus of our work is to improve the conceptual understanding of adversarial robustness, we also perform experiments on the MNIST dataset. We test the efficacy of our features by evaluating them on the downstream task of classifying adversarial images. We use a subset of the MNIST dataset, which is commonly used in discussions of adversarial examples [Goodfellow et al., 2014, Szegedy et al., 2013, Madry et al., 2017]. Our dataset has 11,000 images of handwritten digits from zero to nine, of which 10,000 images are used for training, and the rest for testing.
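The spectral feature construction used throughout (threshold pairwise distances, form the scaled Laplacian of the resulting graph, and take its bottom nontrivial eigenvectors) can be sketched as follows. This is an illustrative reimplementation on a toy two-cluster dataset, not the authors' code; the cluster and threshold parameters are our own, chosen so that the graph splits into two cliques:

```python
import numpy as np

def spectral_features(X, threshold, k):
    """Map each row of X to k features: eigenvectors 2..k+1 of the
    scaled (normalized) Laplacian of the threshold graph on X."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    A = (d <= threshold).astype(float)  # adjacency: edge iff points are close
    np.fill_diagonal(A, 0.0)
    deg = A.sum(axis=1)
    dinv = np.where(deg > 0, deg, 1.0) ** -0.5
    L = np.eye(len(X)) - dinv[:, None] * A * dinv[None, :]
    w, v = np.linalg.eigh(L)            # eigenvalues in ascending order
    return v[:, 1:k + 1]                # skip the trivial bottom eigenvector

# Toy data: two tight, well-separated clusters in R^5.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 5)),
               rng.normal(3.0, 0.1, (20, 5))])
f = spectral_features(X, threshold=1.0, k=1)

# The threshold graph is two disjoint cliques, so the feature is
# exactly constant on each cluster: a maximally robust feature.
print(float(f[:20, 0].std()), float(f[20:, 0].std()))  # both ~ 0
```

The experiments below apply the same pipeline to the full 11,000-point image set, with T = 9 and k = 20, feeding the resulting features to a linear classifier.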
We compare three different models, the specifics of which are given below:

Robust neural network (pgd-nn): We consider a fully connected neural network with one hidden layer of 200 units, with ReLU non-linearity and cross-entropy loss. We use the PyTorch implementation of Adam [Kingma and Ba, 2014] for optimization with a step size of 0.001. To obtain a robust neural network, we generate adversarial examples using projected gradient descent for each mini-batch, and train our model on these examples. For projected gradient descent, we use a step size of 0.1 for 40 iterations.

Spectral features obtained using the scaled Laplacian, and a linear classifier (unweighted-laplacian-linear): We use the ℓ₂ norm as the distance metric, and distance threshold T = 9 to construct a graph on all 11,000 data points. Since the distances between training points are highly irregular, our constructed graph is also highly irregular; thus, we use the scaled Laplacian to construct our features. Our features are obtained from the 20 eigenvectors corresponding to λ₂ through λ₂₁, so each image is mapped to a feature vector in R^20. On top of these features, we use a linear classifier with cross-entropy loss for classification. We train the linear classifier on 10,000 images, and test it on 1,000 images obtained by adversarially perturbing the test images.

Spectral features obtained using the scaled Laplacian with weighted edges, and a linear classifier (weighted-laplacian-linear): This is similar to the previous model, with the only difference being the way in which the graph is constructed. Instead of using a fixed threshold, we put weighted edges between all pairs of images, with the weight on the edge between images i and j being exp(−0.1‖xi − xj‖₂²).
As before, we use the 20 eigenvectors corresponding to the scaled Laplacian of this graph, with a linear classifier for classification.

Note that generating our features involves computing distances between all pairs of images, followed by an eigenvector computation. Therefore, finding the gradient (with respect to the image coordinates) of classifiers built on top of these features is computationally extremely expensive. As previous work [Papernot et al., 2016] has shown that transfer attacks can successfully fool many different models, we use transfer attacks, attacking our models with the adversarial images generated against the robust neural network (pgd-nn).

The performance of these models on adversarial data is shown in Figure 1. We observe that weighted-laplacian-linear performs better than pgd-nn for large enough perturbations. Note that it is possible that robustly trained deep convolutional neural nets perform better than our model. It is also possible that the performance of our models may deteriorate under stronger attacks. Still, our conceptually simple features, with just a linear classifier on top, are able to give competitive results against reasonably strong adversaries. It is possible that training robust neural networks on top of these features, or using such features in the middle layers of neural nets, may give significantly more robust models.
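The transfer attacks above are produced by projected gradient descent against the robust network. As a self-contained illustration of the PGD loop itself, here is a minimal ℓ∞ version against a toy logistic model (the model, weights, and attack parameters here are our own illustrative choices, not the ones used in the experiments):

```python
import numpy as np

def pgd_attack(x, y, w, b, eps, step, iters):
    """l-infinity PGD against a logistic model p = sigmoid(w.x + b), y in {0,1}.
    Repeatedly steps along the sign of the loss gradient, then projects back
    onto the eps-ball around the clean input x."""
    x_adv = x.copy()
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(x_adv @ w + b)))
        grad = (p - y) * w                        # d(cross-entropy)/d(x_adv)
        x_adv = x_adv + step * np.sign(grad)      # gradient-ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # projection onto the ball
    return x_adv

w, b = np.array([1.0, -2.0]), 0.0
x, y = np.array([0.5, -0.5]), 1          # clean margin: w @ x + b = 1.5
x_adv = pgd_attack(x, y, w, b, eps=0.3, step=0.1, iters=40)
print(x_adv @ w + b)  # margin driven down from 1.5 to 0.6
```

In the experiments, the same loop is run on the cross-entropy loss of the network (step size 0.1, 40 iterations, as described above), and the resulting images are reused to attack the spectral models.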
Overall, our experiments should be considered mainly as a proof of concept, indicating that spectral features may be a useful tool in one's toolkit for adversarial robustness.

We also observe that features from weighted graphs perform better than their unweighted counterparts. This is likely because the weighted graph retains more information about the distances, while most of this information is lost via thresholding in the unweighted graph.

5.2 Connection Between Spectral Properties and Robustness

We hypothesize that the spectral properties of the graph associated with a dataset have fundamental connections with its adversarial robustness. The lower bound shown in Section 3 sheds some more light on this connection. In Theorem 1, we show that adversarial robustness is proportional to √((λ₂⁺ − λ₂⁻)/(λ₃⁻ − λ₂⁻)). To study this connection empirically, we created 45 datasets, one for each pair of digits in MNIST. As we expect some pairs of digits to be less robust to adversarial perturbations than others, we compare our spectral bounds for these various datasets to their observed adversarial accuracies.

Setup: The dataset for each pair of digits has 5,000 data points, with 4,000 points used as the training set, and 1,000 points used as the test set. As in the previous subsection, we trained robust neural nets on these datasets. We considered fully connected neural nets with one hidden layer of 50 units, with ReLU non-linearity and cross-entropy loss. For each mini-batch, we generated adversarial examples using projected gradient descent with a step size of 0.2 for 20 iterations, and trained the neural net on these examples.
Finally, to test these models, we generated adversarial perturbations of size 1 in the ℓ₂ norm, and measured the adversarial accuracy on all 45 datasets.

To get a bound for each dataset X, we generated two graphs, G⁻(X) and G⁺(X), on all 5,000 points (not involving adversarial data), using the ℓ₂ norm as the distance metric. The distance threshold T for G⁻(X) is set to be the smallest value such that each node has degree at least one, and the threshold for G⁺(X) is two more than that of G⁻(X). We calculated the eigenvalues of the scaled Laplacians of these graphs to obtain our theoretical bounds.

Observations: As shown in Figure 2, we observe some correlation between our upper bounds and the empirical adversarial robustness of the datasets. Each dataset is represented by a point in Figure 2, where the x-axis is proportional to our bound, and the y-axis indicates the zero-one loss of the neural nets on adversarial examples generated from that dataset. The correlation is 0.52 after removing the right-most outlier. While this correlation is not very strong, it suggests some connection between our spectral bounds on robustness and the empirical robustness achieved by certain attack/defense heuristics.

6 Conclusion

We considered the task of learning adversarially robust features as a simplification of the more common goal of learning adversarially robust classifiers. We showed that this task has a natural connection to spectral graph theory, and that spectral properties of a graph associated with the underlying data have implications for the robustness of any feature learned on the data.
We believe that exploring this simpler task of learning robust features, and further developing the connections to spectral graph theory, are promising steps towards the end goal of building robust machine learning models.

Acknowledgments: This work was supported by NSF awards CCF-1704417 and 1813049, and an ONR Young Investigator Award (N00014-18-1-2295).

References

Anish Athalye and Ilya Sutskever. Synthesizing robust adversarial examples. arXiv preprint arXiv:1707.07397, 2017.

Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, 2018.

TB Brown, D Mané, A Roy, M Abadi, and J Gilmer. Adversarial patch. arXiv preprint arXiv:1712.09665, 2017.

Sébastien Bubeck, Eric Price, and Ilya Razenshteyn. Adversarial examples from computational constraints. arXiv preprint arXiv:1805.10204, 2018.

Fan RK Chung. Spectral graph theory. Number 92. American Mathematical Society, 1997.

Ivan Evtimov, Kevin Eykholt, Earlence Fernandes, Tadayoshi Kohno, Bo Li, Atul Prakash, Amir Rahmati, and Dawn Song. Robust physical-world attacks on machine learning models. arXiv preprint arXiv:1707.08945, 2017.

Alhussein Fawzi, Hamza Fawzi, and Omar Fawzi. Adversarial vulnerability for any classifier. arXiv preprint arXiv:1802.08686, 2018.

Justin Gilmer, Luke Metz, Fartash Faghri, Samuel S Schoenholz, Maithra Raghu, Martin Wattenberg, and Ian Goodfellow. Adversarial spheres. arXiv preprint arXiv:1801.02774, 2018.

Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

Matthias Hein and Maksym Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation.
In Advances in Neural Information Processing Systems, pages 2263–2273, 2017.

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

J Zico Kolter and Eric Wong. Provable defenses against adversarial examples via the convex outer adversarial polytope. arXiv preprint arXiv:1711.00851, 2017.

Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.

Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277, 2016.

Jonathan Peck, Joris Roels, Bart Goossens, and Yvan Saeys. Lower bounds on the robustness to adversarial perturbations. In Advances in Neural Information Processing Systems, pages 804–813, 2017.

Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. arXiv preprint arXiv:1801.09344, 2018.

Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Mądry. Adversarially robust generalization requires more data. arXiv preprint arXiv:1804.11285, 2018.

Aman Sinha, Hongseok Namkoong, and John Duchi. Certifiable distributional robustness with principled adversarial training. arXiv preprint arXiv:1710.10571, 2017.

Daniel A Spielman. Spectral graph theory and its applications. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07), pages 29–38. IEEE, 2007.

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks.
arXiv preprint arXiv:1312.6199, 2013.