{"title": "Approximate Feature Collisions in Neural Nets", "book": "Advances in Neural Information Processing Systems", "page_first": 15842, "page_last": 15850, "abstract": "Work on adversarial examples has shown that neural nets are surprisingly sensitive to adversarially chosen changes of small magnitude. In this paper, we show the opposite: neural nets could be surprisingly insensitive to adversarially chosen changes of large magnitude. We observe that this phenomenon can arise from the intrinsic properties of the ReLU activation function. As a result, two very different examples could share the same feature activation and therefore the same classification decision. We refer to this phenomenon as feature collision and the corresponding examples as colliding examples. We find that colliding examples are quite abundant: we empirically demonstrate the existence of polytopes of approximately colliding examples in the neighbourhood of practically any example.", "full_text": "Approximate Feature Collisions in Neural Nets\n\nKe Li\u21e4\n\nUC Berkeley\n\nke.li@eecs.berkeley.edu\n\nTianhao Zhang\u21e4\nNanjing University\n\nbryanzhang@smail.nju.edu.cn\n\nJitendra Malik\nUC Berkeley\n\nmalik@eecs.berkeley.edu\n\nAbstract\n\nWork on adversarial examples has shown that neural nets are surprisingly sensitive\nto adversarially chosen changes of small magnitude. In this paper, we show the op-\nposite: neural nets could be surprisingly insensitive to adversarially chosen changes\nof large magnitude. We observe that this phenomenon can arise from the intrinsic\nproperties of the ReLU activation function. As a result, two very different exam-\nples could share the same feature activation and therefore the same classi\ufb01cation\ndecision. We refer to this phenomenon as feature collision and the correspond-\ning examples as colliding examples. 
We find that colliding examples are quite abundant: we empirically demonstrate the existence of polytopes of approximately colliding examples in the neighbourhood of practically any example.\n\n1 Introduction\n\nDeep learning has achieved resounding success in recent years and is quickly becoming an integral component of many real-world systems. As a result of its success, increasing attention has focused on studying the robustness of neural nets, in particular to inputs that are deliberately chosen to yield an unexpected prediction.\nIt is well known that neural nets could be surprisingly sensitive to minute changes to the input that are deliberately designed to result in a different classification decision. Such examples are known as adversarial examples and have been extensively studied in the literature (Dalvi et al., 2004; Biggio et al., 2013; Szegedy et al., 2013). In this paper, we demonstrate the existence of a phenomenon that is in some sense the opposite: we show that neural nets could be surprisingly insensitive to large changes to the input. We provide an explanation for how and why this could arise and propose a method for systematically finding such kinds of inputs.\nTo find changes that a neural net is insensitive to, it suffices to find examples that share the same feature activation at some layer, since the same activation at some layer implies the same activation at all subsequent layers and therefore the same classification decision. We will refer to the phenomenon of multiple examples sharing the same feature activations as feature collisions2 and such examples\n\n*Equal contribution.\n2It should be noted that the term “feature collision” was also used by (Shafahi et al., 2018) to refer to a different, but related, concept. 
In their context, feature collisions refer to input examples that are close to both a target data\nexample in feature space and a different data example from a different class in input space. In our context, input\nexamples only need to be close to a target data example in feature space; they do not need to be close to any\nparticular data example in input space.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n\fas colliding examples, to borrow terminology from the hashing literature, since feature collisions\ncorrespond to cases where the neural net maps different inputs to the same activation.\nA neural net is essentially \u201cblind\u201d to the differences between colliding examples, and so one would\nhope that such examples are rare and isolated, and when they do occur, are very similar to each other\nso that differences between them can be safely ignored. We show in this paper that for common\nneural net architectures, such examples could actually be quite abundant \u2013 in fact, there could be a\npolytope containing in\ufb01nitely many colliding examples, that is, all examples inside this polytope\nhave the same feature activation. We show this arises from the geometry of recti\ufb01ed linear unit\n(ReLU) activation functions. The key observation is that we could change any negative component\nin the pre-activation of a layer before ReLU to any other negative value without changing the post-\nactivation, and consequently keep the pre- and post-activations in all later layers the same. We show\nthat in general, for any neural net with a layer with ReLU activations, there is a convex (but possibly\nunbounded) polytope in the space of post-activations of the previous layer such that all input examples\nwhose post-activation vectors fall in the interior or the boundary of the polytope will have identical\nactivations in all later layers. 
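The key observation above is easy to check numerically. Below is a minimal sketch (the toy layer weights are chosen for illustration and are not from the paper) showing that arbitrarily large changes confined to the negative pre-activation components leave the ReLU post-activations, and hence everything downstream, unchanged:

```python
import numpy as np

# Toy layer: 3 units, 2 inputs (illustrative values only).
W = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0, -1.0]])
b = np.array([0.0, -2.0, 0.0])
x = np.array([1.0, 1.0])

pre = W @ x + b                   # pre-activations: [ 1., -1., -2.]
post = np.maximum(pre, 0.0)       # ReLU post-activations: [1., 0., 0.]

# Push the negative pre-activations to very different negative values.
pre_changed = np.array([1.0, -100.0, -50.0])
post_changed = np.maximum(pre_changed, 0.0)

# Post-activations are identical, so every later layer is identical too.
assert np.array_equal(post, post_changed)
```

Any downstream layer that consumes `post` cannot distinguish the two pre-activation vectors, which is exactly the invariance the polytopes exploit.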
We devise a method of empirically finding relaxed versions of such polytopes, within which feature activations are approximately equal.\nIt turns out such polytopes are surprisingly common. We demonstrate the existence of such polytopes in the neighbourhood of practically any example that we tried. Furthermore, we demonstrate the existence of such polytopes regardless of what target feature activation the examples inside the polytope are made to collide with. We find that the radius of the polytopes is quite large. Moreover, we empirically confirm that all of the 2000 examples randomly drawn from the polytope are classified confidently into the same class.\nDue to humans' insensitivity to differences at high frequencies, large differences in terms of magnitude may not be perceptually obvious. To find polytopes where images at different locations are perceptually different, we also constrain the search space to compositions of image patches and demonstrate the existence of such polytopes even under this constrained setting.\nBecause such polytopes arise from the properties of the architecture itself (in particular the activation functions) rather than the weights, the polytopes cannot be eliminated by simply augmenting the dataset. Training the neural net on a different dataset can only make the weights of the neural net different and so cannot in general change the existence of the polytopes (though it may change the volume of the polytopes).\n\n2 Method\nConsider any layer in a neural net with ReLU activations. Let W ∈ R^{N×d} denote the weight matrix, x ∈ R^d the previous layer's post-activation, and b ∈ R^N the biases associated with the current layer. Moreover, let's define y ∈ R^N as the vector of post-activations (activations after the ReLU), and ỹ ∈ R^N as the vector of pre-activations (activations before the ReLU). 
So,\n\ny = max(ỹ, 0) = max(Wx + b, 0), where W = [w_1^T; ...; w_N^T] and b = [b_1; ...; b_N].\n\nOur goal is to find a colliding example that has the same post-activations as a target example. We can identify examples by their post-activations in the previous layer, since two examples with identical post-activations in the previous layer will always have identical pre- and post-activations in the current layer. We will denote the colliding example as x* and the target example as x^t and define y*, ỹ*, y^t, ỹ^t analogously. For any vector v, we will use the notation v_i to denote the ith component.\nSince ReLUs map all non-positive values to zeros, if the target example has a post-activation of zero in one component, as long as the pre-activation of the colliding example is non-positive in that component, then the target and the colliding example would have the same post-activation in that component. In order to make the post-activations of the two examples identical in all components, the following conditions are necessary and sufficient:\n\n∀i such that ỹ^t_i > 0: ỹ*_i = w_i^T x* + b_i = ỹ^t_i\n∀i such that ỹ^t_i ≤ 0: ỹ*_i = w_i^T x* + b_i ≤ 0\n\n2\n\n\fProposition 1. Let N^+ be the number of components in ỹ^t that are positive. Let W_+ ∈ R^{N^+ × d} and W_− ∈ R^{(N − N^+) × d} be the submatrices of W consisting of rows where the corresponding components in ỹ^t are positive and non-positive respectively. Assume N^+ < d. Let ker(·) denote the kernel of a matrix. The set of colliding examples forms a convex (but possibly unbounded) polytope in at least a (d − N^+)-dimensional (affine) subspace if ker(W_+) ⊄ ker(W_−), or a subspace of at least d − N^+ dimensions otherwise.\nProof. 
The set of colliding examples must satisfy all of the above conditions, so it is the intersection of the following sets:\n\n{x* : w_i^T x* + b_i = ỹ^t_i} for i such that ỹ^t_i > 0\n{x* : w_i^T x* + b_i ≤ 0} for i such that ỹ^t_i ≤ 0\n\nGeometrically, each set that corresponds to an equality constraint represents a (d − 1)-dimensional hyperplane. Each set that corresponds to an inequality constraint represents a half-space. The conjunction of the equality constraints is the intersection of the associated hyperplanes, which is an affine subspace of at least d − N^+ dimensions. We consider the projection of each half-space onto this subspace, which is either a half-space in the subspace or the subspace itself.\nWhen ker(W_+) ⊄ ker(W_−), the projection of at least one half-space is a half-space in the subspace (see the appendix for a derivation of this fact). The set of colliding examples is the intersection of all projections of half-spaces, which is the intersection of all projections that are half-spaces in the subspace. In general, the intersection of finitely many half-spaces is a convex (but possibly unbounded) polytope. So, in this case, the set of colliding examples is a convex polytope in the subspace.\nWhen ker(W_+) ⊆ ker(W_−), all projections of half-spaces are the subspace itself, and so the intersection of the projections is simply the subspace itself. Therefore, in this case, the set of colliding examples is the subspace.\n\nAny point in this polytope corresponds to a colliding example. Since there could be infinitely many points in a polytope, there could be infinitely many colliding examples. If the polytope is bounded, we would be able to characterize all such examples by the vertices of the polytope, in which case the polytope would simply be the convex hull of all the vertices. 
Then, any colliding example can be written as a convex combination of the vertices, and we can generate a new colliding example by taking an arbitrary convex combination of the vertices.3\nTo demonstrate the existence of this polytope, it suffices to find a subset contained in the polytope, e.g. the convex hull of a subset of the vertices. To find a subset of vertices, we can move the dividing hyperplane of a half-space towards the feasible direction, which mathematically corresponds to decreasing the RHS of the corresponding inequality constraint. This is equivalent to picking a unit in the current layer and trying to make it as negative as possible. This process is illustrated in Figure 1 in 2D. To find a different vertex, we can simply pick a different constraint to optimize.\nTo find a vertex, we solve the following optimization problem for some i such that ỹ^t_i ≤ 0. Note that instead of searching over the space of the previous layer's post-activations x, we can directly search over the space of inputs u. Let ỹ(u) and y(u) denote the current layer's pre-activation and post-activation of an arbitrary input u.\n\nmin_u  α L_+(u) + L_−^i(u),  where\n\nL_+(u) = (1/2) ||ỹ(u) ⊙ 1[ỹ^t > 0] − y^t||^2\n\nand\n\nL_−^i(u) = (ỹ(u) ⊙ 1[ỹ^t ≤ 0])_i + Σ_{j≠i} ((ỹ(u) ⊙ 1[ỹ^t ≤ 0])_j)^2\n\nIntuitively, L_+ aims to keep the coordinates in which the target example has positive pre-activations similar between the target and the colliding example. On the other hand, L_−^i places constraints on the coordinates in which the target has non-positive pre-activations.\n
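As a rough illustration of these two loss terms, the following numpy sketch evaluates them on toy vectors (the vectors and the unit index i are made up for illustration; in the paper, ỹ(u) comes from a forward pass through the network and u is optimized by gradient descent):

```python
import numpy as np

def collision_losses(pre_u, pre_t, post_t, i):
    """Evaluate the two terms of the vertex-finding objective.
    pre_u: candidate's pre-activations ỹ(u); pre_t/post_t: target's
    pre-/post-activations; i: index of the unit to push negative."""
    pos_mask = (pre_t > 0).astype(float)   # 1[ỹ^t > 0]
    neg_mask = (pre_t <= 0).astype(float)  # 1[ỹ^t <= 0]

    # L+: match the target's post-activations on its positive coordinates.
    l_plus = 0.5 * np.sum((pre_u * pos_mask - post_t) ** 2)

    # L-^i: drive unit i as negative as possible; keep the other
    # non-positive coordinates close to zero via a squared penalty.
    masked = pre_u * neg_mask
    l_minus_i = masked[i] + np.sum(np.delete(masked, i) ** 2)
    return l_plus, l_minus_i

pre_t = np.array([2.0, -1.0, -3.0])   # toy target pre-activations
post_t = np.maximum(pre_t, 0.0)
pre_u = np.array([2.0, -5.0, -0.1])   # toy candidate pre-activations
l_plus, l_minus = collision_losses(pre_u, pre_t, post_t, i=1)
# l_plus is 0 here (positive coordinate matched exactly); l_minus is
# dominated by the very negative unit i, as the objective encourages.
```

Minimizing the weighted sum α·l_plus + l_minus over the input then pushes the chosen unit toward the polytope boundary while holding the collision constraints approximately.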
The first term of L_−^i tries to make the pre-activation of the selected hidden unit i as negative as possible, and the second term keeps the other hidden units close to zero. It does not matter how negative the selected unit becomes, since after applying the ReLU activation function, it becomes zero, which by definition is equal to the post-activation of the target example.\n\n3If the polytope is unbounded, we can still generate new colliding examples this way, but there could be colliding examples that are not convex combinations of the vertices.\n\n3\n\n\fFigure 1: On the left, we illustrate the polytope that arises when there are three half-spaces in a two-dimensional subspace. On the right, we illustrate how we can find a vertex of a polytope.\n\n3 Experiment\n\nIn this section we will apply our method to two standard neural net architectures trained on the MNIST and ImageNet datasets.\n\n3.1 MNIST Dataset\nFirst we train a simple fully-connected neural network with two hidden layers4, each with 256 units and ReLU activations, on the MNIST dataset. The trained model achieves a test accuracy of 96.64%.\nNext, we apply the proposed method to find feature collisions at the first hidden layer. We performed two experiments and found two polytopes that collide with the same target example, one by initializing from the target example, and one by initializing from a different example in the dataset. For each experiment, we found five vertices of the polytope, by optimizing the proposed objective for i = 0, 1, . . . , 4. The results are shown in Table 1.\nAny example within the convex hull of the five vertices in each row has similar feature activations at the first hidden layer and therefore all subsequent hidden layers. 
To confirm this empirically, we randomly generated 2000 examples from each discovered polytope (by taking random convex combinations of the five polytope vertices) and checked the predicted class label and confidence of the prediction. As shown in Table 2, 100% of the 2000 examples are classified as the same class as the target example with extremely high confidence (1.0).\nInterestingly, for the polytope in the first row, even though all polytope vertices are clearly 7s, the neural net should not classify examples within the polytope with the same confidence as it does on the target image, since the images in the polytope are clearly harder to discern. In this case, while the classification decision is correct, the confidence is not. For the polytope in the second row, the polytope vertices are clearly not 7s, yet the neural net still classified them as 7s with very high confidence. In this case, both the classification decision and the confidence are incorrect. Furthermore, all intermediate activations of the polytope and the target example are similar, which means that the neural net sees almost no difference at all between the target example and examples within the polytope.\nWe now turn our attention to the size of the polytopes. We report the average distance in Table 2, and compare it to the average distance between pairs of arbitrary images from the dataset. 
It turns out the average distance between polytope vertices is 12-13% of the average distance between images in the dataset, indicating that the polytope is quite large, even though it may not appear so visually, because humans are not adept at detecting differences at high frequencies.\n\n4https://github.com/aymericdamien/TensorFlow-Examples\n\n4\n\n\fTarget\n\nPolytope vertices\n\nTable 1: Two polytopes that collide with the same target example (an image of the digit 7). To the neural net, no example within either polytope is distinguishable from the target example.\n\nWe also applied the proposed method to the LeNet-5 (LeCun et al.) convolutional network; the details and results are included in the appendix.\n\nRow# | % of successful collisions | Top class probability (Target) | Top class probability (Interpolated samples) | Avg. distance (Optimized samples) | Avg. distance (MNIST)\n1 | 100% | 1.0 | 1.0 | 1.33 | 10.21\n2 | 100% | 1.0 | 1.0 | 1.28 | 10.21\n\nTable 2: Quantitative results of MNIST experiments. The first column corresponds to the row number of the experiments in Table 1. The second column shows the probability of randomly sampled images in the polytope being classified in the same class as the target image. The third and fourth columns show the top class confidence for the target and for the samples. The last two columns compare the average l2 distance between the 5 optimized polytope vertices with the average distance between images from the MNIST dataset.\n\n3.2 ImageNet Dataset\n\nWe now perform the same experiment on ImageNet. We use a pre-trained VGG-16 net (Simonyan & Zisserman, 2014) that achieves a 92.7% top-5 test accuracy. In our experiments, we tried to find feature collisions at the fc6 layer.\nWe visualize the results in Table 3 and present the quantitative results in Table 4. 
For reasons of space, additional results are found in the appendix. As shown in Table 4, the neural net classifies all 2000 randomly chosen examples from each polytope into the category of the target example, again with very high confidence (0.963-0.996). In the case of ImageNet, the polytopes are even larger relative to the average distance between images from the dataset: the average distance between polytope vertices is now 12-24% of the average pairwise distance between images, though the differences between the polytope vertices are not visually apparent for the same reason as above.\n\nTarget\n\nPolytope vertices\n\nTable 3: Colliding polytope on ImageNet that is found when initializing from another image. All examples within the polytope collide with the target image.\n\n5\n\n\fRow# | % of successful collisions | Top class probability (Target) | Top class probability (Interpolated samples) | Avg. distance (Polytope vertices) | Avg. distance (ImageNet)\n1 | 100% | 0.993 | 0.995 | 4906.56 | 37538.22\n\nTable 4: Quantitative results corresponding to the results shown in Table 3.\n\n4 More Perceptually Different Polytope Vertices\n\nDo the colliding examples found in the previous section matter? One could argue that there is nothing to worry about, because if the different colliding examples are not perceptually different to a human, perhaps it is fine if a neural net cannot tell the difference between the different colliding examples.5\nIn this section, we demonstrate that colliding examples can still be found even when we bias the search towards perceptually different colliding examples. 
To this end, we constrain the search space to compositions of existing image patches, which are more perceptually different because changes in the way different patches are composed usually include changes in coarser details, which are more visually apparent.\nMore concretely, we parameterize the space of images in terms of compositions of existing image patches rather than raw pixel values, where the parameters specify how to compose different image patches. Because the space of image patches is much larger than the space of pixels, we have to constrain our set of patches to only those that could potentially be useful for solving our optimization problem.\nWe construct such a set by picking an image to start from, and then extracting equal-sized patches from that image at different spatial locations. For each patch, we then retrieve the k most similar patches from the dataset consisting of all patches extracted from images in the dataset. We then choose the solution space to be the space of possible convex combinations of these patches and optimize over the coefficients for combining the patches. More details are included in the appendix.\n\n4.1 Results\n\nWe perform the experiment described above on the ImageNet dataset. We construct a dataset of patches by randomly selecting 10% of the images from each class and extracting 32×32 patches using a sliding window with constant pixel strides along the horizontal and vertical directions. This yielded a dataset containing 37,011,074 patches. To retrieve the k-nearest neighbours from this dataset, we use Prioritized DCI (Li & Malik, 2017), which can find the nearest neighbours in this dataset in five minutes.\nWe parameterize the coefficients for combining patches using a softmax over different possible patches, so that the coefficients at every pixel location lie within the range of (0, 1) and sum up to 1.\nTable 5 shows the results of our experiments. 
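The per-pixel softmax parameterization described above can be sketched as follows (shapes and values are illustrative assumptions; in the actual experiments the logits would be optimized by backpropagating the collision objective through the network):

```python
import numpy as np

def compose(logits, patches):
    """Blend k candidate patches into one patch via per-pixel softmax
    weights, so the weights lie in (0, 1) and sum to 1 at every pixel.
    logits: (k, H, W) unconstrained parameters; patches: (k, H, W)."""
    w = np.exp(logits - logits.max(axis=0, keepdims=True))
    w = w / w.sum(axis=0, keepdims=True)   # softmax over the k patches
    return (w * patches).sum(axis=0)       # per-pixel convex combination

k, H, W = 4, 32, 32                        # 32x32 patches, as in the paper
rng = np.random.default_rng(0)
patches = rng.uniform(0.0, 1.0, size=(k, H, W))
logits = np.zeros((k, H, W))               # zero logits give a uniform blend
out = compose(logits, patches)
```

Because the output is a convex combination at every pixel, any setting of the logits stays inside the span of the retrieved patches, which is what keeps the search confined to patch compositions.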
For reasons of space, additional results are found in the appendix. As shown, the differences between the polytope vertices are more obvious (for a guide on where to direct attention, take a look at the rightmost column, which shows the locations where the differences are most apparent). Yet, as shown in Table 6, 100% of the examples inside this polytope are classified the same as the target image with high confidence. This demonstrates that it is possible to find arbitrarily many perceptually different examples (obtained by sampling from different locations in the polytope) that all look nearly identical to a neural net.\nTable 6 also shows the average pairwise distance between the polytope vertices. It is interesting to note that this distance is similar in magnitude to the distance between the polytope vertices found previously in Table 4. This suggests that colliding polytopes whose points are perceptually different are not necessarily larger in volume than colliding polytopes where the differences are not as obvious. In other words, the colliding polytopes found using the vanilla method are not necessarily insignificant; they just seem less significant to the human eye.\n\n5Note that this argument cannot be made for the difference between the target example and a colliding example; the target example is clearly different from a colliding example to a human, but triggers nearly identical activations as the colliding example. This is clearly undesirable.\n\n6\n\n\fTarget\n\nPolytope vertices\n\nIndicator\n\nTable 5: ImageNet results, where the search space is constrained to compositions of image patches, thereby leading to more perceptually different polytope vertices. First row shows vertices optimized from an image of a church and made to collide with that same image. 
Second row shows vertices optimized from the same image of a church but made to collide with an image of a cleaver. Images in the rightmost column highlight the differences between the vertices of the polytope using purple boxes.\n\nRow# | % of successful collisions | Top class probability (Target) | Top class probability (Interpolated samples) | Avg. distance (Polytope vertices) | Avg. distance (ImageNet)\n1 | 100% | 0.952 | 0.877 | 8597.34 | 37538.22\n2 | 100% | 0.999 | 0.800 | 9587.92 | 37538.22\n\nTable 6: Quantitative results corresponding to the results shown in Table 5.\n\n5 Discussion\n\nWhile this paper focuses on the ReLU activation function, the observations are broadly applicable.\nFirst, the proposed method can be used to find colliding examples as long as there is some intermediate layer with ReLU activations, since once the features at that layer collide, features at all later layers must collide as well.\nSecond, the observations hold approximately for any activation function that saturates, e.g. sigmoid, tanh or ELU, since in the saturating region, the pre-activation can be changed substantially without significantly changing the post-activation.\nThe phenomenon of feature collisions can be used in various applications. Some possible applications that we can foresee are below:\n\n1. Representative Data Collection: The size of the colliding polytope around a training example can be used to discover regions of the data space where insufficient training examples have been collected. More concretely, if the size of a colliding polytope around a training example is large, then the neural net could be over-generalizing in the neighbourhood of that example, and so the model may not be accurate in this neighbourhood. This can be used to inform the end-user whether the prediction in this neighbourhood should be trusted, or to guide data collection, so that more examples in this neighbourhood are collected in the future.\n\n2. 
Design of Regularizers: The insight our method reveals can lead to the design of regularizers that mitigate undesirable over-generalization. For example, one could try to minimize the size of the colliding polytopes by discouraging the hyperplanes associated with each hidden unit from being near-collinear (i.e. having highly positive cosine similarity) with other hyperplanes. This can also be used to guide architecture selection.\n\n3. Identification of Vulnerable Training Examples: The proposed method can identify the training examples that a neural net depends most on, which could have large colliding polytopes around them. This can help detect outliers and training examples that could have been mislabelled or adversarially tampered with, or legitimate training examples that could be vulnerable to manipulation due to how much the neural net depends on them.\n\n7\n\n\f6 Related Work\n\nThere have been several lines of work that use iterative optimization to find noteworthy input examples. One line of work is on adversarial examples, where the goal is to find a small perturbation to a source input example so that it is misclassified. Tatu et al. (2011) proposed using projected gradient descent to find a perturbed version of an example with similar SIFT features as an example from a different class. Szegedy et al. (2013) demonstrated a similar phenomenon in neural nets, where an adversarially perturbed example can be made to be classified by neural nets as an example from any arbitrary class. Nguyen et al. (2015) further showed that it is easy to generate images that do not resemble any class, but are classified as a recognizable object with high confidence. Kurakin et al. (2016) demonstrated that even when adversarial examples are printed on a sheet of paper, they are still effective at fooling neural nets. 
Athalye & Sutskever (2017) even managed to 3D print adversarial example models.\nMore recently, Shafahi et al. (2018) showed that it is possible to adversarially perturb examples so\nthat their features are close to the features of a completely different example.\nSimilar techniques have also been used for a different purpose, namely to understand what input\nimage would cause the activation of a particular neuron in neural nets to be high, which is known\nas activation maximization (Erhan et al., 2009). This technique can be applied to either a neuron\nin the output layer (Simonyan et al., 2013) or a hidden layer (Erhan et al., 2009; Yosinski et al.,\n2015). A related line of work, known as code inversion (Mahendran & Vedaldi, 2015; Dosovitskiy\n& Brox, 2016), aims to \ufb01nd an input image whose entire activation vector after a particular layer is\nsimilar to the activation vector of a particular real image. Unlike the adversarial example literature,\nthe goal of this body of work is to \ufb01nd an interpretable/visually recognizable image that allows for\nthe visualization of the kinds of images that would either cause a particular neuron to activate or all\nneurons in the same layer to exhibit a particular pattern. Therefore, a regularizer is usually included\nin the objective function that favours images that are more natural, e.g.: those that are smooth and do\nnot have high frequencies. Surprisingly, the images that are found often bear resemblance to instances\nin the target class. 
It has been conjectured (Mahendran & Vedaldi, 2015; Nguyen et al., 2016) that\nthe inclusion of this regularizer would explain the apparent discrepancy between the \ufb01ndings of the\nadversarial example literature and the code inversion literature \u2013 code inversion is simply not \ufb01nding\nadversarial examples because having high-frequency perturbations is penalized by the regularizer.\nOur experiments show that this may in fact not be true, since we are able to successfully \ufb01nd examples\nthat do not have high frequencies but clearly do not resemble any instance of the target class. We\nconjecture this may be because points near the centre of the polytope may not be reachable using a\nna\u00efve loss function that only penalizes the difference between post-activations; this is because the\ngradient becomes zero as soon as the input example is moved into the polytope. As a result, the\nsolution found using gradient descent will usually be near the boundary of the polytope, which may\nhappen to resemble objects in the target class.\nConcurrent work (Jacobsen et al., 2018) also explores a similar theme. One difference with this work\nis that they only consider collisions at the level of the classi\ufb01cation predictions, rather than collisions\nat the level of lower-level features.\n\n7 Conclusion\n\nIn this paper, we have shown theoretically that polytopes of examples sharing the same feature\nembeddings could exist due to the properties of ReLU activation functions. We developed a method\nfor \ufb01nding such polytopes and demonstrated empirically that they do in fact exist in commonly used\nneural nets. Somewhat surprisingly, in Section 4.1, we demonstrated that even after constraining\nexamples to be compositions of image patches, these polytopes still exist. 
Furthermore, the vertices\nof the polytope appear perceptually different, which shows that interpolations in this polytope can all\nbe misclassi\ufb01ed even though each interpolation is visually distinctive.\n\nAcknowledgements\n\nThis work was supported by ONR MURI N00014-14-1-0671. Ke Li thanks the Natural Sciences and\nEngineering Research Council of Canada (NSERC) for fellowship support.\n\n8\n\n\fReferences\nAthalye, A. and Sutskever, I. Synthesizing robust adversarial examples. arXiv preprint arXiv:1707.07397, 2017.\n\nBiggio, B., Corona, I., Maiorca, D., Nelson, B., \u0160rndi\u00b4c, N., Laskov, P., Giacinto, G., and Roli, F. Evasion attacks\nagainst machine learning at test time. In Joint European conference on machine learning and knowledge\ndiscovery in databases, pp. 387\u2013402. Springer, 2013.\n\nDalvi, N., Domingos, P., Sanghai, S., Verma, D., et al. Adversarial classi\ufb01cation. In Proceedings of the tenth\nACM SIGKDD international conference on Knowledge discovery and data mining, pp. 99\u2013108. ACM, 2004.\n\nDosovitskiy, A. and Brox, T. Inverting visual representations with convolutional networks. In Proceedings of the\n\nIEEE Conference on Computer Vision and Pattern Recognition, pp. 4829\u20134837, 2016.\n\nErhan, D., Bengio, Y., Courville, A., and Vincent, P. Visualizing higher-layer features of a deep network.\n\nUniversity of Montreal, 1341(3):1, 2009.\n\nJacobsen, J.-H., Behrmann, J., Zemel, R., and Bethge, M. Excessive invariance causes adversarial vulnerability.\n\narXiv preprint arXiv:1811.00401, 2018.\n\nKurakin, A., Goodfellow, I., and Bengio, S. Adversarial examples in the physical world. arXiv preprint\n\narXiv:1607.02533, 2016.\n\nLeCun, Y. et al. Lenet-5, convolutional neural networks.\n\nLi, K. and Malik, J. Fast k-nearest neighbour search via prioritized dci. arXiv preprint arXiv:1703.00440, 2017.\n\nMahendran, A. and Vedaldi, A. Understanding deep image representations by inverting them. 
In Proceedings of\n\nthe IEEE conference on computer vision and pattern recognition, pp. 5188\u20135196, 2015.\n\nNguyen, A., Yosinski, J., and Clune, J. Deep neural networks are easily fooled: High con\ufb01dence predictions for\nunrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,\npp. 427\u2013436, 2015.\n\nNguyen, A., Yosinski, J., and Clune, J. Multifaceted feature visualization: Uncovering the different types of\n\nfeatures learned by each neuron in deep neural networks. arXiv preprint arXiv:1602.03616, 2016.\n\nShafahi, A., Huang, W. R., Najibi, M., Suciu, O., Studer, C., Dumitras, T., and Goldstein, T. Poison frogs!\n\ntargeted clean-label poisoning attacks on neural networks. arXiv preprint arXiv:1804.00792, 2018.\n\nSimonyan, K. and Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv\n\npreprint arXiv:1409.1556, 2014.\n\nSimonyan, K., Vedaldi, A., and Zisserman, A. Deep inside convolutional networks: Visualising image classi\ufb01ca-\n\ntion models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.\n\nSzegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. Intriguing properties\n\nof neural networks. arXiv preprint arXiv:1312.6199, 2013.\n\nTatu, A., Lauze, F., Nielsen, M., and Kimia, B. Exploring the representation capabilities of the hog descriptor.\nIn Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, pp. 1410\u20131417.\nIEEE, 2011.\n\nYosinski, J., Clune, J., Nguyen, A., Fuchs, T., and Lipson, H. Understanding neural networks through deep\n\nvisualization. 
arXiv preprint arXiv:1506.06579, 2015.\n\n9\n\n\f", "award": [], "sourceid": 9264, "authors": [{"given_name": "Ke", "family_name": "Li", "institution": "UC Berkeley"}, {"given_name": "Tianhao", "family_name": "Zhang", "institution": "Nanjing University"}, {"given_name": "Jitendra", "family_name": "Malik", "institution": "University of California at Berkley"}]}