{"title": "Full-Gradient Representation for Neural Network Visualization", "book": "Advances in Neural Information Processing Systems", "page_first": 4124, "page_last": 4133, "abstract": "We introduce a new tool for interpreting neural nets, namely full-gradients, which decomposes the neural net response into input sensitivity and per-neuron sensitivity components. This is the first proposed representation which satisfies two key properties: completeness and weak dependence, which provably cannot be satisfied by any saliency map-based interpretability method. Using full-gradients, we also propose an approximate saliency map representation for convolutional nets dubbed FullGrad, obtained by aggregating the full-gradient components.\n\nWe experimentally evaluate the usefulness of FullGrad in explaining model behaviour with two quantitative tests: pixel perturbation and remove-and-retrain. Our experiments reveal that our method explains model behavior correctly, and more comprehensively than other methods in the literature. Visual inspection also reveals that our saliency maps are sharper and more tightly confined to object regions than other methods.", "full_text": "Full-Gradient Representation for\n\nNeural Network Visualization\n\nSuraj Srinivas\n\nIdiap Research Institute & EPFL\nsuraj.srinivas@idiap.ch\n\nFran\u00e7ois Fleuret\n\nIdiap Research Institute & EPFL\nfrancois.fleuret@idiap.ch\n\nAbstract\n\nWe introduce a new tool for interpreting neural net responses, namely full-gradients,\nwhich decomposes the neural net response into input sensitivity and per-neuron\nsensitivity components. This is the \ufb01rst proposed representation which satis\ufb01es\ntwo key properties: completeness and weak dependence, which provably cannot\nbe satis\ufb01ed by any saliency map-based interpretability method. 
For convolutional\nnets, we also propose an approximate saliency map representation, called FullGrad,\nobtained by aggregating the full-gradient components.\nWe experimentally evaluate the usefulness of FullGrad in explaining model be-\nhaviour with two quantitative tests: pixel perturbation and remove-and-retrain.\nOur experiments reveal that our method explains model behavior correctly, and\nmore comprehensively, than other methods in the literature. Visual inspection\nalso reveals that our saliency maps are sharper and more tightly con\ufb01ned to object\nregions than other methods.\n\n1\n\nIntroduction\n\nThis paper studies saliency map representations for the interpretation of neural network functions.\nSaliency maps assign to each input feature an importance score, which is a measure of the usefulness\nof that feature for the task performed by the neural network. However, the presence of internal\nstructure among features sometimes makes it dif\ufb01cult to assign a single importance score per feature.\nFor example, input spaces such as that of natural images are compositional in nature. This means that\nwhile any single individual pixel in an image may be unimportant on its own, a collection of pixels\nmay be critical if they form an important image region such as an object part.\nFor example, a bicycle in an image can still be identi\ufb01ed if any single pixel is missing, but if the\nentire collection of pixels corresponding to a key element, such as a wheel or the drive chain, are\nmissing, then it becomes much more dif\ufb01cult. Here the importance of a part cannot be deduced from\nthe individual importance of its constituent pixels, as each such individual pixel is unimportant on\nits own. An ideal interpretability method would not just provide importance for each pixel, but also\ncapture that of groups of pixels which have an underlying structure.\nThis tension also reveals itself in the formal study of saliency maps. 
While there is no single formal definition of saliency, there are several intuitive characteristics that the community has deemed important [1, 2, 3, 4, 5, 6]. One such characteristic is that an input feature must be considered important if changes to that feature greatly affect the neural network output [5, 7]. Another desirable characteristic is that the saliency map must completely explain the neural network output, i.e., the individual feature importance scores must add up to the neural network output [1, 2, 3]. This is done by redistributing the numerical output score to individual input features. In this view, a feature is important if it makes a large numerical contribution to the output. Thus we have two distinct notions of feature importance, both of which are intuitive. The first notion of importance assignment is called local attribution, and the second, global attribution. It is almost always the case for practical neural networks that these two notions yield methods that consider entirely different sets of features to be important, which is counter-intuitive.\nIn this paper we propose full-gradients, a representation which assigns importance scores to both the input features and individual feature detectors (or neurons) in a neural network. Input attribution helps capture the importance of individual input pixels, while neuron importances capture the importance of groups of pixels, accounting for their structure. In addition, full-gradients achieve this by simultaneously satisfying both notions of local and global importance. To the best of our knowledge, no previous method in the literature has this property.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\nThe overall contributions of our paper are:\n\n1. 
We show in \u00a7 3 that weak dependence (see De\ufb01nition 1), a notion of local importance,\nand completeness (see De\ufb01nition 2), a notion of global importance, cannot be satis\ufb01ed\nsimultaneously by any saliency method. This suggests that the counter-intuitive behavior of\nsaliency methods reported in literature [3, 5] is unavoidable.\n\n2. We introduce in \u00a7 4 the full-gradients which are more expressive than saliency maps, and\nsatisfy both importance notions simultaneously. We also use this to de\ufb01ne approximate\nsaliency maps for convolutional nets, dubbed FullGrad, by leveraging strong geometric\npriors induced by convolutions.\n\n3. We perform in \u00a7 5 quantitative tests on full-gradient saliency maps including pixel perturba-\ntion and remove-and-retrain [8], which show that FullGrad outperforms existing competitive\nmethods.\n\n2 Related Work\n\nWithin the vast literature on interpretability of neural networks, we shall restrict discussion solely to\nsaliency maps or input attribution methods. First attempts at obtaining saliency maps for modern\ndeep networks involved using input-gradients [7] and deconvolution [9]. Guided backprop [10] is\nanother variant obtained by changing the backprop rule for input-gradients to produce cleaner saliency\nmaps. Recent works have also adopted axiomatic approaches to attribution by proposing methods\nthat explicitly satisfy certain intuitive properties. Deep Taylor decomposition [2], DeepLIFT [3],\nIntegrated gradients [1] and DeepSHAP [4] adopt this broad approach. Central to all these approaches\nis the requirement of completeness which requires that the saliency map account for the function\noutput in an exact numerical sense. In particular, Lundberg et al.[4] and Ancona et al.[11] propose\nunifying frameworks for several of these saliency methods.\nHowever, some recent work also shows the fragility of some of these methods. 
These include unintuitive properties such as being insensitive to model randomization [6], partly recovering the input [12] or being insensitive to the model's invariances [5]. One possible reason for the presence of such fragilities is that attribution methods are often evaluated solely by visual inspection. As a result, the need for quantitative evaluation methods is urgent. Popular quantitative evaluation methods in the literature are based on image perturbation [13, 11, 2]. These tests broadly involve removing the most salient pixels in an image, and checking whether they affect the neural network output. However, removing pixels can cause artifacts to appear in images. To compensate for this, RemOve And Retrain (ROAR) [8] proposes a retraining-based procedure. However, this method too has drawbacks, as retraining can cause the model to focus on parts of the input it had previously ignored, thus not explaining the original model. Hence we do not yet have completely rigorous methods for saliency map evaluation.\nSimilar to our paper, some works [3, 14] also make the observation that including biases within attributions can enable gradient-based attributions to satisfy the completeness property. However, they do not propose attribution methods based on this observation as we do in this paper.\n\n3 Local vs. Global Attribution\n\nIn this section, we show that there cannot exist saliency maps that satisfy both notions of local and global attribution. We do this by drawing attention to the simple fact that a D-dimensional saliency map cannot summarize even linear models in R^D, as such linear models have D + 1 parameters. We prove our results by defining a weak notion of local attribution which we call weak dependence, and a weak notion of global attribution, called completeness.\nLet us consider a neural network function f : R^D \u2192 R with inputs x \u2208 R^D. 
A saliency map S(x) = \u03c3(f, x) \u2208 R^D is a function of the neural network f and an input x. For linear models of the form f(x) = w^T x + b, it is common to visualize the weights w. For this case, we observe that the saliency map S(x) = w is independent of x. Similarly, piecewise-linear models can be thought of as collections of linear models, with each linear model being defined on a different local neighborhood. For such cases, we can define weak dependence as follows.\nDefinition 1. (Weak dependence on inputs) Consider a piecewise-linear model\n\nf(x) = w_0^T x + b_0 for x \u2208 U_0, ..., w_n^T x + b_n for x \u2208 U_n\n\nwhere all U_i are open connected sets. For this function, the saliency map S(x) = \u03c3(f, x) restricted to a set U_i is independent of x, and depends only on the parameters w_i, b_i.\nHence in this case S(x) depends weakly on x by being dependent only on the neighborhood U_i in which x resides. This generalizes the notion of local importance to piecewise-linear functions. A stronger form of this property, called input invariance, was deemed desirable in previous work [5], which required saliency methods to mirror model sensitivity. Methods which satisfy our weak dependence include input-gradients [7], guided-backprop [10] and deconv [9]. Note that our definition of weak dependence also allows two disconnected sets having the same linear parameters (w_i, b_i) to have different saliency maps, and hence in that sense is more general than input invariance [5], which does not allow for this. We now define completeness for a saliency map by generalizing equivalent notions presented in prior work [1, 2, 3].\nDefinition 2. 
(Completeness) A saliency map S(x) is\n\n\u2022 complete if there exists a function \u03c6 such that \u03c6(S(x), x) = f(x) for all f, x.\n\u2022 complete with a baseline x_0 if there exists a function \u03c6_c such that \u03c6_c(S(x), S_0(x_0), x, x_0) = f(x) \u2212 f(x_0) for all f, x, x_0, where S_0(x_0) is the saliency map of x_0.\n\nThe intuition here is that if we expect S(x) to completely encode the computation performed by f, then it must be possible to recover f(x) by using the saliency map S(x) and the input x. Note that the second definition is more general, and in principle subsumes the first. We are now ready to state our impossibility result.\nProposition 1. For any piecewise-linear function f, it is impossible to obtain a saliency map S that satisfies both completeness and weak dependence on inputs, in general.\n\nThe proof is provided in the supplementary material. A natural consequence of this is that methods such as integrated gradients [1], deep Taylor decomposition [2] and DeepLIFT [3], which satisfy completeness, do not satisfy weak dependence. For the case of integrated gradients, we provide a simple illustration showing how this can lead to unintuitive attributions. Given a baseline x\u2032, integrated gradients (IG) is given by IG_i(x) = (x_i \u2212 x\u2032_i) \u00d7 \u222b_0^1 [\u2202f(x\u2032 + \u03b1(x \u2212 x\u2032)) / \u2202x_i] d\u03b1, where x_i is the i-th input co-ordinate.\nExample 1. (Integrated gradients [1] can be counter-intuitive) Consider the piecewise-linear function for inputs (x_1, x_2) \u2208 R^2:\n\nf(x_1, x_2) = x_1 + 3x_2 if x_1, x_2 \u2264 1; 3x_1 + x_2 if x_1, x_2 > 1; 0 otherwise.\n\nAssume baseline x\u2032 = (0, 0). Consider three points (2, 2), (4, 4), (1.5, 1.5), all of which satisfy x_1, x_2 > 1 and are thus subject to the same linear function f(x_1, x_2) = 3x_1 + x_2. 
However, depending on which point we consider, IG yields different relative importances among the input features. E.g., IG(x_1 = 4, x_2 = 4) = (10, 6), where it seems that x_1 is more important (as 10 > 6), while for IG(1.5, 1.5) = (2.5, 3.5), it seems that x_2 is more important. Further, at IG(2, 2) = (4, 4) both co-ordinates are assigned equal importance. However, in all three cases the output is clearly more sensitive to changes in x_1 than to changes in x_2, as all three points lie on f(x_1, x_2) = 3x_1 + x_2, and thus the attributions at (2, 2) and (1.5, 1.5) are counter-intuitive.\n\nThus it is clear that the two intuitive properties of weak dependence and completeness cannot be satisfied simultaneously. Both are intuitive notions for saliency maps, and thus satisfying just one makes the saliency map counter-intuitive by not satisfying the other. Similar counter-intuitive phenomena observed in the literature may be unavoidable. For example, Shrikumar et al. [3] show counter-intuitive behavior of local attribution methods by invoking a property similar to global attribution, called saturation sensitivity. On the other hand, Kindermans et al. [5] show fragility of global attribution methods by appealing to a property similar to local attribution, called input insensitivity.\nThis paradox occurs primarily because saliency maps are too restrictive, as both the weights and the biases of a linear model cannot be summarized by a saliency map. While the exclusion of the bias term in linear models to visualize only the weights seems harmless, the effect of such exclusion compounds rapidly for neural networks, which have bias terms for each neuron. Neural network biases cannot be collapsed to a constant scalar term as in linear models, and hence cannot be excluded. 
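The attributions in Example 1 can be reproduced numerically. The sketch below (plain NumPy; the midpoint Riemann sum and step count are implementation choices made here for illustration, not details from the paper) approximates the IG path integral for the piecewise-linear function above:

```python
import numpy as np

def f_grad(x1, x2):
    """Gradient of the piecewise-linear function from Example 1."""
    if x1 <= 1 and x2 <= 1:
        return np.array([1.0, 3.0])   # f = x1 + 3*x2
    if x1 > 1 and x2 > 1:
        return np.array([3.0, 1.0])   # f = 3*x1 + x2
    return np.array([0.0, 0.0])       # f = 0 elsewhere

def integrated_gradients(x, baseline=np.zeros(2), steps=20000):
    """IG_i(x) = (x_i - x'_i) * integral_0^1 of df(x' + a(x - x'))/dx_i da,
    approximated with a midpoint Riemann sum along the straight-line path."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([f_grad(*(baseline + a * (x - baseline))) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

print(integrated_gradients(np.array([4.0, 4.0])))    # ~ (10, 6): x1 looks more important
print(integrated_gradients(np.array([1.5, 1.5])))    # ~ (2.5, 3.5): x2 looks more important
print(integrated_gradients(np.array([2.0, 2.0])))    # ~ (4, 4): equal importance
```

All three points sit in the region where the function is 3x_1 + x_2, yet the relative attributions disagree, matching the counter-intuitive behavior described in the example.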
In the next section we shall look at full-gradients, a tool more expressive than saliency maps which accounts for bias terms and satisfies both weak dependence and completeness.\n\n4 Full-Gradient Representation\n\nIn this section, we introduce the full-gradient representation, which provides attribution to both inputs and neurons. We proceed by observing the following result for ReLU networks.\nProposition 2. Let f be a ReLU neural network without bias parameters; then f(x) = \u2207_x f(x)^T x.\nThe proof uses the fact that for such nets, f(kx) = kf(x) for any k > 0. This can be extended to ReLU neural networks with bias parameters by incorporating additional inputs for biases, which is a standard trick used for the analysis of linear models. For a ReLU network f(\u00b7; b) with bias, let the number of such biases in f be F.\nProposition 3. Let f be a ReLU neural network with biases b \u2208 R^F; then\n\nf(x; b) = \u2207_x f(x; b)^T x + \u2207_b f(x; b)^T b    (1)\n\nThe proofs of these statements are provided in the supplementary material. Here biases include both explicit bias parameters as well as implicit biases, such as the running averages of batch norm layers. For practical networks, we have observed that these implicit biases are often much larger in magnitude than the explicit bias parameters, and hence might be more important.\nWe can extend this decomposition to non-ReLU networks by considering implicit biases arising from the usage of generic non-linearities. For this, we linearize a non-linearity y = \u03c3(x) in a neighborhood around x to obtain y = (d\u03c3(x)/dx) x + b_\u03c3. Here b_\u03c3 is the implicit bias that is unaccounted for by the derivative. Note that for ReLU-like non-linearities, b_\u03c3 = 0. As a result, we can trivially extend the representation to arbitrary non-linearities by appending b_\u03c3 to the vector b of biases. 
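As a sanity check, Proposition 3 can be verified numerically on a toy network. The following sketch (plain NumPy; the two-layer architecture and dimensions are illustrative choices, not from the paper) computes the input- and bias-gradients by hand and confirms that together they reconstruct the output exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny ReLU network f(x) = w2 . relu(W1 x + b1) + b2 (illustrative example).
D, H = 5, 8
W1, b1 = rng.normal(size=(H, D)), rng.normal(size=H)
w2, b2 = rng.normal(size=H), rng.normal()

x = rng.normal(size=D)
z = W1 @ x + b1
h = np.maximum(z, 0.0)
f = w2 @ h + b2

# Backpropagate by hand: the mask selects active ReLU units.
mask = (z > 0).astype(float)
grad_x  = W1.T @ (w2 * mask)   # input-gradient, grad of f w.r.t. x
grad_b1 = w2 * mask            # grad of f w.r.t. b1
grad_b2 = 1.0                  # grad of f w.r.t. b2

# Full-gradient decomposition (equation 1): exact, not approximate.
full = grad_x @ x + grad_b1 @ b1 + grad_b2 * b2
assert np.isclose(f, full), (f, full)
print("f(x) =", f, "reconstructed:", full)
```

The identity holds exactly because the mask makes mask * z equal to relu(z), so the input- and bias-gradient terms together recover w2 . relu(z) + b2.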
In general, any quantity that is unaccounted for by the input-gradient is an implicit bias, and thus by definition, together they must add up to the function output, as in equation 1.\nEquation 1 is an alternate representation of the neural network output in terms of various gradient terms. We shall call \u2207_x f(x, b) the input-gradients, and \u2207_b f(x, b) \u2299 b the bias-gradients. Together, they constitute the full-gradients. To the best of our knowledge, this is the only other exact representation of neural network outputs, besides the usual feed-forward representation in terms of weights and biases.\nFor the rest of the paper, we shall use the shorthand notation f^b(x) for \u2207_b f(x, b) \u2299 b, the bias-gradient, and drop the explicit dependence on b in f(x, b).\n\n4.1 Properties of Full-Gradients\n\nHere we discuss some intuitive properties of full-gradients. We shall assume that the full-gradients comprise the pair G = (\u2207_x f(x), f^b(x)) \u2208 R^{D+F}. We shall also assume, with no loss of generality, that the network contains ReLU non-linearities without batch-norm, and that all biases are due to bias parameters.\n\nFigure 1: Visualization of bias-gradients at different layers of a VGG-16 pre-trained neural network (panels: input; layer 3, layer 5 and layer 7 bias-gradients; input-gradient \u00d7 input; FullGrad aggregate). While none of the intermediate layer bias-gradients themselves demarcate the object satisfactorily, the full-gradient map achieves this by aggregating information from the input-gradient and all intermediate bias-gradients (see Equation 2).\n\nWeak dependence on inputs: For a piecewise-linear function f, it is clear that the input-gradient is locally constant in a linear region. 
It turns out that a similar property holds for f^b(x) as well, and a short proof of this can be found in the supplementary material.\nCompleteness: From equation 1, we see that the full-gradients exactly recover the function output f(x), satisfying completeness.\nSaturation sensitivity: Broadly, saturation refers to the phenomenon of zero input attribution in regions of zero function gradient. This notion is closely related to global attribution, as it requires saliency methods to look beyond input sensitivity. As an example used in prior work [1], consider f(x) = a \u2212 ReLU(b \u2212 x), with a = b = 1. At x = 2, even though f(x) = 1, the attribution to the only input is zero, which is deemed counter-intuitive. Integrated gradients [1] and DeepLIFT [3] consider handling such saturation for saliency maps to be a central issue and introduce the concept of baseline inputs to tackle it. However, one potential issue with this is that the attribution to the input now depends on the choice of baseline for a given function. To avoid this, we argue that it is better to also provide attributions to some function parameters. In the example shown, the function f(x) has two biases (a, b), and the full-gradient method attributes (1, 0) to these biases for the input x = 2.\nFull sensitivity to function mapping: Adebayo et al. [6] recently proposed sanity check criteria that every saliency map must satisfy. The first of these criteria is that a saliency map must be sensitive to randomization of the model parameters. Random parameters produce incorrect input-output mappings, which must be reflected in the saliency map. The second sanity test is that saliency maps must change if the data used to train the model have their labels randomized. A stronger criterion which generalizes both these criteria is that saliency maps must be sensitive to any change in the function mapping induced by changing the parameters. 
This change of parameters can occur either by explicit randomization of the parameters or by training with different data. It turns out that input-gradient based methods are insensitive to some bias parameters, as shown below.\nExample 2. (Bias insensitivity of input-gradient methods) Consider a one-hidden-layer net of the form f(x) = w_1 \u2217 relu(w_0 \u2217 x + b_0) + b_1. For this, it is easy to see that input-gradients [7] are insensitive to small changes in b_0 and arbitrarily large changes in b_1. This applies to all input-gradient methods such as guided backprop [10] and deconv [9]. Thus none of these methods satisfy the model randomization test on f(x) upon randomizing b_1.\n\nOn the other hand, full-gradients are sensitive to every parameter that affects the function mapping. In particular, by equation 1 we observe that given full-gradients G, we have \u2202G/\u2202\u03b8_i = 0 for a parameter \u03b8_i if and only if \u2202f/\u2202\u03b8_i = 0.\n\n4.2 FullGrad: Full-Gradient Saliency Maps for Convolutional Nets\n\nFor convolutional networks, bias-gradients have a spatial structure which is convenient to visualize. Consider a single convolutional filter z = w \u2217 x + b, where w \u2208 R^{2k+1}, b = [b, b, ..., b] \u2208 R^D and (\u2217) for simplicity refers to a convolution with appropriate padding applied so that w \u2217 x \u2208 R^D, which is often the case with practical convolutional nets. Here the bias parameter is a single scalar b repeated D times due to the weight-sharing nature of convolutions. For this particular filter, the bias-gradient f^b(x) = \u2207_z f(x) \u2299 b \u2208 R^D is shaped like the input x \u2208 R^D, and hence can be visualized like the input. Further, the locally connected nature of convolutions implies that each co-ordinate f^b(x)_i is a function of only x[i\u2212k, i+k], thus capturing the importance of a group of input co-ordinates centered at i. 
This is easily ensured for practical convolutional networks (e.g. VGG, ResNet, DenseNet, etc.), which are often designed such that feature sizes of intermediate layers match and are aligned by appropriate padding.\nFor such nets we can now visualize per-neuron and per-layer maps using bias-gradients. Per-neuron maps are obtained by visualizing a spatial map \u2208 R^D for every convolutional filter. Per-layer maps are obtained by aggregating such neuron-wise maps. An example is shown in Figure 1. For images, we visualize these maps after performing standard post-processing steps that ensure good viewing contrast. These post-processing steps are simple re-scaling operations, often supplemented with an absolute value operation to visualize only the magnitude of importance while ignoring the sign. One can also visualize the positive and negative parts of the map separately to avoid ignoring signs. Let such post-processing operations be represented by \u03c8(\u00b7). For maps that are downscaled versions of inputs, such post-processing also includes a resizing operation, often done by standard algorithms such as cubic interpolation.\nWe can similarly visualize approximate network-wide saliency maps by aggregating such layer-wise maps. Let c run across the channels c_l of a layer l in a neural network; then the FullGrad saliency map S_f(x) is given by\n\nS_f(x) = \u03c8(\u2207_x f(x) \u2299 x) + \u03a3_{l \u2208 L} \u03a3_{c \u2208 c_l} \u03c8(f^b(x)_c)    (2)\n\nHere, \u03c8(\u00b7) is the post-processing operator discussed above. For this paper, we choose \u03c8(\u00b7) = bilinearUpsample(rescale(abs(\u00b7))), where rescale(\u00b7) linearly rescales values to lie between 0 and 1, and bilinearUpsample(\u00b7) upsamples the gradient maps using bilinear interpolation to have the same spatial size as the image. 
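A minimal sketch of the aggregation in equation 2 is given below (plain NumPy; nearest-neighbour upsampling stands in for the paper's bilinear interpolation, and randomly generated maps stand in for real input- and bias-gradients, so everything except the aggregation formula itself is an assumption made for illustration):

```python
import numpy as np

def rescale(m):
    """Linearly rescale values of a map to [0, 1]."""
    m = m - m.min()
    return m / (m.max() + 1e-12)

def psi(m, out_size):
    """Post-processing psi: abs -> rescale -> upsample to input size.
    Nearest-neighbour upsampling (np.kron) is used here for simplicity;
    the paper uses bilinear interpolation."""
    m = rescale(np.abs(m))
    rep = out_size // m.shape[0]
    return np.kron(m, np.ones((rep, rep)))

rng = np.random.default_rng(0)
D = 32                                    # input is D x D
x = rng.normal(size=(D, D))
input_grad = rng.normal(size=(D, D))      # stand-in for the input-gradient
# Stand-in per-channel bias-gradient maps f^b(x)_c at two layers.
bias_grads = [rng.normal(size=(4, 16, 16)), rng.normal(size=(8, 8, 8))]

# Equation 2: S_f(x) = psi(input_grad * x) + sum over layers/channels of psi(f^b(x)_c)
saliency = psi(input_grad * x, D)
for layer in bias_grads:
    for channel_map in layer:
        saliency += psi(channel_map, D)

print(saliency.shape)  # (32, 32)
```

Each channel-wise map is post-processed independently before summation, which matches the "neuron-wise maps vote independently" intuition described below.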
For a network with both convolutional and fully-connected\nlayers, we can obtain spatial maps for only the convolutional layers and hence the effect of fully-\nconnected layers\u2019 bias parameters are not completely accounted for. Note that omitting \u03c8(\u00b7) and\nperforming an additional spatial aggregation in the equation above results in the exact neural net\noutput value according to the full-gradient decomposition. Further discussion on post-processing is\npresented in Section 6.\nWe stress here that the FullGrad saliency map described here is approximate, in the sense that the full\nrepresentation is in fact G = (\u2207xf (x), f b(x)) \u2208 RD+F , and our network-wide saliency map merely\nattempts to capture information from multiple maps into a single visually coherent one. This saliency\nmap has the disadvantage that all saliency maps have, i.e. they cannot satisfy both completeness and\nweak dependence at the same time, and changing the aggregation method (such as removing (cid:12)x in\nequation 2, or changing \u03c8(\u00b7)) can help us satisfy one property or the other. Experimentally we \ufb01nd\nthat aggregating maps as per equation 2 produces the sharpest maps, as it enables neuron-wise maps\nto vote independently on the importance of each spatial location.\n\n5 Experiments\n\nTo show the effectiveness of FullGrad, we perform two quantitative experiments. First, we use a pixel\nperturbation procedure to evaluate saliency maps on the Imagenet 2012 dataset. Second, we use the\nremove and retrain procedure [8] to evaluate saliency maps on the CIFAR100 dataset.\n\n5.1 Pixel perturbation\n\nPopular methods to benchmark saliency algorithms are variations of the following procedure: remove\nk most salient pixels and check variation in function value. The intuition is that good saliency\nalgorithms identify pixels that are important to classi\ufb01cation and hence cause higher function output\nvariation. 
Benchmarks with this broad strategy are employed in [13, 11]. However, this is not a perfect benchmark, because replacing image pixels with black pixels can cause high-frequency edge artifacts to appear which may cause output variation. When we employed this strategy for a VGG-16 network trained on Imagenet, we found that several saliency methods have output variation similar to random pixel removal. This effect is also present in the large-scale experiments of [13, 11]. This occurs because random pixel removal creates a large number of disparate artifacts that easily confuse the model. As a result, it is difficult to distinguish methods which create unnecessary artifacts from those that perform reasonable attributions. To counter this effect, we slightly modify this procedure and propose to remove the k least salient pixels rather than the most salient ones. In this variant, methods that cause the least change in function output better identify unimportant regions in the image. We argue that this benchmark is better as it partially decouples the effect of artifacts from that of removing salient pixels.\n\nFigure 2: Quantitative results on saliency maps. (a) Pixel perturbation benchmark (see Section 5.1) on the Imagenet 2012 validation set, where we remove the k% least salient pixels and measure the absolute value of the fractional output change. The lower the curve, the better. (b) Remove and retrain benchmark (see Section 5.2) on the CIFAR100 dataset, done by removing the k% most salient pixels, retraining a classifier and measuring accuracy. The lower the accuracy, the better. Results are averaged across three runs. Note that the scales of standard deviation are different for the two graphs.\n\nSpecifically, our procedure is as follows: for a given value of k, we replace the k image pixels corresponding to the k smallest saliency values with black pixels. 
We measure the neural network output for the most confident class, before and after perturbation, and plot the absolute value of the fractional difference. We use our pixel perturbation test to evaluate full-gradient saliency maps on the Imagenet 2012 validation dataset, using a VGG-16 model with batch normalization. We compare with gradCAM [15], input-gradients [7], smooth-grad [16] and integrated gradients [1]. For this test, we also measure the effect of random pixel removal as a baseline, to estimate the effect of artifact creation. We observe that FullGrad causes the least change in output value, and hence best identifies which pixels are unimportant.\n\n5.2 Remove and Retrain\n\nRemOve And Retrain (ROAR) [8] is another approximate benchmark to evaluate how well saliency methods explain model behavior. The test is as follows: remove the top-k pixels of an image identified by the saliency map for the entire dataset, and retrain a classifier on this modified dataset. If a saliency algorithm indeed correctly identifies the most crucial pixels, then the retrained classifier must have a lower accuracy than the original. Thus an ideal saliency algorithm is one that is able to reduce the accuracy the most upon retraining. Retraining compensates for the presence of deletion artifacts caused by removing the top-k pixels, which could otherwise mislead the model. This too is not a perfect benchmark, as the retrained model now has additional cues, such as the positions of missing pixels, and other visible cues which it had previously ignored. In contrast to the pixel perturbation test, which places emphasis on identifying unimportant regions, this test rewards methods that correctly identify important pixels in the image.\nWe use ROAR to evaluate full-gradient saliency maps on the CIFAR100 dataset, using a 9-layer VGG model. 
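The pixel perturbation procedure above can be sketched as follows (plain NumPy; the tiny stand-in "model" and saliency map are illustrative assumptions in place of VGG-16 and a real saliency method):

```python
import numpy as np

def least_salient_perturbation(image, saliency, model, k):
    """Black out the k pixels with the smallest saliency values and
    return the absolute fractional change in the model output."""
    flat_idx = np.argsort(saliency, axis=None)[:k]   # k least salient pixels
    perturbed = image.copy()
    perturbed.reshape(-1)[flat_idx] = 0.0            # replace with "black" pixels
    before, after = model(image), model(perturbed)
    return abs((after - before) / (before + 1e-12))

rng = np.random.default_rng(0)
image = rng.uniform(size=(8, 8))
model = lambda im: float(im.sum()) + 1.0   # stand-in for a network's class score
saliency = image                           # stand-in saliency map

# A good saliency map assigns low values to pixels that barely affect the
# output, so removing them should change the output very little.
print(least_salient_perturbation(image, saliency, model, k=4))
```

Sweeping k and plotting the fractional change yields curves of the kind shown in Figure 2(a), where lower is better.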
We compare with gradCAM [15], input-gradients [7], integrated gradients [1] and a smooth-grad variant called smooth grad squared [16, 8], which was found to perform among the best on this benchmark. We see that FullGrad is indeed able to decrease the accuracy the most when compared to the alternatives, indicating that it correctly identifies the important pixels in the image.\n\nFigure 3: Comparison of different neural network saliency methods (columns: image; input-gradient [7]; integrated gradient [1]; smooth-grad [16]; Grad-CAM [15]; FullGrad (ours)). Integrated gradients [1] and smooth-grad [16] produce noisy object boundaries, while grad-CAM [15] indicates important regions without adhering to boundaries. FullGrad combines both desirable attributes by highlighting salient regions while being tightly confined within objects. For more results, please see the supplementary material.\n\n5.3 Visual Inspection\n\nWe perform qualitative visual evaluation of FullGrad, along with four baselines: input-gradients [7], integrated gradients [1], smooth-grad [16] and grad-CAM [15]. We see that the first three maps are based on input-gradients alone, and tend to highlight object boundaries more than their interior. Grad-CAM, on the other hand, highlights broad regions of the input without demarcating clear object boundaries. FullGrad combines the advantages of both: highlighted regions are confined to object boundaries while the object interior is highlighted at the same time. 
This is not surprising as FullGrad includes\ninformation both about input-gradients, and also about intermediate-layer gradients like grad-CAM.\nFor input-gradient, integrated gradients and smooth-grad, we do not super-impose the saliency map\non the image, as it reduces visual clarity. More comprehensive results without superimposed images\nfor gradCAM and FullGrad are present in the supplementary material.\n6 How to Choose \u03c8(\u00b7)\n\nIn this section, we shall discuss the trade-offs that arise with particular choices of the post-processing\nfunction \u03c8(\u00b7), which is central to the reduction from full-gradients to FullGrad. Note that by\n\n8\n\n\fProposition 1, any post-processing function cannot satisfy all properties we would like as the resulting\nrepresentation would still be saliency-based. This implies that any particular choice of post-processing\nwould prioritize satisfying some properties over others.\nFor example, the post-processing function used in this paper is suited to perform well with the\ncommonly used evaluation metrics of pixel perturbation and ROAR for image data. These metrics\nemphasize highlighting important regions, and thus the magnitude of saliency seems to be more\nimportant than the sign. However there are other metrics where this form of post-processing does\nnot perform well. One example is the digit-\ufb02ipping experiment [3], where an example task is to turn\nimages of the MNIST digit \"8\" into those of the digit \"3\" by removing pixels which provide positive\nevidence of \"8\" and negative evidence for \"3\". This task emphasizes signed saliency maps, and hence\nthe proposed FullGrad post-processing does not work well here. Having said that, we found that a\nminimal form of post-processing, with \u03c8m(\u00b7) = bilinearUpsample(\u00b7) performed much better on\nthis task. However, this post-processing resulted in a drop in performance on the primary metrics of\npixel perturbation and ROAR. 
Apart from this, we also found that pixel perturbation experiments worked much better on MNIST with ψmnist(·) = bilinearUpsample(abs(·)), which was not the case for ImageNet / CIFAR-100. Thus it seems that the post-processing method to use may depend both on the metric and on the dataset under consideration. Full details of these experiments are presented in the supplementary material.

We thus provide the following recommendation to practitioners: choose the post-processing function based on the evaluation metrics that are most relevant to the application and datasets considered. For most computer vision applications, we believe that the proposed FullGrad post-processing may be sufficient. However, this might not hold for all domains, and it might be important to define good evaluation metrics for each case in consultation with domain experts to ascertain the faithfulness of saliency methods to the underlying neural net functions. These issues arise because saliency maps are approximate representations of neural net functionality, as shown in Proposition 1, and the numerical quantities in the full-gradient representation (equation 1) could be visualized in alternate ways.

7 Conclusions and Future Work

In this paper, we proposed a novel technique dubbed FullGrad to visualize the function mapping learnt by neural networks. This is done by providing attributions to both the inputs and the neurons of intermediate layers. Input attributions code for sensitivity to individual input features, while neuron attributions account for interactions between the input features. Individually, they satisfy weak dependence, a weak notion of local attribution. Together, they satisfy completeness, a desirable property for global attribution.

The inability of saliency methods to satisfy multiple intuitive properties, both in theory and in practice, has important implications for interpretability.
First, it shows that saliency methods are too limiting, and that we may need more expressive schemes that allow satisfying multiple such properties simultaneously. Second, it may be the case that all interpretability methods have such trade-offs, in which case we must specify what these trade-offs are in advance for each such method, for the benefit of domain experts. Third, it may also be the case that multiple properties are mathematically irreconcilable, which implies that interpretability may be achievable only in a narrow and specific sense.

Another point of contention with saliency maps is the lack of unambiguous evaluation metrics. This is tautological; if an unambiguous metric indeed existed, the optimal strategy would be to directly optimize that metric rather than use saliency maps. One possible avenue for future work may be to define such clear metrics and build models that are trained to satisfy them, thus being interpretable by design.

Acknowledgements

We would like to thank Anonymous Reviewer #1 for providing constructive feedback during peer review that helped highlight the importance of post-processing. This work was supported by the Swiss National Science Foundation under the ISUL grant FNS-30209.

References

[1] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. arXiv preprint arXiv:1703.01365, 2017.

[2] Grégoire Montavon, Sebastian Lapuschkin, Alexander Binder, Wojciech Samek, and Klaus-Robert Müller. Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recognition, 65:211–222, 2017.

[3] Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. Learning important features through propagating activation differences. In Proceedings of the 34th International Conference on Machine Learning, pages 3145–3153. JMLR.org, 2017.

[4] Scott M Lundberg and Su-In Lee.
A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pages 4765–4774, 2017.

[5] Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Maximilian Alber, Kristof T Schütt, Sven Dähne, Dumitru Erhan, and Been Kim. The (un)reliability of saliency methods. arXiv preprint arXiv:1711.00867, 2017.

[6] Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, and Been Kim. Sanity checks for saliency maps. In Advances in Neural Information Processing Systems, pages 9505–9515, 2018.

[7] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.

[8] Sara Hooker, Dumitru Erhan, Pieter-Jan Kindermans, and Been Kim. Evaluating feature importance estimates. arXiv preprint arXiv:1806.10758, 2018.

[9] Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In European Conference on Computer Vision, pages 818–833. Springer, 2014.

[10] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.

[11] Marco Ancona, Enea Ceolini, Cengiz Öztireli, and Markus Gross. Towards better understanding of gradient-based attribution methods for deep neural networks. In 6th International Conference on Learning Representations (ICLR 2018), 2018.

[12] Weili Nie, Yang Zhang, and Ankit Patel. A theoretical explanation for perplexing behaviors of backpropagation-based visualizations. arXiv preprint arXiv:1805.07039, 2018.

[13] Wojciech Samek, Alexander Binder, Grégoire Montavon, Sebastian Lapuschkin, and Klaus-Robert Müller. Evaluating the visualization of what a deep neural network has learned.
IEEE Transactions on Neural Networks and Learning Systems, 28(11):2660–2673, 2016.

[14] Pieter-Jan Kindermans, Kristof Schütt, Klaus-Robert Müller, and Sven Dähne. Investigating the influence of noise and distractors on the interpretation of neural networks. arXiv preprint arXiv:1611.07270, 2016.

[15] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 618–626. IEEE, 2017.

[16] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.