{"title": "Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives", "book": "Advances in Neural Information Processing Systems", "page_first": 592, "page_last": 603, "abstract": "In this paper we propose a novel method that provides contrastive explanations justifying the classification of an input by a black box classifier such as a deep neural network. Given an input we find what should be minimally and sufficiently present (viz. important object pixels in an image) to justify its classification and analogously what should be minimally and necessarily \\emph{absent} (viz. certain background pixels). We argue that such explanations are natural for humans and are used commonly in domains such as health care and criminology. What is minimally but critically \\emph{absent} is an important part of an explanation, which to the best of our knowledge, has not been explicitly identified by current explanation methods that explain predictions of neural networks. We validate our approach on three real datasets obtained from diverse domains; namely, a handwritten digits dataset MNIST, a large procurement fraud dataset and a brain activity strength dataset. 
In all three cases, we witness the power of our approach in generating precise explanations that are also easy for human experts to understand and evaluate.", "full_text": "Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives

Amit Dhurandhar* (IBM Research, Yorktown Heights, NY 10598, adhuran@us.ibm.com), Pin-Yu Chen* (IBM Research, Yorktown Heights, NY 10598, pin-yu.chen@ibm.com), Ronny Luss (IBM Research, Yorktown Heights, NY 10598, rluss@us.ibm.com), Chun-Chen Tu (University of Michigan, Ann Arbor, MI 48109, timtu@umich.edu), Paishun Ting (University of Michigan, Ann Arbor, MI 48109, paishun@umich.edu), Karthikeyan Shanmugam (IBM Research, Yorktown Heights, NY 10598, karthikeyan.shanmugam2@ibm.com), Payel Das (IBM Research, Yorktown Heights, NY 10598, daspa@us.ibm.com)

Abstract

In this paper we propose a novel method that provides contrastive explanations justifying the classification of an input by a black box classifier such as a deep neural network. Given an input we find what should be minimally and sufficiently present (viz. important object pixels in an image) to justify its classification and analogously what should be minimally and necessarily absent (viz. certain background pixels). We argue that such explanations are natural for humans and are commonly used in domains such as health care and criminology. What is minimally but critically absent is an important part of an explanation, which, to the best of our knowledge, has not been explicitly identified by current explanation methods that explain predictions of neural networks. 
We validate our approach on three real datasets obtained from diverse domains; namely, a handwritten digits dataset MNIST, a large procurement fraud dataset and a brain activity strength dataset. In all three cases, we witness the power of our approach in generating precise explanations that are also easy for human experts to understand and evaluate.

1 Introduction

Steve is the tall guy with long hair who does not wear glasses. Explanations such as this are used frequently by people to identify other people or items of interest. We see in this case that characteristics such as being tall and having long hair help describe the person, although incompletely. The absence of glasses is important to complete the identification and to help distinguish him from, for instance, Bob, who is tall, has long hair and wears glasses. It is common for us humans to state such contrastive facts when we want to accurately explain something. These contrastive facts are by no means a list of all possible characteristics that should be absent in an input to distinguish it from all other classes that it does not belong to, but rather a minimal set of characteristics/features that help distinguish it from the "closest" class that it does not belong to.

*First two authors have equal contribution.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

In this paper we want to generate such explanations for neural networks, in which, besides highlighting what is minimally sufficient (e.g. tall and long hair) in an input to justify its classification, we also want to identify contrastive characteristics or features that should be minimally and critically absent (e.g. glasses), so as to maintain the current classification and to distinguish it from another input that is "closest" to it but would be classified differently (e.g. Bob). 
We thus want to generate explanations of the form, "An input x is classified in class y because features fi, ..., fk are present and because features fm, ..., fp are absent." The need for such an aspect in what constitutes a good explanation has been stressed recently [12]. It may seem that such crisp explanations are only possible for binary data. However, they are also applicable to continuous data, with no explicit discretization or binarization required. For example, in Figure 1, where we see handwritten digits from the MNIST dataset [40], the black background represents no signal or absence of those specific features, which in this case are pixels with a value of zero. Any non-zero value then would indicate the presence of those features/pixels. This idea also applies to colored images, where the most prominent pixel value (say, the median/mode of all pixel values) can be considered as no signal, and moving away from this value can be considered as adding signal. One may argue that there is some information loss in our form of explanation; however, we believe that such explanations are lucid and easily understandable by humans, who can always delve further into the details of our generated explanations, such as the precise feature values, which are readily available. Moreover, the need for such simple, clear explanations over unnecessarily complex and detailed ones is emphasized in the recent General Data Protection Regulation (GDPR) passed in Europe [41].

In fact, there is another strong motivation for this form of explanation due to its presence in certain human-critical domains. In medicine and criminology there is the notion of pertinent positives and pertinent negatives [15], which together constitute a complete explanation. A pertinent positive (PP) is a factor whose presence is minimally sufficient in justifying the final classification. 
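The presence/absence encoding for continuous data described above can be made concrete. Below is a minimal numpy sketch, not from the paper's code: the use of the median as the "no signal" baseline follows the text, while the tolerance `tol` and the toy image are illustrative assumptions.

```python
import numpy as np

def feature_presence(image, tol=1e-6):
    """Split an image into 'present' and 'absent' features.

    The background (no-signal) level is taken to be the most prominent
    pixel value, here the median; any pixel deviating from it counts
    as a present feature.
    """
    background = np.median(image)                 # no-signal baseline
    present = np.abs(image - background) > tol    # deviation = added signal
    return present, ~present

# MNIST-style toy example: black (0) background with two "stroke" pixels.
img = np.zeros((4, 4))
img[1, 1:3] = 0.8
present, absent = feature_presence(img)
```

For MNIST the baseline is simply zero, so "present" reduces to the non-zero pixels, exactly as stated in the text.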
On the other hand, a pertinent negative (PN) is a factor whose absence is necessary in asserting the final classification. For example, in medicine, a patient showing symptoms of cough, cold and fever, but no sputum or chills, will most likely be diagnosed as having flu rather than pneumonia. Cough, cold and fever could imply either flu or pneumonia; however, the absence of sputum and chills leads to the diagnosis of flu. Thus, sputum and chills are pertinent negatives, which along with the pertinent positives are critical, and in some sense sufficient, for an accurate diagnosis.

We thus propose an explanation method called the contrastive explanations method (CEM) for neural networks that highlights not only the pertinent positives but also the pertinent negatives. This is seen in Figure 1, where our explanation of the image predicted as a 3 in the first row not only highlights the important pixels (which look like a 3) that should be present for it to be classified as a 3, but also highlights a small horizontal line (the pertinent negative) at the top whose presence would change the classification of the image to a 5 and which thus should be absent for the classification to remain a 3. Therefore, our explanation for the digit in row 1 of Figure 1 being a 3 would be: The row 1 digit is a 3 because the cyan pixels (shown in column 2) are present and the pink pixels (shown in column 3) are absent. This second part is critical for an accurate classification and is not highlighted by any of the other state-of-the-art interpretability methods such as layerwise relevance propagation (LRP) [1] or locally interpretable model-agnostic explanations (LIME) [30], for which the respective results are shown in columns 4 and 5 of Figure 1. 
Moreover, given the original image, our pertinent positives highlight what should be present that is necessary and sufficient for the example to be classified as a 3. This is not the case for the other methods, which essentially highlight positively or negatively relevant pixels that may not be necessary or sufficient to justify the classification.

Pertinent Negatives vs. Negatively Relevant Features: Another important thing to note here is the conceptual distinction between the pertinent negatives that we identify and the negatively correlated or relevant features that other methods highlight. The question we are trying to answer is: why is input x classified in class y? Ergo, any human asking this question wants all the evidence in support of the hypothesis of x being classified as class y. Our pertinent positives as well as negatives are evidence in support of this hypothesis. However, unlike the positively relevant features highlighted by other methods, which are also evidence supporting this hypothesis, the negatively relevant features by definition are not. Hence, another motivation for our work is that we believe that when a human asks the above question, they are more interested in evidence supporting the hypothesis than in information that devalues it. This latter information is definitely interesting, but is of secondary importance when it comes to understanding the human's intent behind the question.

Figure 1: CEM versus LRP and LIME on MNIST. PP/PN are highlighted in cyan/pink respectively. For LRP, green is neutral, red/yellow is positive relevance, and blue is negative relevance. For LIME, red is positive relevance and white is neutral.

Given an input and its classification by a neural network, CEM creates explanations for it as follows:
(1) It finds a minimal amount of (viz. 
object/non-background) features in the input that are sufficient in themselves to yield the same classification (i.e. PPs).
(2) It also finds a minimal amount of features that should be absent (i.e. remain background) in the input to prevent the classification result from changing (i.e. PNs).
(3) It does (1) and (2) "close" to the data manifold using a state-of-the-art convolutional autoencoder (CAE) [25] so as to obtain more "realistic" explanations.
We enhance our methods to do (3) so that the resulting explanations are more likely to be close to the true data manifold, and thus match human intuition rather than being arbitrary perturbations that happen to change the classification. Of course, learning a good representation using an autoencoder may not be possible in all situations, due to limitations such as insufficient data or bad data quality. It also may not be necessary if all combinations of feature values have semantics in the domain, or if the data, unlike images, does not lie on a low-dimensional manifold.
We validate our approaches on three real-world datasets. The first is MNIST [40], for which we generate explanations with and without an autoencoder. The second is a procurement fraud dataset [9] from a large corporation containing millions of invoices that have different risk levels. The third is a brain functional MRI (fMRI) imaging dataset from the publicly accessible Autism Brain Imaging Data Exchange (ABIDE) I database [11], which comprises resting-state fMRI acquisitions of subjects diagnosed with autism spectrum disorder (ASD) and neurotypical individuals. For the latter two cases, we do not consider using autoencoders: the fMRI dataset is insufficiently large, especially given its high dimensionality, while for the procurement data, all combinations of allowed feature values are (intuitively) reasonable. 
In all three cases, we witness the power of our approach in creating more precise explanations that also match human judgment.

2 Related Work

Researchers have put great effort into devising algorithms for interpretable modeling. Examples include the construction of rule/decision lists [39, 36], prototype exploration [19, 13], methods inspired by psychometrics [17] and learning human-consumable models [6]. Moreover, there is also some interesting work that tries to formalize and quantify interpretability [10].
A recent survey [24] looks primarily at two kinds of methods for understanding neural networks: a) methods [26, 27] that produce a prototype for a given class, and b) methods that explain a neural network's decision on an input by highlighting relevant parts [1, 20, 30, 33]. Other works also investigate methods of type (b) for vision [34, 35, 29] and NLP applications [22]. Most of these explanation methods, however, focus on features that are present, even if they may highlight features contributing negatively to the final classification. As such, they do not identify features that should be necessarily and sufficiently present or absent to justify the model's classification of an individual example. There are methods that perturb the input and remove features [32]; however, these are used more from an evaluation standpoint, where a given explanation is quantitatively evaluated based on such procedures.
Recently, there has been work [31] that tries to find sufficient conditions to justify classification decisions, i.e., feature values whose presence conclusively implies a class. These are global rules (called anchors) that are sufficient for predicting a class. Our PPs and PNs, on the other hand, are customized for each input. Moreover, a dataset may not always possess such anchors, although one can almost always find PPs and PNs. 
There is also work [43] that tries to find stable insight that can be conveyed to the user in an (asymmetric) binary setting for small neural networks.
It is also important to note that our method is related to methods that generate adversarial examples [5, 7]. However, there are certain key differences. Firstly, the (untargeted) attack methods are largely unconstrained, with additions and deletions performed simultaneously, while in our case for PPs and PNs we only allow deletions and additions respectively. Secondly, our optimization objective for PPs is itself distinct, as we are searching for features that are minimally sufficient in themselves to maintain the original classification. As such, our work demonstrates how attack methods can be adapted to create effective explanation methods.

3 Contrastive Explanations Method

This section details the proposed contrastive explanations method. Let X denote the feasible data space and let (x0, t0) denote an example x0 ∈ X and its inferred class label t0 obtained from a neural network model. The modified example x ∈ X based on x0 is defined as x = x0 + δ, where δ is a perturbation applied to x0. Our method of finding pertinent positives/negatives is formulated as an optimization problem over the perturbation variable δ that is used to explain the model's prediction results. We denote the prediction of the model on the example x by Pred(x), where Pred(·) is any function that outputs a vector of prediction scores for all classes, such as the prediction probabilities or logits (unnormalized probabilities) that are widely used in neural networks. To ensure the modified example x is still close to the data manifold of natural examples, we propose to use an autoencoder to evaluate the closeness of x to the data manifold. 
We denote by AE(x) the reconstructed example of x using the autoencoder AE(·).

3.1 Finding Pertinent Negatives (PN)

For pertinent negative analysis, one is interested in what is missing in the model prediction. For any natural example x0, we use the notation X / x0 to denote the space of missing parts with respect to x0. We aim to find an interpretable perturbation δ ∈ X / x0 to study the difference between the most probable class predictions in argmax_i [Pred(x0)]_i and argmax_i [Pred(x0 + δ)]_i. Given (x0, t0), our method finds a pertinent negative by solving the following optimization problem:

min_{δ ∈ X / x0}  c · f_κ^neg(x0, δ) + β ‖δ‖_1 + ‖δ‖_2^2 + γ ‖x0 + δ − AE(x0 + δ)‖_2^2.   (1)

We elaborate on the role of each term in the objective function (1) as follows. The first term f_κ^neg(x0, δ) is a designed loss function that encourages the modified example x = x0 + δ to be predicted as a different class than t0 = argmax_i [Pred(x0)]_i. The loss function is defined as:

f_κ^neg(x0, δ) = max{ [Pred(x0 + δ)]_{t0} − max_{i ≠ t0} [Pred(x0 + δ)]_i, −κ },   (2)

where [Pred(x0 + δ)]_i is the i-th class prediction score of x0 + δ. This hinge-like loss function favors the modified example x having a top-1 prediction class different from that of the original example x0. The parameter κ ≥ 0 is a confidence parameter that controls the separation between [Pred(x0 + δ)]_{t0} and max_{i ≠ t0} [Pred(x0 + δ)]_i. The second and third terms β ‖δ‖_1 + ‖δ‖_2^2 in (1) are jointly called the elastic net regularizer, which is used for efficient feature selection in high-dimensional learning problems [44]. 
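The hinge-like loss in (2), and its pertinent-positive counterpart in (4), can be computed directly from a model's score vector. The following is a minimal numpy sketch, not the authors' implementation; it assumes Pred returns a vector of logits, and the toy scores and class index are illustrative.

```python
import numpy as np

def f_neg(pred_scores, t0, kappa=0.0):
    """PN loss of Eq. (2): reaches its minimum -kappa once some class
    other than t0 outscores t0 on x0 + delta by at least kappa."""
    others = np.delete(pred_scores, t0)
    return max(pred_scores[t0] - others.max(), -kappa)

def f_pos(pred_scores, t0, kappa=0.0):
    """PP loss of Eq. (4): reaches its minimum -kappa once t0 wins on
    delta alone by at least kappa."""
    others = np.delete(pred_scores, t0)
    return max(others.max() - pred_scores[t0], -kappa)

# Toy 3-class score vector (illustrative logits); original class t0 = 1.
scores = np.array([0.1, 2.0, 0.5])
```

On these scores, `f_neg` is positive (class 1 still wins, so the PN search must keep perturbing), while `f_pos` is already at zero (class 1 wins, so only the regularizers drive the PP search further).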
The last term ‖x0 + δ − AE(x0 + δ)‖_2^2 in (1) is the L2 reconstruction error of x evaluated by the autoencoder. This term is relevant provided that a well-trained autoencoder for the domain is obtainable. The parameters c, β, γ ≥ 0 are the associated regularization coefficients.

3.2 Finding Pertinent Positives (PP)

For pertinent positive analysis, we are interested in the critical features that are readily present in the input. Given a natural example x0, we denote the space of its existing components by X ∩ x0. Here we aim at finding an interpretable perturbation δ ∈ X ∩ x0 such that after removing it from x0, argmax_i [Pred(x0)]_i = argmax_i [Pred(δ)]_i. That is, x0 and δ will have the same top-1 prediction class t0, indicating that the removed perturbation δ is representative of the model prediction on x0. Similar to finding pertinent negatives, we formulate finding pertinent positives as the following optimization problem:

min_{δ ∈ X ∩ x0}  c · f_κ^pos(x0, δ) + β ‖δ‖_1 + ‖δ‖_2^2 + γ ‖δ − AE(δ)‖_2^2,   (3)

where the loss function f_κ^pos(x0, δ) is defined as

f_κ^pos(x0, δ) = max{ max_{i ≠ t0} [Pred(δ)]_i − [Pred(δ)]_{t0}, −κ }.   (4)

In other words, for any given confidence κ ≥ 0, the loss function f_κ^pos is minimized when [Pred(δ)]_{t0} is greater than max_{i ≠ t0} [Pred(δ)]_i by at least κ.

Algorithm 1 Contrastive Explanations Method (CEM)
Input: example (x0, t0), neural network model N and (optionally, if γ > 0) an autoencoder AE
1) Solve (1) and obtain δ^neg ← argmin_{δ ∈ X / x0} c · f_κ^neg(x0, δ) + β ‖δ‖_1 + ‖δ‖_2^2 + γ ‖x0 + δ − AE(x0 + δ)‖_2^2.
2) Solve (3) and obtain δ^pos ← argmin_{δ ∈ X ∩ x0} c · f_κ^pos(x0, δ) + β ‖δ‖_1 + ‖δ‖_2^2 + γ ‖δ − AE(δ)‖_2^2.
return δ^pos and δ^neg. {Our explanation: input x0 is classified as class t0 because features δ^pos are present and because features δ^neg are absent. Code at https://github.com/IBM/Contrastive-Explanation-Method}

3.3 Algorithmic Details

We apply a projected fast iterative shrinkage-thresholding algorithm (FISTA) [2] to solve problems (1) and (3). FISTA is an efficient solver for optimization problems involving L1 regularization. Taking the pertinent negative as an example, assume X = [−1, 1]^p and X / x0 = [0, 1]^p, and let g(δ) = f_κ^neg(x0, δ) + ‖δ‖_2^2 + γ ‖x0 + δ − AE(x0 + δ)‖_2^2 denote the objective function of (1) without the L1 regularization term. Given the initial iterate δ^(0) = 0, projected FISTA iteratively updates the perturbation I times by

δ^(k+1) = Π_{[0,1]^p}{ S_β( y^(k) − α_k ∇g(y^(k)) ) };   (5)
y^(k+1) = Π_{[0,1]^p}{ δ^(k+1) + (k / (k + 3)) (δ^(k+1) − δ^(k)) },   (6)

where Π_{[0,1]^p} denotes the vector projection onto the set X / x0 = [0, 1]^p, α_k is the step size, y^(k) is a slack variable accounting for momentum acceleration with y^(0) = δ^(0), and S_β : R^p → R^p is an element-wise shrinkage-thresholding function defined as

[S_β(z)]_i = z_i − β if z_i > β;  0 if |z_i| ≤ β;  z_i + β if z_i < −β,   (7)

for any i ∈ {1, . . . , p}. The final perturbation δ^(k*) for pertinent negative analysis is selected from the set {δ^(k)}_{k=1}^I such that f_κ^neg(x0, δ^(k*)) = 0 and k* = argmin_{k ∈ {1,...,I}} β ‖δ^(k)‖_1 + ‖δ^(k)‖_2^2. A similar projected FISTA optimization approach is applied to pertinent positive analysis.
Eventually, as seen in Algorithm 1, we use both the pertinent negative δ^neg and the pertinent positive δ^pos obtained from our optimization methods to explain the model prediction. The last term in both (1) and (3) is included only when an accurate autoencoder is available; otherwise γ is set to zero.

4 Experiments

This section provides experimental results on three representative datasets: the handwritten digits dataset MNIST, a procurement fraud dataset obtained from a large corporation having millions of invoices and tens of thousands of vendors, and a brain imaging fMRI dataset containing brain activity patterns for both normal and autistic individuals. 
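The projected FISTA updates (5)-(7) can be sketched compactly. The following numpy sketch is illustrative rather than the authors' released implementation: it uses a fixed step size in place of α_k, takes an arbitrary smooth gradient ∇g as input, and omits the final selection of the best feasible iterate δ^(k*); the toy quadratic objective at the end is an assumption for demonstration.

```python
import numpy as np

def shrink(z, beta):
    """Element-wise soft-thresholding S_beta of Eq. (7)."""
    return np.sign(z) * np.maximum(np.abs(z) - beta, 0.0)

def projected_fista(grad_g, p, beta, alpha, iters):
    """Projected FISTA iterations of Eqs. (5)-(6) over [0, 1]^p.

    grad_g: gradient of the smooth part g of the objective.
    alpha:  constant step size (the paper allows an iteration-dependent
            step alpha_k).
    """
    delta = np.zeros(p)       # delta^(0) = 0
    y = delta.copy()          # y^(0) = delta^(0)
    for k in range(iters):
        # Eq. (5): gradient step, shrinkage, projection onto [0, 1]^p.
        delta_next = np.clip(shrink(y - alpha * grad_g(y), beta), 0.0, 1.0)
        # Eq. (6): momentum step with coefficient k / (k + 3), projected.
        y = np.clip(delta_next + (k / (k + 3.0)) * (delta_next - delta),
                    0.0, 1.0)
        delta = delta_next
    return delta

# Toy smooth part g(d) = 0.5 * ||d - target||^2, so grad_g(d) = d - target.
target = np.array([0.9, 0.0, 0.4])
sol = projected_fista(lambda d: d - target, p=3,
                      beta=0.05, alpha=0.5, iters=200)
# sol approaches the soft-thresholded target projected onto [0, 1]^3.
```

On this toy problem the fixed point of (5) is reached quickly; in CEM itself, ∇g additionally carries the hinge loss and autoencoder terms, and the best iterate is kept according to the selection rule above.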
We compare our approach with previous state-of-the-art methods and demonstrate its superiority in generating more accurate and intuitive explanations. Implementation details of projected FISTA are given in the supplement.

4.1 Handwritten Digits

We first report results on the handwritten digits MNIST dataset. In this case, we provide examples of explanations for our method with and without an autoencoder.

4.1.1 Setup

The handwritten digits are classified using a feed-forward convolutional neural network (CNN) trained on 60,000 training images from the MNIST benchmark dataset. The CNN has two sets of convolution-convolution-pooling layers, followed by three fully-connected layers. Further details about the CNN, whose test accuracy was 99.4%, and a detailed description of the CAE, which consists of an encoder and a decoder component, are given in the supplement.

4.1.2 Results

Our CEM method is applied to MNIST, with a variety of examples illustrated in Figure 2. In addition to what was shown in Figure 1 in the introduction, results using a convolutional autoencoder (CAE) to learn the pertinent positives and negatives are displayed. While results without a CAE are quite convincing, the CAE clearly improves the pertinent positives and negatives in many cases. Regarding pertinent positives, the cyan-highlighted pixels in the column with a CAE (CAE CEM PP) are a superset of the cyan-highlighted pixels in the column without one (CEM PP). While these explanations are at the same level of confidence regarding the classifier, explanations using an AE are visually more interpretable. Take for instance the digit classified as a 2 in row 2. 
A small part of the tail of a 2 is used to explain the classifier without a CAE, while the explanation using a CAE has a much thicker tail and a larger part of the vertical curve. In row 3, the explanation of the 3 is quite clear, but the CAE highlights the same explanation much more thickly, with more pixels. The same pattern holds for pertinent negatives. The horizontal line in row 4 that makes a 4 into a 9 is much more pronounced when using a CAE, as is the change of a predicted 7 into a 9 in row 5. The other rows exhibit similar patterns, and further examples can be found in the supplement.
The two state-of-the-art methods we use for comparison in explaining the classifier in Figure 2 are LRP and LIME. LRP experiments used the toolbox from [21], and LIME code was adapted from https://github.com/marcotcr/lime. LRP has a visually appealing explanation at the pixel level. Most pixels are deemed irrelevant (green) to the classification (note that the black background of the LRP results is actually neutral). Positively relevant pixels (yellow/red) are mostly consistent with our pertinent positives, though the pertinent positives highlight more pixels for easier visualization. The most obvious such examples are row 3, where the yellow in LRP outlines a 3 similar to the pertinent positive, and row 6, where the yellow outlines most of what the pertinent positive provably deems necessary for the given prediction. There is little negative relevance in these examples, though we point out two interesting cases. In row 4, LRP shows that the little curve extending the upper left of the 4 slightly to the right has negative relevance (also shown by CEM as not being positively pertinent). Similarly, in row 3, the blue pixels in LRP are a part of the image that must obviously be deleted to see a clear 3. 
LIME is also visually appealing. However, the results are based on superpixels: the images were first segmented and relevant segments were discovered. This explains why most of the pixels forming the digits are found relevant. While both methods give important intuitions, neither illustrates what is necessary and sufficient about the classifier results as does our contrastive explanations method.

Figure 2: CEM versus LRP and LIME on MNIST. PP/PN are highlighted in cyan/pink respectively. For LRP, green is neutral, red/yellow is positive relevance, and blue is negative relevance. For LIME, red is positive relevance and white is neutral.

ID: 1 | Risk: Low | Events: 1, 2, 9 | PP: 2, 9 | PN: 7 | Expert feedback: ... the vendor being registered and having a DUNs number makes the invoice low risk. However, if it came from a low CPI country then the risk would be uplifted, given that the invoice amount is already high.
ID: 2 | Risk: Medium | Events: 2, 4, 7 | PP: 2, 4 | PN: 6 | Expert feedback: ... the vendor being registered with the company keeps the risk manageable given that it is a risky commodity code. Nonetheless, if the vendor was part of any of the FPL lists, the invoice would most definitely be blocked.
ID: 3 | Risk: High | Events: 1, 4, 5, 11 | PP: 1, 4, 11 | PN: 2, 9 | Expert feedback: ... the high invoice amount, the risky commodity code and no physical address make this invoice high risk. The risk level would definitely have been somewhat lower if the vendor was registered in VMF and had a DUNs number.

Table 2: Above we see 3 example invoices (IDs anonymized), one at low risk, one at medium and one at high risk level. The corresponding triggered events and the PPs and PNs identified by our method are shown. We also report human expert feedback, which validates the quality of our explanations. 
The numbers that the events correspond to are given in Section 4.2.1.

4.2 Procurement Fraud

In this experiment, we evaluated our methods on a real procurement dataset obtained from a large corporation. This nicely complements our other experiments on image datasets.

4.2.1 Setup

The data spans a one-year period and consists of millions of invoices submitted by tens of thousands of vendors across 150 countries. The invoices were labeled as being either low risk, medium risk, or high risk by a large team that approves these invoices. To make such an assessment, besides just the invoice data, we and the team had access to multiple public and private data sources such as the vendor master file (VMF), risky vendors list (RVL), risky commodity list (RCL), financial index (FI), forbidden parties list (FPL) [4, 37], country perceptions index (CPI) [18], tax havens list (THL) and Dun & Bradstreet numbers (DUNs) [3]. Details describing each of these data sources are given in the supplement.
Based on the above data sources, there are tens of features and events whose occurrence hints at the riskiness of an invoice. Here are some representative ones: 1) if the spend with a particular vendor is significantly higher than with other vendors in the same country, 2) if a vendor is registered with a large corporation and thus its name appears in VMF, 3) if a vendor belongs to RVL, 4) if the commodity on the invoice belongs to RCL, 5) if the maturity based on FI is low, 6) if a vendor belongs to FPL, 7) if a vendor is in a high-risk country (i.e. CPI < 25), 8) if a vendor or its bank account is located in a tax haven, 9) if a vendor has a DUNs number, 10) if the vendor and employee bank account numbers match, 11) if a vendor only possesses a PO box with no street address.
With these data, we trained a three-layer neural network with fully connected layers, 512 rectified linear units and a three-way softmax function. 
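For concreteness, the forward pass of such a network can be sketched as follows. This is an illustrative numpy stand-in, not the trained model: the input width of 40, the random weights, and the reading of "three-layer" as two hidden layers of 512 units plus the softmax output are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

# Fully connected network with 512 ReLU units per hidden layer and a
# three-way softmax over risk levels (low / medium / high).
n_features, n_hidden, n_classes = 40, 512, 3
W1 = rng.normal(0, 0.05, (n_features, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.05, (n_hidden, n_hidden));   b2 = np.zeros(n_hidden)
W3 = rng.normal(0, 0.05, (n_hidden, n_classes));  b3 = np.zeros(n_classes)

def predict(x):
    h1 = relu(x @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return softmax(h2 @ W3 + b3)  # per-invoice risk probabilities

probs = predict(rng.normal(size=(5, n_features)))
```

The pre-softmax scores of such a network are what CEM's Pred(·) would consume when searching for the PPs and PNs reported in Table 2.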
The 10-fold cross-validation accuracy of the network was high (91.6%).

Table 1: Below we see the percentage of invoices on which the explanations were deemed acceptable by experts. For LIME and LRP we picked positively relevant features as proxies for PPs.
Method | PP % Match | PN % Match
CEM | 90.3 | 94.7
LIME | 86.6 | N/A
LRP | 88.2 | N/A

4.2.2 Results

With the help of domain experts, we evaluated the different explanation methods. We randomly chose 15 invoices that were classified as low risk, 15 classified as medium risk and 15 classified as high risk. We asked for feedback on these 45 invoices in terms of whether or not the pertinent positives and pertinent negatives highlighted by each of the methods were suitable to produce the classification. To evaluate each method, we computed the percentage of invoices whose explanations the experts agreed with based on this feedback.

Figure 3: CEM versus LRP on pre-processed resting-state brain fMRI connectivity data from the open-access ABIDE I database. (A) Seven networks of functionally coupled regions across the cerebral cortex [8]. Color scheme: purple: Visual (VIS), blue: Somatomotor (SMN), green: Dorsal Attention (DAN), violet: Ventral Attention (VAN), cream: Limbic (LN), orange: Frontoparietal (FPN), and red: Default Mode (DMN). (B) CEM PPs/PNs of a classified autistic brain are in the upper/lower triangle respectively. (C) A network-level view of the ROIs (regions of interest) involving PP and PN functional connections (FCs) in the classified autistic (denoted as A) and neurotypical (denoted as T) subjects. For both (B) and (C), the bolder the color, the higher the strength of the PP and PN FCs. 
(D) For LRP, positive relevance of FCs is depicted in a similar manner as in (C).

In Table 1, we see the percentage of times the pertinent positives matched the experts' judgment for the different methods, as well as the pertinent negatives for ours. We observe that in both cases our explanations closely match human judgment. We of course used proxies for the competing methods, as neither of them identifies PPs or PNs. There were no good proxies for PNs, since negatively relevant features are conceptually quite different, as discussed in the supplement. Table 2 shows three example invoices, one belonging to each class, along with the explanations produced by our method and the expert feedback. We see that the expert feedback validates our explanations and showcases the power of pertinent negatives in making the explanations more complete as well as more intuitive to reason with. An interesting aspect here is that the medium risk invoice could have been perturbed towards either low risk or high risk.
However, our method found that it is closer (i.e., requires a smaller perturbation) to being high risk and thus suggested a pertinent negative that takes it into that class. Our method can make such informed decisions because it searches for the most "crisp" explanation, arguably similar to those of humans.

4.3 Brain Functional Imaging

In this experiment we look at explaining why a certain individual was classified as autistic as opposed to neurotypical.

4.3.1 Setup

The brain imaging dataset employed in this study is the Autism Brain Imaging Data Exchange (ABIDE) I [11], a large publicly available dataset consisting of resting-state fMRI acquisitions of subjects diagnosed with autism spectrum disorder (ASD), as well as of neurotypical individuals. Precise details about the standard ways in which this data was preprocessed are given in the supplement. Eventually, we had a 200x200 connectivity matrix consisting of real-valued correlations for each subject. There were 147 ASD and 146 typical subjects.

We trained a single-layer neural network model in TensorFlow. The parameters of the model were regularized by an elastic-net regularizer. The leave-one-out cross validation accuracy is around 61.17%, which matches state-of-the-art results [28, 14, 38]. The logits of this network are used as model prediction scores, and we set X = [0,1]^p, X / x0 = [0,1]^p / x0 and X ∩ x0 = [0,1]^p ∩ x0 for any natural example x0 ∈ X.

4.3.2 Results

With the help of domain experts, we evaluated the performance of CEM and LRP, which performed the best. LIME was challenging to use in this case, since the brain activity patterns are spread over the whole image and no reasonable segmentation of the images into superpixels was achievable here.
Per-pixel regression results were significantly worse than those of LRP.

Ten subjects were randomly chosen, of which five were classified as autistic and the rest as neurotypical. Since the resting-state functional connectivity within and between large-scale brain functional networks [42] (see Fig. 3A) is often found to be altered in brain disorders including autism, we decided to compare the performance of CEM and LRP in terms of identifying those atypical patterns. Fig. 3B shows the strong pertinent positive (upper triangle) and pertinent negative (lower triangle) functional connections (FCs) of a classified ASD subject produced by the CEM method. We further group these connections with respect to the associated brain network (Fig. 3C). Interestingly, in four out of five classified autistic subjects, pertinent positive FCs are mostly (with a probability > 0.26) associated with the visual network (VIS, shown in purple in Fig. 3A). On the other hand, pertinent negative FCs in all five subjects classified as autistic preferentially (with a probability > 0.42) involve the default mode network (DMN, red regions in Fig. 3A). This trend appears to be reversed in subjects classified as typical (Fig. 3C). In all five typical subjects, pertinent positive FCs involve the DMN (with probability > 0.25), while the pertinent negative FCs correspond to the VIS. Taken together, these results are consistent with earlier studies suggesting an atypical pattern of brain connectivity in autism [16]. The results obtained using CEM further suggest under-connectivity in the DMN and over-connectivity in the visual network, in agreement with prior findings [16, 23]. LRP also identifies positively relevant FCs that mainly involve DMN regions in all five typical subjects (Fig. 3D). However, LRP associates positively relevant FCs with the visual network in only 40% of autistic subjects (Fig. 3D).
These findings imply superior performance of CEM compared to LRP in robustly identifying pertinent positive information from brain functional connectome data of different populations. The extraction of pertinent positive and negative features by CEM can further help reduce error (false positives and false negatives) in such diagnoses.

4.4 Quantitative Evaluation

In all the above experiments we also quantitatively evaluated our results by passing the PPs, and the PNs added to the original input, as independent inputs to the corresponding classifiers. We wanted to measure the percentage of times the PPs are classified into the same class as the original input and, analogously, the percentage of times the addition of the PNs produces a different classification than the original input. This type of quantitative evaluation is similar to previous studies [32].

We found in both cases, and on all three datasets, that our PPs and PNs are 100% effective in maintaining or switching classes respectively. This means that our approach can be trusted to produce highly informative and potentially sparse (or minimal) PPs and PNs that are also predictive across diverse domains.

5 Discussion

In the previous sections, we showed how our method can be effectively used to create meaningful explanations in different domains that are presumably easier to consume as well as more accurate. It is interesting that pertinent negatives play an essential role in many domains where explanations are important. That said, they seem most useful when inputs in different classes are "close" to each other. For instance, they are more important when distinguishing a diagnosis of flu from one of pneumonia than when distinguishing, say, a microwave from an airplane.
If the inputs are extremely different, then pertinent positives are probably sufficient to characterize the input, as there are likely to be many pertinent negatives, which would presumably overwhelm the user.

We believe that our explanation method CEM can be useful for applications where the end goal is not just to obtain explanations. For instance, we could use it to choose between models that have the same test accuracy: a model with better explanations may be more robust. We could also use our method for model debugging, i.e., finding biases in the model in terms of the types of errors it makes, or, in the extreme case, for model improvement.

In summary, we have provided a novel explanation method called CEM, which finds not only what should be minimally present in the input to justify its classification by black box classifiers such as neural networks, but also contrastive perturbations, in particular additions, that should be necessarily absent to justify the classification. To the best of our knowledge this is the first explanation method that achieves this goal. We have validated the efficacy of our approach on multiple datasets from different domains, and shown the power of such explanations in terms of matching human intuition, thus making for more complete and well-rounded explanations.

Acknowledgement

We would like to thank the anonymous reviewers for their constructive comments.

References

[1] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and W. Samek. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE, 10(7):e0130140, 2015.

[2] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.

[3] Dun & Bradstreet. DUNS numbers. US Govt., 2013.
http://fedgov.dnb.com/webform.

[4] Bureau of Industry and Security. Denied persons list. US Dept. of Commerce, 2013. http://www.bis.doc.gov/index.php/policy-guidance/lists-of-parties-of-concern/denied-persons-list.

[5] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy, 2017.

[6] R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '15, pages 1721–1730, New York, NY, USA, 2015. ACM.

[7] P.-Y. Chen, Y. Sharma, H. Zhang, J. Yi, and C.-J. Hsieh. EAD: Elastic-net attacks to deep neural networks via adversarial examples. In AAAI, 2018.

[8] R. C. Craddock, G. A. James, P. E. Holtzheimer, X. P. Hu, and H. S. Mayberg. A whole brain fMRI atlas generated via spatially constrained spectral clustering. Human Brain Mapping, 33(8):1914–1928, 2012.

[9] A. Dhurandhar, B. Graves, R. Ravi, G. Maniachari, and M. Ettl. Big data system for analyzing risky entities. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2015.

[10] A. Dhurandhar, V. Iyengar, R. Luss, and K. Shanmugam. TIP: Typifying the interpretability of procedures. arXiv preprint arXiv:1706.02952, 2017.

[11] A. Di Martino, C.-G. Yan, Q. Li, E. Denio, F. X. Castellanos, K. Alaerts, J. S. Anderson, M. Assaf, S. Y. Bookheimer, M. Dapretto, et al. The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism. Molecular Psychiatry, 19(6):659, 2014.

[12] F. Doshi-Velez, M. Kortz, R. Budish, C. Bavitz, S. Gershman, D. O'Brien, S. Schieber, J. Waldo, D. Weinberger, and A. Wood. Accountability of AI under the law: The role of explanation. arXiv preprint arXiv:1711.01134, 2017.

[13] K. Gurumoorthy, A. Dhurandhar, and G. Cecchi. ProtoDash: Fast interpretable prototype selection. arXiv preprint arXiv:1707.01212, 2017.

[14] A. S. Heinsfeld, A. R. Franco, R. C. Craddock, A. Buchweitz, and F. Meneguzzi. Identification of autism spectrum disorder using deep learning and the ABIDE dataset. NeuroImage: Clinical, 17:16–23, 2018.

[15] A. Herman. Are you visually intelligent? What you don't see is as important as what you do see. Medical Daily, 2016.

[16] J. V. Hull, Z. J. Jacokes, C. M. Torgerson, A. Irimia, and J. D. Van Horn. Resting-state functional connectivity in autism spectrum disorders: A review. Frontiers in Psychiatry, 7:205, 2017.

[17] T. Idé and A. Dhurandhar. Supervised item response models for informative prediction. Knowl. Inf. Syst., 51(1):235–257, Apr. 2017.

[18] Transparency International. Corruption perceptions index. 2013. http://www.transparency.org/research/cpi/overview.

[19] B. Kim, R. Khanna, and O. Koyejo. Examples are not enough, learn to criticize! Criticism for interpretability. In Advances in Neural Information Processing Systems, 2016.

[20] P.-J. Kindermans, K. T. Schütt, M. Alber, K.-R. Müller, D. Erhan, B. Kim, and S. Dähne. Learning how to explain neural networks: PatternNet and PatternAttribution. In Intl. Conference on Learning Representations (ICLR), 2018.

[21] S. Lapuschkin, A. Binder, G. Montavon, K.-R. Müller, and W. Samek. The LRP toolbox for artificial neural networks. Journal of Machine Learning Research, 17(114):1–5, 2016.

[22] T. Lei, R. Barzilay, and T. Jaakkola. Rationalizing neural predictions. arXiv preprint arXiv:1606.04155, 2016.

[23] A. Liska, H. You, and P. Das. Relationship between static and dynamic brain functional connectivity in autism spectrum disorders. Presented at the ISMRM, Honolulu, 2017.

[24] G. Montavon, W. Samek, and K.-R. Müller. Methods for interpreting and understanding deep neural networks. Digital Signal Processing, 2017.

[25] A. Mousavi, G. Dasarathy, and R. G. Baraniuk. DeepCodec: Adaptive sensing and recovery via deep convolutional neural networks. arXiv preprint arXiv:1707.03386, 2017.

[26] A. Nguyen, A. Dosovitskiy, J. Yosinski, T. Brox, and J. Clune. Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. In Advances in Neural Information Processing Systems, pages 3387–3395, 2016.

[27] A. Nguyen, J. Yosinski, and J. Clune. Multifaceted feature visualization: Uncovering the different types of features learned by each neuron in deep neural networks. arXiv preprint arXiv:1602.03616, 2016.

[28] J. A. Nielsen, B. A. Zielinski, P. T. Fletcher, A. L. Alexander, N. Lange, E. D. Bigler, J. E. Lainhart, and J. S. Anderson. Multisite functional connectivity MRI classification of autism: ABIDE results. Frontiers in Human Neuroscience, 7:599, 2013.

[29] J. Oramas, K. Wang, and T. Tuytelaars. Visual explanation by interpretation: Improving visual feedback capabilities of deep neural networks. arXiv:1712.06302, 2017.

[30] M. Ribeiro, S. Singh, and C. Guestrin. "Why should I trust you?" Explaining the predictions of any classifier. In ACM SIGKDD Intl. Conference on Knowledge Discovery and Data Mining, 2016.

[31] M. T. Ribeiro, S. Singh, and C. Guestrin. Anchors: High-precision model-agnostic explanations. In AAAI Conference on Artificial Intelligence (AAAI), 2018.

[32] W. Samek, A. Binder, G. Montavon, S. Lapuschkin, and K.-R. Müller. Evaluating the visualization of what a deep neural network has learned. IEEE Transactions on Neural Networks and Learning Systems, 2017.

[33] S. Lundberg and S.-I. Lee. Unified framework for interpretable methods. In Advances in Neural Information Processing Systems, 2017.

[34] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. arXiv:1610.02391, 2016.

[35] K. Simonyan, A. Vedaldi, and A. Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. CoRR, abs/1312.6034, 2013.

[36] G. Su, D. Wei, K. Varshney, and D. Malioutov. Interpretable two-level boolean rule learning for classification. arXiv:1606.05798, 2016.

[37] System for Award Management. Excluded parties list. US Govt., 2013. https://www.sam.gov/portal/public/SAM/.

[38] R. Tejwani, A. Liska, H. You, J. Reinen, and P. Das. Autism classification using brain functional connectivity dynamics and machine learning. NIPS Workshop BigNeuro, 2017.

[39] F. Wang and C. Rudin. Falling rule lists. In AISTATS, 2015.

[40] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[41] P. N. Yannella and O. Kagan. Analysis: Article 29 working party guidelines on automated decision making under GDPR. 2018. https://www.cyberadviserblog.com/2018/01/analysis-article-29-working-party-guidelines-on-automated-decision-making-under-gdpr/.

[42] B. T. Yeo, F. M. Krienen, J. Sepulcre, M. R. Sabuncu, D. Lashkari, M. Hollinshead, J. L. Roffman, J. W. Smoller, L. Zöllei, J. R. Polimeni, et al. The organization of the human cerebral cortex estimated by intrinsic functional connectivity. Journal of Neurophysiology, 106(3):1125–1165, 2011.

[43] X. Zhang, A. Solar-Lezama, and R. Singh. Interpreting neural network judgments via minimal, stable, and symbolic corrections. arXiv:1802.07384, 2018.

[44] H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301–320, 2005.