{"title": "A Unified Approach to Interpreting Model Predictions", "book": "Advances in Neural Information Processing Systems", "page_first": 4765, "page_last": 4774, "abstract": "Understanding why a model makes a certain prediction can be as crucial as the prediction's accuracy in many applications. However, the highest accuracy for large modern datasets is often achieved by complex models that even experts struggle to interpret, such as ensemble or deep learning models, creating a tension between accuracy and interpretability. In response, various methods have recently been proposed to help users interpret the predictions of complex models, but it is often unclear how these methods are related and when one method is preferable over another. To address this problem, we present a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations). SHAP assigns each feature an importance value for a particular prediction. Its novel components include: (1) the identification of a new class of additive feature importance measures, and (2) theoretical results showing there is a unique solution in this class with a set of desirable properties. The new class unifies six existing methods, notable because several recent methods in the class lack the proposed desirable properties. Based on insights from this unification, we present new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.", "full_text": "A Uni\ufb01ed Approach to Interpreting Model\n\nPredictions\n\nScott M. Lundberg\n\nSu-In Lee\n\nPaul G. Allen School of Computer Science\n\nPaul G. 
Allen School of Computer Science\n\nUniversity of Washington\n\nSeattle, WA 98105\n\nslund1@cs.washington.edu\n\nDepartment of Genome Sciences\n\nUniversity of Washington\n\nSeattle, WA 98105\n\nsuinlee@cs.washington.edu\n\nAbstract\n\nUnderstanding why a model makes a certain prediction can be as crucial as the\nprediction\u2019s accuracy in many applications. However, the highest accuracy for large\nmodern datasets is often achieved by complex models that even experts struggle to\ninterpret, such as ensemble or deep learning models, creating a tension between\naccuracy and interpretability. In response, various methods have recently been\nproposed to help users interpret the predictions of complex models, but it is often\nunclear how these methods are related and when one method is preferable over\nanother. To address this problem, we present a uni\ufb01ed framework for interpreting\npredictions, SHAP (SHapley Additive exPlanations). SHAP assigns each feature\nan importance value for a particular prediction. Its novel components include: (1)\nthe identi\ufb01cation of a new class of additive feature importance measures, and (2)\ntheoretical results showing there is a unique solution in this class with a set of\ndesirable properties. The new class uni\ufb01es six existing methods, notable because\nseveral recent methods in the class lack the proposed desirable properties. Based\non insights from this uni\ufb01cation, we present new methods that show improved\ncomputational performance and/or better consistency with human intuition than\nprevious approaches.\n\n1\n\nIntroduction\n\nThe ability to correctly interpret a prediction model\u2019s output is extremely important. It engenders\nappropriate user trust, provides insight into how a model may be improved, and supports understanding\nof the process being modeled. 
In some applications, simple models (e.g., linear models) are often\npreferred for their ease of interpretation, even if they may be less accurate than complex ones.\nHowever, the growing availability of big data has increased the benefits of using complex models,\nbringing to the forefront the trade-off between accuracy and interpretability of a model\u2019s output. A\nwide variety of methods have recently been proposed to address this issue [5, 8, 9, 3, 4, 1].\nBut an understanding of how these methods relate and when one method is preferable to another is\nstill lacking.\nHere, we present a novel unified approach to interpreting model predictions.1 Our approach leads to\nthree potentially surprising results that bring clarity to the growing space of methods:\n\n1. We introduce the perspective of viewing any explanation of a model\u2019s prediction as a model itself,\nwhich we term the explanation model. This lets us define the class of additive feature attribution\nmethods (Section 2), which unifies six current methods.\n\n2. We then show that game theory results guaranteeing a unique solution apply to the entire class of\nadditive feature attribution methods (Section 3) and propose SHAP values as a unified measure of\nfeature importance that various methods approximate (Section 4).\n\n3. We propose new SHAP value estimation methods and demonstrate that they are better aligned\nwith human intuition as measured by user studies and more effectively discriminate among model\noutput classes than several existing methods (Section 5).\n\nFootnote 1: https://github.com/slundberg/shap\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\n2 Additive Feature Attribution Methods\n\nThe best explanation of a simple model is the model itself; it perfectly represents itself and is easy to\nunderstand. 
For complex models, such as ensemble methods or deep networks, we cannot use the\noriginal model as its own best explanation because it is not easy to understand. Instead, we must use a\nsimpler explanation model, which we define as any interpretable approximation of the original model.\nWe show below that six current explanation methods from the literature all use the same explanation\nmodel. This previously unappreciated unity has interesting implications, which we describe in later\nsections.\nLet f be the original prediction model to be explained and g the explanation model. Here, we focus\non local methods designed to explain a prediction f(x) based on a single input x, as proposed in\nLIME [5]. Explanation models often use simplified inputs x' that map to the original inputs through a\nmapping function x = h_x(x'). Local methods try to ensure g(z') \u2248 f(h_x(z')) whenever z' \u2248 x'.\n(Note that h_x(x') = x even though x' may contain less information than x because h_x is specific to\nthe current input x.)\n\nDefinition 1 Additive feature attribution methods have an explanation model that is a linear\nfunction of binary variables:\n\n$$g(z') = \\phi_0 + \\sum_{i=1}^{M} \\phi_i z'_i, \\tag{1}$$\n\nwhere z' \u2208 {0, 1}^M, M is the number of simplified input features, and \u03c6_i \u2208 R.\n\nMethods with explanation models matching Definition 1 attribute an effect \u03c6_i to each feature, and\nsumming the effects of all feature attributions approximates the output f(x) of the original model.\nMany current methods match Definition 1, several of which are discussed below.\n\n2.1 LIME\n\nThe LIME method interprets individual model predictions based on locally approximating the model\naround a given prediction [5]. The local linear explanation model that LIME uses adheres to Equation\n1 exactly and is thus an additive feature attribution method. 
LIME refers to simplified inputs x' as\n\u201cinterpretable inputs,\u201d and the mapping x = h_x(x') converts a binary vector of interpretable inputs\ninto the original input space. Different types of h_x mappings are used for different input spaces. For\nbag of words text features, h_x converts a vector of 1\u2019s or 0\u2019s (present or not) into the original word\ncount if the simplified input is one, or zero if the simplified input is zero. For images, h_x treats the\nimage as a set of super pixels; it then maps 1 to leaving the super pixel as its original value and 0\nto replacing the super pixel with an average of neighboring pixels (this is meant to represent being\nmissing).\nTo find \u03c6, LIME minimizes the following objective function:\n\n$$\\xi = \\operatorname*{arg\\,min}_{g \\in G} L(f, g, \\pi_{x'}) + \\Omega(g). \\tag{2}$$\n\nFaithfulness of the explanation model g(z') to the original model f(h_x(z')) is enforced through\nthe loss L over a set of samples in the simplified input space weighted by the local kernel \u03c0_{x'}. \u2126\npenalizes the complexity of g. Since in LIME g follows Equation 1 and L is a squared loss, Equation\n2 can be solved using penalized linear regression.\n\n2.2 DeepLIFT\n\nDeepLIFT was recently proposed as a recursive prediction explanation method for deep learning\n[8, 7]. It attributes to each input x_i a value C_{\u2206x_i \u2206y} that represents the effect of that input being set\nto a reference value as opposed to its original value. This means that for DeepLIFT, the mapping\nx = h_x(x') converts binary values into the original inputs, where 1 indicates that an input takes its\noriginal value, and 0 indicates that it takes the reference value. The reference value, though chosen\nby the user, represents a typical uninformative background value for the feature.\nDeepLIFT uses a \"summation-to-delta\" property that states:\n\n$$\\sum_{i=1}^{n} C_{\\Delta x_i \\Delta o} = \\Delta o, \\tag{3}$$\n\nwhere o = f(x) is the model output, \u2206o = f(x) \u2212 f(r), \u2206x_i = x_i \u2212 r_i, and r is the reference input.\nIf we let \u03c6_i = C_{\u2206x_i \u2206o} and \u03c6_0 = f(r), then DeepLIFT\u2019s explanation model matches Equation 1 and\nis thus another additive feature attribution method.\n\n2.3 Layer-Wise Relevance Propagation\n\nThe layer-wise relevance propagation method interprets the predictions of deep networks [1]. As\nnoted by Shrikumar et al., this method is equivalent to DeepLIFT with the reference activations of all\nneurons fixed to zero. Thus, x = h_x(x') converts binary values into the original input space, where\n1 means that an input takes its original value, and 0 means an input takes the 0 value. Layer-wise\nrelevance propagation\u2019s explanation model, like DeepLIFT\u2019s, matches Equation 1.\n\n2.4 Classic Shapley Value Estimation\n\nThree previous methods use classic equations from cooperative game theory to compute explanations\nof model predictions: Shapley regression values [4], Shapley sampling values [9], and Quantitative\nInput Influence [3].\nShapley regression values are feature importances for linear models in the presence of multicollinearity.\nThis method requires retraining the model on all feature subsets S \u2286 F, where F is the set of all\nfeatures. It assigns an importance value to each feature that represents the effect on the model\nprediction of including that feature. To compute this effect, a model f_{S\u222a{i}} is trained with that feature\npresent, and another model f_S is trained with the feature withheld. 
Then, predictions from the two\nmodels are compared on the current input: f_{S\u222a{i}}(x_{S\u222a{i}}) \u2212 f_S(x_S), where x_S represents the values\nof the input features in the set S. Since the effect of withholding a feature depends on other features\nin the model, the preceding differences are computed for all possible subsets S \u2286 F \\ {i}. The\nShapley values are then computed and used as feature attributions. They are a weighted average of all\npossible differences:\n\n$$\\phi_i = \\sum_{S \\subseteq F \\setminus \\{i\\}} \\frac{|S|!\\,(|F| - |S| - 1)!}{|F|!} \\left[ f_{S \\cup \\{i\\}}(x_{S \\cup \\{i\\}}) - f_S(x_S) \\right]. \\tag{4}$$\n\nFor Shapley regression values, h_x maps 1 or 0 to the original input space, where 1 indicates the input\nis included in the model, and 0 indicates exclusion from the model. If we let \u03c6_0 = f_\u2205(\u2205), then the\nShapley regression values match Equation 1 and are hence an additive feature attribution method.\nShapley sampling values are meant to explain any model by: (1) applying sampling approximations\nto Equation 4, and (2) approximating the effect of removing a variable from the model by integrating\nover samples from the training dataset. This eliminates the need to retrain the model and allows fewer\nthan 2^|F| differences to be computed. Since the explanation model form of Shapley sampling values\nis the same as that for Shapley regression values, it is also an additive feature attribution method.\nQuantitative input influence is a broader framework that addresses more than feature attributions.\nHowever, as part of its method it independently proposes a sampling approximation to Shapley values\nthat is nearly identical to Shapley sampling values. 
It is thus another additive feature attribution\nmethod.\n\n3 Simple Properties Uniquely Determine Additive Feature Attributions\n\nA surprising attribute of the class of additive feature attribution methods is the presence of a single\nunique solution in this class with three desirable properties (described below). While these properties\nare familiar to the classical Shapley value estimation methods, they were previously unknown for\nother additive feature attribution methods.\nThe first desirable property is local accuracy. When approximating the original model f for a specific\ninput x, local accuracy requires the explanation model to at least match the output of f for the\nsimplified input x' (which corresponds to the original input x).\n\nProperty 1 (Local accuracy)\n\n$$f(x) = g(x') = \\phi_0 + \\sum_{i=1}^{M} \\phi_i x'_i \\tag{5}$$\n\nThe explanation model g(x') matches the original model f(x) when x = h_x(x'), where \u03c6_0 =\nf(h_x(0)) represents the model output with all simplified inputs toggled off (i.e., missing).\n\nThe second property is missingness. If the simplified inputs represent feature presence, then missingness\nrequires features missing in the original input to have no impact. All of the methods described in\nSection 2 obey the missingness property.\n\nProperty 2 (Missingness)\n\n$$x'_i = 0 \\implies \\phi_i = 0 \\tag{6}$$\n\nMissingness constrains features where x'_i = 0 to have no attributed impact.\n\nThe third property is consistency. 
Consistency states that if a model changes so that some simplified\ninput\u2019s contribution increases or stays the same regardless of the other inputs, that input\u2019s attribution\nshould not decrease.\n\nProperty 3 (Consistency) Let f_x(z') = f(h_x(z')) and z' \\ i denote setting z'_i = 0. For any two\nmodels f and f', if\n\n$$f'_x(z') - f'_x(z' \\setminus i) \\geq f_x(z') - f_x(z' \\setminus i) \\tag{7}$$\n\nfor all inputs z' \u2208 {0, 1}^M, then \u03c6_i(f', x) \u2265 \u03c6_i(f, x).\n\nTheorem 1 Only one possible explanation model g follows Definition 1 and satisfies Properties 1, 2,\nand 3:\n\n$$\\phi_i(f, x) = \\sum_{z' \\subseteq x'} \\frac{|z'|!\\,(M - |z'| - 1)!}{M!} \\left[ f_x(z') - f_x(z' \\setminus i) \\right] \\tag{8}$$\n\nwhere |z'| is the number of non-zero entries in z', and z' \u2286 x' represents all z' vectors where the\nnon-zero entries are a subset of the non-zero entries in x'.\n\nTheorem 1 follows from combined cooperative game theory results, where the values \u03c6_i are known\nas Shapley values [6]. Young (1985) [10] demonstrated that Shapley values are the only set of values\nthat satisfy three axioms similar to Property 1, Property 3, and a final property that we show to be\nredundant in this setting (see Supplementary Material). Property 2 is required to adapt the Shapley\nproofs to the class of additive feature attribution methods.\nUnder Properties 1-3, for a given simplified input mapping h_x, Theorem 1 shows that there is only one\npossible additive feature attribution method. 
This result implies that methods not based on Shapley\nvalues violate local accuracy and/or consistency (methods in Section 2 already respect missingness).\nThe following section proposes a unified approach that improves previous methods, preventing them\nfrom unintentionally violating Properties 1 and 3.\n\n4 SHAP (SHapley Additive exPlanation) Values\n\nFigure 1: SHAP (SHapley Additive exPlanation) values attribute to each feature the change in the\nexpected model prediction when conditioning on that feature. They explain how to get from the\nbase value E[f(z)] that would be predicted if we did not know any features to the current output\nf(x). This diagram shows a single ordering. When the model is non-linear or the input features are\nnot independent, however, the order in which features are added to the expectation matters, and the\nSHAP values arise from averaging the \u03c6_i values across all possible orderings.\n\nWe propose SHAP values as a unified measure of feature importance. These are the Shapley values\nof a conditional expectation function of the original model; thus, they are the solution to Equation\n8, where f_x(z') = f(h_x(z')) = E[f(z) | z_S], and S is the set of non-zero indexes in z' (Figure 1).\nBased on Sections 2 and 3, SHAP values provide the unique additive feature importance measure that\nadheres to Properties 1-3 and uses conditional expectations to define simplified inputs. Implicit in this\ndefinition of SHAP values is a simplified input mapping, h_x(z') = z_S, where z_S has missing values\nfor features not in the set S. Since most models cannot handle arbitrary patterns of missing input\nvalues, we approximate f(z_S) with E[f(z) | z_S]. 
This definition of SHAP values is designed to\nclosely align with the Shapley regression, Shapley sampling, and quantitative input influence feature\nattributions, while also allowing for connections with LIME, DeepLIFT, and layer-wise relevance\npropagation.\nThe exact computation of SHAP values is challenging. However, by combining insights from current\nadditive feature attribution methods, we can approximate them. We describe two model-agnostic\napproximation methods, one that is already known (Shapley sampling values) and another that is\nnovel (Kernel SHAP). We also describe four model-type-specific approximation methods, two of\nwhich are novel (Max SHAP, Deep SHAP). When using these methods, feature independence and\nmodel linearity are two optional assumptions simplifying the computation of the expected values\n(note that $\\bar{S}$ is the set of features not in S):\n\n$$f(h_x(z')) = E[f(z) \\mid z_S] \\quad \\text{(SHAP explanation model simplified input mapping)} \\tag{9}$$\n$$= E_{z_{\\bar{S}} \\mid z_S}[f(z)] \\quad \\text{(expectation over } z_{\\bar{S}} \\mid z_S \\text{)} \\tag{10}$$\n$$\\approx E_{z_{\\bar{S}}}[f(z)] \\quad \\text{(assume feature independence, as in [9, 5, 7, 3])} \\tag{11}$$\n$$\\approx f([z_S, E[z_{\\bar{S}}]]). \\quad \\text{(assume model linearity)} \\tag{12}$$\n\n4.1 Model-Agnostic Approximations\n\nIf we assume feature independence when approximating conditional expectations (Equation 11), as\nin [9, 5, 7, 3], then SHAP values can be estimated directly using the Shapley sampling values method\n[9] or equivalently the Quantitative Input Influence method [3]. These methods use a sampling\napproximation of a permutation version of the classic Shapley value equations (Equation 8). Separate\nsampling estimates are performed for each feature attribution. 
While these sampling methods are reasonable to compute for a\nsmall number of inputs, the Kernel SHAP method described next requires fewer evaluations of the\noriginal model to obtain similar approximation accuracy (Section 5).\n\nKernel SHAP (Linear LIME + Shapley values)\n\nLinear LIME uses a linear explanation model to locally approximate f, where local is measured in the\nsimplified binary input space. At first glance, the regression formulation of LIME in Equation 2 seems\nvery different from the classical Shapley value formulation of Equation 8. However, since linear\nLIME is an additive feature attribution method, we know the Shapley values are the only possible\nsolution to Equation 2 that satisfies Properties 1-3 \u2013 local accuracy, missingness, and consistency. A\nnatural question to pose is whether the solution to Equation 2 recovers these values. The answer\ndepends on the choice of loss function L, weighting kernel \u03c0_{x'}, and regularization term \u2126. The LIME\nchoices for these parameters are made heuristically; using these choices, Equation 2 does not recover\nthe Shapley values. 
One consequence is that local accuracy and/or consistency are violated, which in\nturn leads to unintuitive behavior in certain circumstances (see Section 5).\n\nBelow we show how to avoid heuristically choosing the parameters in Equation 2 and how to find the\nloss function L, weighting kernel \u03c0_{x'}, and regularization term \u2126 that recover the Shapley values.\n\nTheorem 2 (Shapley kernel) Under Definition 1, the specific forms of \u03c0_{x'}, L, and \u2126 that make\nsolutions of Equation 2 consistent with Properties 1 through 3 are:\n\n$$\\Omega(g) = 0,$$\n$$\\pi_{x'}(z') = \\frac{(M - 1)}{\\binom{M}{|z'|}\\,|z'|\\,(M - |z'|)},$$\n$$L(f, g, \\pi_{x'}) = \\sum_{z' \\in Z} \\left[ f(h_x(z')) - g(z') \\right]^2 \\pi_{x'}(z'),$$\n\nwhere |z'| is the number of non-zero elements in z'.\n\nThe proof of Theorem 2 is shown in the Supplementary Material.\nIt is important to note that \u03c0_{x'}(z') = \u221e when |z'| \u2208 {0, M}, which enforces \u03c6_0 = f_x(\u2205) and\n$f(x) = \\sum_{i=0}^{M} \\phi_i$. In practice, these infinite weights can be avoided during optimization by analytically\neliminating two variables using these constraints.\nSince g(z') in Theorem 2 is assumed to follow a linear form, and L is a squared loss, Equation 2\ncan still be solved using linear regression. As a consequence, the Shapley values from game theory\ncan be computed using weighted linear regression.2 Since LIME uses a simplified input mapping\nthat is equivalent to the approximation of the SHAP mapping given in Equation 12, this enables\nregression-based, model-agnostic estimation of SHAP values. 
Jointly estimating all SHAP values\nusing regression provides better sample efficiency than the direct use of classical Shapley equations\n(see Section 5).\nThe intuitive connection between linear regression and Shapley values is that Equation 8 is a difference\nof means. Since the mean is also the best least squares point estimate for a set of data points, it is\nnatural to search for a weighting kernel that causes linear least squares regression to recapitulate\nthe Shapley values. This leads to a kernel that distinctly differs from previous heuristically chosen\nkernels (Figure 2A).\n\n4.2 Model-Specific Approximations\n\nWhile Kernel SHAP improves the sample efficiency of model-agnostic estimations of SHAP values, by\nrestricting our attention to specific model types, we can develop faster model-specific approximation\nmethods.\n\nLinear SHAP\n\nFor linear models, if we assume input feature independence (Equation 11), SHAP values can be\napproximated directly from the model\u2019s weight coefficients.\n\nCorollary 1 (Linear SHAP) Given a linear model $f(x) = \\sum_{j=1}^{M} w_j x_j + b$: \u03c6_0(f, x) = b and\n\n$$\\phi_i(f, x) = w_i (x_i - E[x_i])$$\n\nThis follows from Theorem 2 and Equation 11, and it has been previously noted by \u0160trumbelj and\nKononenko [9].\n\nLow-Order SHAP\n\nSince linear regression using Theorem 2 has complexity O(2^M + M^3), it is efficient for small values\nof M if we choose an approximation of the conditional expectations (Equation 11 or 12).\n\nFootnote 2: During the preparation of this manuscript we discovered this parallels an equivalent constrained quadratic\nminimization formulation of Shapley values proposed in econometrics [2].\n\nFigure 2: (A) The Shapley kernel weighting is symmetric when all possible z' vectors are ordered\nby cardinality; there are 2^15 vectors in this example. This is distinctly different from previous\nheuristically chosen kernels. 
(B) Compositional models such as deep neural networks are composed\nof many simple components. Given analytic solutions for the Shapley values of the components, fast\napproximations for the full model can be made using DeepLIFT\u2019s style of back-propagation.\n\nMax SHAP\n\nUsing a permutation formulation of Shapley values, we can calculate the probability that each input\nwill increase the maximum value over every other input. Doing this on a sorted order of input values\nlets us compute the Shapley values of a max function with M inputs in O(M^2) time instead of\nO(M 2^M). See Supplementary Material for the full algorithm.\n\nDeep SHAP (DeepLIFT + Shapley values)\n\nWhile Kernel SHAP can be used on any model, including deep models, it is natural to ask whether\nthere is a way to leverage extra knowledge about the compositional nature of deep networks to improve\ncomputational performance. We find an answer to this question through a previously unappreciated\nconnection between Shapley values and DeepLIFT [8]. If we interpret the reference value in Equation\n3 as representing E[x] in Equation 12, then DeepLIFT approximates SHAP values assuming that\nthe input features are independent of one another and the deep model is linear. DeepLIFT uses a\nlinear composition rule, which is equivalent to linearizing the non-linear components of a neural\nnetwork. Its back-propagation rules defining how each component is linearized are intuitive but were\nheuristically chosen. Since DeepLIFT is an additive feature attribution method that satisfies local\naccuracy and missingness, we know that Shapley values represent the only attribution values that\nsatisfy consistency. This motivates our adapting DeepLIFT to become a compositional approximation\nof SHAP values, leading to Deep SHAP.\nDeep SHAP combines SHAP values computed for smaller components of the network into SHAP\nvalues for the whole network. 
It does so by recursively passing DeepLIFT\u2019s multipliers, now defined\nin terms of SHAP values, backwards through the network as in Figure 2B:\n\n$$m_{x_j f_3} = \\frac{\\phi_j(f_3, x)}{x_j - E[x_j]} \\tag{13}$$\n$$\\forall j \\in \\{1, 2\\} \\quad m_{y_i f_j} = \\frac{\\phi_i(f_j, y)}{y_i - E[y_i]} \\tag{14}$$\n$$m_{y_i f_3} = \\sum_{j=1}^{2} m_{y_i f_j} m_{x_j f_3} \\quad \\text{(chain rule)} \\tag{15}$$\n$$\\phi_i(f_3, y) \\approx m_{y_i f_3} (y_i - E[y_i]) \\quad \\text{(linear approximation)} \\tag{16}$$\n\nSince the SHAP values for the simple network components can be efficiently solved analytically\nif they are linear, max pooling, or an activation function with just one input, this composition\nrule enables a fast approximation of values for the whole model. Deep SHAP avoids the need to\nheuristically choose ways to linearize components. Instead, it derives an effective linearization from\nthe SHAP values computed for each component. The max function offers one example where this\nleads to improved attributions (see Section 5).\n\nFigure 3: Comparison of three additive feature attribution methods: Kernel SHAP (using a debiased\nlasso), Shapley sampling values, and LIME (using the open source implementation). Feature\nimportance estimates are shown for one feature in two models as the number of evaluations of the\noriginal model function increases. The 10th and 90th percentiles are shown for 200 replicate estimates\nat each sample size. (A) A decision tree model using all 10 input features is explained for a single\ninput. (B) A decision tree using only 3 of 100 input features is explained for a single input.\n\n5 Computational and User Study Experiments\n\nWe evaluated the benefits of SHAP values using the Kernel SHAP and Deep SHAP approximation\nmethods. First, we compared the computational efficiency and accuracy of Kernel SHAP vs. LIME\nand Shapley sampling values. 
Second, we designed user studies to compare SHAP values with\nalternative feature importance allocations represented by DeepLIFT and LIME. As might be expected,\nSHAP values prove more consistent with human intuition than other methods that fail to meet\nProperties 1-3 (Section 2). Finally, we use MNIST digit image classification to compare SHAP with\nDeepLIFT and LIME.\n\n5.1 Computational Efficiency\n\nTheorem 2 connects Shapley values from game theory with weighted linear regression. Kernel SHAP\nuses this connection to compute feature importance. This leads to more accurate estimates with fewer\nevaluations of the original model than previous sampling-based estimates of Equation 8, particularly\nwhen regularization is added to the linear model (Figure 3). Comparing Shapley sampling, SHAP, and\nLIME on both dense and sparse decision tree models illustrates both the improved sample efficiency\nof Kernel SHAP and that values from LIME can differ significantly from SHAP values that satisfy\nlocal accuracy and consistency.\n\n5.2 Consistency with Human Intuition\n\nTheorem 1 provides a strong incentive for all additive feature attribution methods to use SHAP\nvalues. Both LIME and DeepLIFT, as originally demonstrated, compute different feature importance\nvalues. To validate the importance of Theorem 1, we compared explanations from LIME, DeepLIFT,\nand SHAP with user explanations of simple models (using Amazon Mechanical Turk). Our testing\nassumes that good model explanations should be consistent with explanations from humans who\nunderstand that model.\nWe compared LIME, DeepLIFT, and SHAP with human explanations for two settings. The first\nsetting used a sickness score that was higher when only one of two symptoms was present (Figure 4A).\nThe second used a max allocation problem to which DeepLIFT can be applied. Participants were told\na short story about how three men made money based on the maximum score any of them achieved\n(Figure 4B). 
In both cases, participants were asked to assign credit for the output (the sickness score\nor money won) among the inputs (i.e., symptoms or players). We found a much stronger agreement\nbetween human explanations and SHAP than with other methods. SHAP\u2019s improved performance for\nmax functions addresses the open problem of max pooling functions in DeepLIFT [7].\n\nFigure 4: Human feature impact estimates are shown as the most common explanation given among\n30 (A) and 52 (B) random individuals, respectively. (A) Feature attributions for a model output value\n(sickness score) of 2. The model output is 2 when fever and cough are both present, 5 when only\none of fever or cough is present, and 0 otherwise. (B) Attributions of profit among three men, given\naccording to the maximum number of questions any man got right. The first man got 5 questions\nright, the second 4 questions, and the third got none right, so the profit is $5.\n\n5.3 Explaining Class Differences\n\nFigure 5: Explaining the output of a convolutional network trained on the MNIST digit dataset. Orig.\nDeepLIFT has no explicit Shapley approximations, while New DeepLIFT seeks to better approximate\nShapley values. (A) Red areas increase the probability of that class, and blue areas decrease the\nprobability. Masked removes pixels in order to go from 8 to 3. (B) The change in log odds when\nmasking over 20 random images supports the use of better estimates of SHAP values.\n\nAs discussed in Section 4.2, DeepLIFT\u2019s compositional approach suggests a compositional approximation\nof SHAP values (Deep SHAP). These insights, in turn, improve DeepLIFT, and a new version\nincludes updates to better match Shapley values [7]. 
Figure 5 extends DeepLIFT\u2019s convolutional\nnetwork example to highlight the increased performance of estimates that are closer to SHAP values.\nThe pre-trained model and Figure 5 example are the same as those used in [7], with inputs normalized\nbetween 0 and 1. Two convolution layers and 2 dense layers are followed by a 10-way softmax\noutput layer. Both DeepLIFT versions explain a normalized version of the linear layer, while SHAP\n(computed using Kernel SHAP) and LIME explain the model\u2019s output. SHAP and LIME were both\nrun with 50k samples (Supplementary Figure 1); to improve performance, LIME was modified to use\nsingle pixel segmentation over the digit pixels. To match [7], we masked 20% of the pixels chosen to\nswitch the predicted class from 8 to 3 according to the feature attribution given by each method.\n\n6 Conclusion\n\nThe growing tension between the accuracy and interpretability of model predictions has motivated\nthe development of methods that help users interpret predictions. The SHAP framework identifies\nthe class of additive feature importance methods (which includes six previous methods) and shows\nthere is a unique solution in this class that adheres to desirable properties. The thread of unity that\nSHAP weaves through the literature is an encouraging sign that common principles about model\ninterpretation can inform the development of future methods.\nWe presented several different estimation methods for SHAP values, along with proofs and experiments\nshowing that these values are desirable. Promising next steps involve developing faster\nmodel-type-specific estimation methods that make fewer assumptions, integrating work on estimating\ninteraction effects from game theory, and defining new explanation model classes.\n\nAcknowledgements\n\nThis work was supported by National Science Foundation (NSF) DBI-135589, NSF CAREER\nDBI-155230, American Cancer Society 127332-RSG-15-097-01-TBG, National Institute of Health\n(NIH) AG049196, and an NSF Graduate Research Fellowship. We would like to thank Marco Ribeiro,\nErik \u0160trumbelj, Avanti Shrikumar, Yair Zick, the Lee Lab, and the NIPS reviewers for feedback that\nhas significantly improved this work.\n\nReferences\n\n[1] Sebastian Bach et al. \u201cOn pixel-wise explanations for non-linear classifier decisions by layer-wise\nrelevance propagation\u201d. In: PloS One 10.7 (2015), e0130140.\n\n[2] A Charnes et al. \u201cExtremal principle solutions of games in characteristic function form: core,\nChebychev and Shapley value generalizations\u201d. In: Econometrics of Planning and Efficiency\n11 (1988), pp. 123\u2013133.\n\n[3] Anupam Datta, Shayak Sen, and Yair Zick. \u201cAlgorithmic transparency via quantitative input\ninfluence: Theory and experiments with learning systems\u201d. In: Security and Privacy (SP), 2016\nIEEE Symposium on. IEEE. 2016, pp. 598\u2013617.\n\n[4] Stan Lipovetsky and Michael Conklin. \u201cAnalysis of regression in game theory approach\u201d. In:\nApplied Stochastic Models in Business and Industry 17.4 (2001), pp. 319\u2013330.\n\n[5] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. \u201cWhy should I trust you?: Explaining\nthe predictions of any classifier\u201d. In: Proceedings of the 22nd ACM SIGKDD International\nConference on Knowledge Discovery and Data Mining. ACM. 2016, pp. 1135\u20131144.\n\n[6] Lloyd S Shapley. \u201cA value for n-person games\u201d. In: Contributions to the Theory of Games\n2.28 (1953), pp. 307\u2013317.\n\n[7] Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. \u201cLearning Important Features\nThrough Propagating Activation Differences\u201d. 
In: arXiv preprint arXiv:1704.02685 (2017).\n\n[8] Avanti Shrikumar et al. \u201cNot Just a Black Box: Learning Important Features Through Propagating Activation Differences\u201d. In: arXiv preprint arXiv:1605.01713 (2016).\n\n[9] Erik \u0160trumbelj and Igor Kononenko. \u201cExplaining prediction models and individual predictions with feature contributions\u201d. In: Knowledge and Information Systems 41.3 (2014), pp. 647\u2013665.\n\n[10] H Peyton Young. \u201cMonotonic solutions of cooperative games\u201d. In: International Journal of Game Theory 14.2 (1985), pp. 65\u201372.\n", "award": [], "sourceid": 2493, "authors": [{"given_name": "Scott", "family_name": "Lundberg", "institution": "University of Washington"}, {"given_name": "Su-In", "family_name": "Lee", "institution": "University of Washington"}]}