{"title": "Explaining Deep Learning Models -- A Bayesian Non-parametric Approach", "book": "Advances in Neural Information Processing Systems", "page_first": 4514, "page_last": 4524, "abstract": "Understanding and interpreting how machine learning (ML) models make decisions have been a big challenge. While recent research has proposed various technical approaches to provide some clues as to how an ML model makes individual predictions, they cannot provide users with an ability to inspect a model as a complete entity. In this work, we propose a novel technical approach that augments a Bayesian non-parametric regression mixture model with multiple elastic nets. Using the enhanced mixture model, we can extract generalizable insights for a target model through a global approximation. To demonstrate the utility of our approach, we evaluate it on different ML models in the context of image recognition. The empirical results indicate that our proposed approach not only outperforms the state-of-the-art techniques in explaining individual decisions but also provides users with an ability to discover the vulnerabilities of the target ML models.", "full_text": "Explaining Deep Learning Models \u2013 A Bayesian\n\nNon-parametric Approach\n\nWenbo Guo\n\nThe Pennsylvania State University\n\nwzg13@ist.psu.edu\n\nSui Huang\nNet\ufb02ix Inc.\n\nshuang@netflix.com\n\nYunzhe Tao\n\nColumbia University\ny.tao@columbia.edu\n\nXinyu Xing\n\nThe Pennsylvania State University\n\nxxing@ist.psu.edu\n\nLin Lin\n\nThe Pennsylvania State University\n\nllin@psu.edu\n\nAbstract\n\nUnderstanding and interpreting how machine learning (ML) models make decisions\nhave been a big challenge. While recent research has proposed various technical\napproaches to provide some clues as to how an ML model makes individual\npredictions, they cannot provide users with an ability to inspect a model as a\ncomplete entity. 
In this work, we propose a novel technical approach that augments a Bayesian non-parametric regression mixture model with multiple elastic nets. Using the enhanced mixture model, we can extract generalizable insights for a target model through a global approximation. To demonstrate the utility of our approach, we evaluate it on different ML models in the context of image recognition. The empirical results indicate that our proposed approach not only outperforms the state-of-the-art techniques in explaining individual decisions but also provides users with an ability to discover the vulnerabilities of the target ML models.

1 Introduction

Compared with relatively simple learning techniques such as decision trees and K-nearest neighbors, complex learning models – particularly, deep neural networks (DNN) – usually demonstrate superior performance in classification and prediction. However, they are almost completely opaque, even to the engineers that build them [20]. Presumably for this reason, they have not yet been widely adopted in critical problem domains, such as diagnosing deadly diseases [13] and making million-dollar trading decisions [14].
To address this problem, prior research proposes to derive an interpretable explanation for the output of a DNN. With that, people could understand, trust and effectively manage a deep learning model. From a technical perspective, this can be interpreted as pinpointing the most important features in the input of a deep learning model. In the past, the techniques designed and developed have primarily focused on two kinds of methods – (1) whitebox explanation, which derives an interpretation for a deep learning model through forward or backward propagation approaches [26, 36], and (2) blackbox explanation, which infers explanations for individual decisions through local approximation [21, 23]. 
While both demonstrate great potential to help users interpret an individual decision, they lack the ability to extract insights from the target ML model that can be generalized to future cases. In other words, existing methods cannot shed light on the general sensitivity level of a target model to specific input dimensions, and hence fall short in foreseeing when prediction errors might occur for future cases.
In this work, we propose a new technical approach that not only explains an individual decision but, more importantly, extracts generalizable insights from the target model. As we will show in Section 4, we define such insights as the general sensitivity level of a target model to specific input dimensions. We demonstrate that model developers could use them to identify model strengths as well as model vulnerabilities. Technically, our approach introduces multiple elastic nets to a Bayesian non-parametric regression mixture model. Then, it utilizes this model to approximate a target model, and thus derives generalizable insights and explanations for individual decisions from it. The rationale behind this approach is as follows.
A Bayesian non-parametric regression mixture model can approximate arbitrary probability densities with high accuracy [22]. As we will discuss in Section 3, with multiple elastic nets, we can augment a regression mixture model with the ability to extract patterns (generalizable insights) even from a learning model whose input data exhibit varying degrees of correlation. Given these patterns, we can identify input features that are critical to the overall performance of an ML model.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

This information can help users scrutinize a model's overall strengths and weaknesses. Besides extracting generalizable insights, the proposed model can also provide users with more understandable and accountable explanations. We will demonstrate this characteristic in Section 4.

2 Related Work

Most of the work related to model interpretation lies in demystifying complicated ML models through whitebox and blackbox mechanisms. Here, we summarize these works and discuss their limitations. Note that we do not include works that identify the training samples most responsible for a given prediction (e.g., [12, 15]) or works that build self-interpretable deep learning models [7, 33].
The whitebox mechanism augments a learning model with the ability to yield explanations for individual predictions. Generally, the techniques in this mechanism follow two lines of approaches – (1) occluding a fraction of a single input sample and identifying what portions of the features are important for classification [4, 6, 17, 36, 37], and (2) computing the gradient of an output with respect to a given input sample and pinpointing what features are sensitive to the prediction of that sample [1, 8, 24, 25, 26, 29, 32]. While both can give users an explanation for a single decision that a learning model reaches, they are not sufficient to provide a global understanding of a learning model, nor capable of exposing its strengths and weaknesses. In addition, they typically cannot be generally applied to explaining the prediction outcomes of other ML models, because most techniques following this mechanism are designed for a specific ML model and require altering that learning model.
The blackbox mechanism treats an ML model as a black box and produces explanations by locally learning an interpretable model around a prediction. 
For example, LIME [23] and SHAP [21] are the same kind of explanation techniques: they sample perturbed instances around a single data sample and fit a linear model to perform local explanations. Going beyond the explanation of a single prediction, both can be extended to explain the model as a complete entity by selecting a small number of representative individual predictions and their explanations. However, explanations obtained through such approaches cannot describe the full mapping learned by an ML model. In this work, our proposed technique derives generalizable insights directly from a target model, which provides us with the ability to unveil model weaknesses and strengths.

3 Technical Approach

3.1 Background

A Bayesian non-parametric regression mixture model (mixture model for short) consists of multiple Gaussian distributions:

\[ y_i \mid \mathbf{x}_i, \Theta \sim \sum_{j=1}^{\infty} \pi_j \, \mathcal{N}(y_i \mid \mathbf{x}_i \boldsymbol{\beta}_j, \sigma_j^2), \quad (1) \]

where Θ denotes the parameter set, x_i ∈ R^p is the i-th data sample of the sample feature matrix X^T ∈ R^{p×n}, and y_i is the corresponding prediction in y ∈ R^n, the vector of predictions for the n samples. π_{1:∞} are the probabilities tied to the distributions, summing to 1, and β_{1:∞} and σ²_{1:∞} represent the parameters of the regression models, with β_j ∈ R^p and σ_j² ∈ R.
In general, model (1) can be viewed as a combination of an infinite number of regression models and can be used to approximate any learning model with high accuracy. Given a learning model g : R^p → R, we can therefore approximate g(·) with a mixture model using {X, y}, a set of data samples and their corresponding predictions obtained from model g, i.e., y_i = g(x_i). 
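To make this approximation idea concrete, the following is a simplified, non-Bayesian sketch: a finite mixture of linear regressions fit by EM to input–output pairs of a hypothetical two-regime target model g. All names and data here are illustrative stand-ins, and the point estimates only approximate the role played by the full Bayesian non-parametric treatment described in this paper.

```python
import numpy as np

def fit_mixture_of_regressions(X, y, J=2, n_iter=100, restarts=5, seed=0):
    """EM point-estimate sketch of a finite mixture of linear regressions.
    (The paper's model is Bayesian non-parametric with elastic-net priors;
    this frequentist version only illustrates the approximation idea.)"""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best = None
    for _ in range(restarts):
        beta = rng.normal(size=(J, p))   # per-component regression coefficients
        sigma2 = np.ones(J)              # per-component noise variances
        pi = np.full(J, 1.0 / J)         # mixture weights
        for _ in range(n_iter):
            # E-step: responsibilities r[i, j] ∝ pi_j * N(y_i | x_i beta_j, sigma2_j)
            resid = y[:, None] - X @ beta.T
            log_r = (np.log(pi + 1e-12)
                     - 0.5 * np.log(2 * np.pi * sigma2)
                     - 0.5 * resid ** 2 / sigma2)
            log_r -= log_r.max(axis=1, keepdims=True)
            r = np.exp(log_r)
            r /= r.sum(axis=1, keepdims=True)
            # M-step: weighted least squares per component
            for j in range(J):
                w = r[:, j] + 1e-12
                Xw = X * w[:, None]
                beta[j] = np.linalg.solve(X.T @ Xw + 1e-8 * np.eye(p), Xw.T @ y)
                sigma2[j] = (w * (y - X @ beta[j]) ** 2).sum() / w.sum() + 1e-8
            pi = r.mean(axis=0)
        # keep the restart whose hard assignments fit best
        assign = r.argmax(axis=1)
        mse = np.mean((y - np.einsum("ip,ip->i", X, beta[assign])) ** 2)
        if best is None or mse < best[0]:
            best = (mse, pi, beta, sigma2, r)
    return best[1], best[2], best[3], best[4]

# Hypothetical target model g with two local linear regimes
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
g = np.where(X[:, 0] > 0,
             X @ np.array([2.0, 0.0, 1.0]),
             X @ np.array([-1.0, 3.0, 0.0])) + 0.05 * rng.normal(size=400)
pi, beta, sigma2, r = fit_mixture_of_regressions(X, g)
# Each sample's local explanation is the coefficient vector of its component:
local_coefs = beta[r.argmax(axis=1)]
```

Each row of `local_coefs` plays the role of the per-sample linear explanation; the Bayesian model additionally places priors over the coefficients and lets the data determine the number of components.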
For any data sample x_i, we can then identify a regression model ŷ_i = x_i β_j + ε_i that best approximates the local decision boundary near x_i.
Note that in this paper, we assume a single mixture component is sufficient to approximate the local decision boundary around x_i. Although this assumption does not hold in some cases, the proposed model can be relaxed and extended to deal with them. More specifically, instead of directly assigning each instance to one mixture component, we can assign an instance at the mode level [10] (i.e., assign the instance to a combination of multiple mixture components). When explaining a single instance, we can then linearly combine the corresponding regression coefficients in a mode.
Recent research [23] has demonstrated that such a linear regression model can be used to assess how the feature space affects a decision by inspecting the weights (model coefficients) of the features present in the input. As a result, similar to prior research [23], we can use this linear regression model to pinpoint the important features and treat them as an explanation for the corresponding individual decision.
In addition to the model approximation and explanation mentioned above, another characteristic of a mixture model is that it enables multiple training samples to share the same regression model and thus preserves only the dominant patterns in the data. With this, we can significantly reduce the amount of explanation derived from the training data and utilize it as the generalizable insight of a target model.

3.2 Challenge and Technical Overview

Despite the great characteristics of a mixture model, it is still challenging to use it for deriving generalizable insights or individual explanations. 
This is because a regression mixture model does not always guarantee success in model approximation, especially when it deals with samples with diverse feature correlations and data sparsity.
To tackle this challenge, an instinctive reaction is to introduce an elastic net into a Bayesian regression mixture model. Past research [9, 18, 38] has demonstrated that an elastic net encourages grouping effects among variables, so that highly correlated variables tend to be in or out of a mixture model together. Therefore, it can potentially augment the aforementioned method with the ability to deal with the situation where the features of a high-dimensional sample are highly correlated. However, a key limitation of this approach can still manifest when the samples exhibit diverse feature correlations and data sparsity.
In the following, we address this issue by establishing a Dirichlet process mixture model with multiple elastic nets (DMM-MEN). Different from previous research [35], our approach gives the regularization terms the flexibility to reduce to a lasso or ridge penalty for some sample categories, while maintaining the properties of the elastic net for others. With multiple elastic nets, the model is able to capture the different levels of feature correlation and sparsity in the data. In the following, we provide more details of this hierarchical Bayesian non-parametric model.

3.3 Technical Details

Dirichlet Process Regression Mixture Model. As specified in Equation (1), the number of Gaussian distributions is infinite, which indicates that there are an infinite number of parameters to be estimated. In practice, however, the amount of available data samples is limited, and it is therefore necessary to restrict the number of distributions. 
To do this, a truncated Dirichlet process prior [11] can be applied, and Equation (1) can be written as

\[ y_i \mid \mathbf{x}_i, \Theta \sim \sum_{j=1}^{J} \pi_j \, \mathcal{N}(y_i \mid \mathbf{x}_i \boldsymbol{\beta}_j, \sigma_j^2), \quad (2) \]

where J is the hyperparameter that specifies the upper bound on the number of mixture components.
To estimate the parameters Θ, a Bayesian non-parametric approach first models π_{1:J} through a "stick-breaking" prior process. With such modeling, the parameters π_{1:J} can be computed by

\[ \pi_j = u_j \prod_{l=1}^{j-1} (1 - u_l) \quad \text{for } j = 2, \ldots, J-1, \quad (3) \]

with π_1 = u_1 and π_J = 1 − Σ_{l=1}^{J−1} π_l. Here, u_l follows a beta prior distribution, Beta(1, α), parameterized by α, where α can be drawn from Gamma(e, f) with hyperparameters e and f. To make the computation efficient, σ_j² is set to follow an inverse Gamma prior, i.e., σ_j² ∼ Inv-Gamma(a, b) with hyperparameters a and b. Given σ²_{1:J}, for a conventional Bayesian regression mixture model, β_{1:J} can be drawn from the Gaussian distribution N(m_β, σ_j² V_β) with hyperparameters m_β and V_β.
As described above, when using a mixture model to approximate a learning model, for any data sample we can identify a regression model that best approximates the prediction of that sample. This is due to the fact that a mixture model can be interpreted as arising from a clustering procedure that depends on the underlying latent component indicators z_{1:n}.

¹For multi-class classification tasks, this work approximates each class separately; thus X denotes the samples in the same class and g(X) represents the corresponding predictions. Given that y is a probability vector, we conduct a logit transformation before fitting a regression mixture model.

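As a side note, the truncated stick-breaking construction in Equation (3) is straightforward to simulate; the sketch below (with illustrative values of J and α) draws the mixture weights π_{1:J}:

```python
import numpy as np

def stick_breaking_weights(J, alpha, rng):
    """Draw weights pi_{1:J} from a truncated stick-breaking prior:
    u_j ~ Beta(1, alpha); pi_j = u_j * prod_{l<j} (1 - u_l), with the
    last weight absorbing the remaining stick so the weights sum to one."""
    u = rng.beta(1.0, alpha, size=J - 1)
    weights = np.empty(J)
    remaining = 1.0  # length of the stick not yet broken off
    for j in range(J - 1):
        weights[j] = u[j] * remaining
        remaining *= 1.0 - u[j]
    weights[J - 1] = remaining
    return weights

rng = np.random.default_rng(0)
weights = stick_breaking_weights(J=10, alpha=1.0, rng=rng)
```

Smaller α concentrates mass on the first few components, which is how the prior favors a small number of dominant regression patterns.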
For each observation (x_i, y_i), z_i = j indicates that the observation was generated from the j-th Gaussian distribution, i.e., y_i | z_i = j ∼ N(x_i β_j, σ_j²) with P(z_i = j) = π_j.
Dirichlet Process Mixture Model with Multiple Elastic Nets. Recall that a conventional mixture model has difficulty not only in dealing with high-dimensional data and highly correlated features but also in handling different types of data heterogeneity. We modify the conventional mixture model by resetting the prior distribution of β_{1:J} to realize multiple elastic nets. Specifically, we first define the mixture distribution

\[ P(\boldsymbol{\beta}_j \mid \lambda_{1,1:K}, \lambda_{2,1:K}, \sigma_j^2) = \sum_{k=1}^{K} w_k f_k(\boldsymbol{\beta}_j \mid \lambda_{1,k}, \lambda_{2,k}, \sigma_j^2), \quad (4) \]

where K denotes the total number of component distributions and w_{1:K} represent the component probabilities, with Σ_{k=1}^{K} w_k = 1. We let the w_k follow a Dirichlet distribution, i.e., w_1, w_2, ..., w_K ∼ Dir(1/K). Since we add elastic net regularization to the regression coefficients β_{1:J}, instead of the aforementioned normal distribution we adopt the Orthant Gaussian distribution as the prior distribution, following [9]. To be specific, each β_j follows an Orthant Gaussian prior, whose density function f_k can be defined as

\[ f_k(\boldsymbol{\beta}_j \mid \lambda_{1,k}, \lambda_{2,k}, \sigma_j^2) \propto \Phi\left(\frac{-\lambda_{1,k}}{2\sigma_j\sqrt{\lambda_{2,k}}}\right)^{-p} \times \sum_{Z \in \mathcal{Z}} \mathcal{N}\left(\boldsymbol{\beta}_j \,\middle|\, -\frac{\lambda_{1,k}\sigma_j^2}{2\lambda_{2,k}} Z, \; \frac{\sigma_j^2}{\lambda_{2,k}} I_p\right) \mathbf{1}(\boldsymbol{\beta}_j \in O_Z). \quad (5) \]

Here, λ_{i,k} (i = 1, 2) is the pair of parameters that controls the lasso and ridge regularization for the k-th component, respectively. We set both to follow Gamma conjugate priors, with λ_{1,k} ∼ Gamma(R, V/2) and λ_{2,k} ∼ Gamma(L, V/2), where R, L, and V are hyperparameters. 
Φ(·) is the cumulative distribution function of the univariate standard Gaussian distribution, and 𝒵 = {−1, +1}^p is the collection of all possible p-vectors with elements ±1. Let Z_l = 1 for β_jl ≥ 0 and Z_l = −1 for β_jl < 0. Then O_Z ⊂ R^p is the orthant determined by the vector Z ∈ 𝒵.
Given the prior distribution f_k defined in (5), it is difficult to compute the posterior distribution and sample from it. To obtain a simpler form, we use a scale mixture representation of the prior distribution (5). To be specific, we introduce latent variables τ_{1:p} and rewrite (5) in the following hierarchical form²:

\[ \boldsymbol{\beta}_j \mid \boldsymbol{\tau}_j, \sigma_j^2, \lambda_{2,c_j} \sim \mathcal{N}\left(\boldsymbol{\beta}_j \,\middle|\, 0, \; \frac{\sigma_j^2}{\lambda_{2,c_j}} S_{\boldsymbol{\tau}_j}\right), \quad (6) \]

\[ \boldsymbol{\tau}_j \mid \sigma_j^2, \lambda_{1,c_j}, \lambda_{2,c_j} \sim \prod_{l=1}^{p} \text{Inv-Gamma}_{(0,1)}\left(\tau_{jl} \,\middle|\, \frac{1}{2}, \; \frac{1}{2}\left(\frac{\lambda_{1,c_j}}{2\sigma_j\sqrt{\lambda_{2,c_j}}}\right)^{2}\right), \quad (7) \]

where τ_j ∈ R^p denotes the latent variables and S_τj ∈ R^{p×p}, with S_τj = diag(1 − τ_jl) for l = 1, ..., p. Similar to the component indicators z_i introduced in the previous section, here we introduce a set of latent regularization indicators c_{1:J}. For each parameter β_j, c_j = k indicates that the parameter follows the distribution f_k(·) with P(c_j = k) = w_k.

²More details about the derivation of the scale mixture representation and the proof of equivalence can be found in [9, 18].

(a) Generalizable insights extracted from the MLP. (b) Generalizable insights extracted from the CNNs.
Figure 1: Illustration of the generalizable insights extracted from the MLP trained for recognizing handwritten digits and the CNNs fit to the Fashion-MNIST dataset. Each pattern contains 150 pixels, whose importance is illustrated by the heat map. Due to the space limit, the results of the other categories are shown in the supplementary material.

Posterior Computation and Post-MCMC Analysis. We develop a customized MCMC method involving a combination of Gibbs sampling and the Metropolis–Hastings algorithm for parameter inference [28]. Basically, it involves augmenting the model parameter space with the aforementioned mixture component indicators z_{1:n} and c_{1:J}. These indicators enable simulation of the relevant conditional distributions for the model parameters. As the MCMC proceeds, they can be estimated from the relevant conditional posteriors, and thus we can jointly obtain posterior simulations for the model parameters and mixture component indicators. We provide the details of the posterior distributions and the implementation of the parameter updates in the supplementary material. Considering that fitting a mixture model with MCMC suffers from the well-known label switching problem, we use the iterative relabeling algorithm introduced in [3].

4 Evaluation

Recall that the motivation of our proposed method is to increase the transparency of complex ML models, so that users could leverage our approach not only to understand an individual decision (explainability) but also to obtain insights into the strengths and vulnerabilities of the target model (scrutability). The experimental evaluation of the proposed method thus focuses on these two aspects – scrutability and explainability.

4.1 Scrutability

Methodology. 
As a first step, we utilize Keras [2] to train an MLP on the MNIST dataset [16] and CNNs to classify clothing images in the Fashion-MNIST dataset [34]. These machine learning methods represent the techniques most commonly used for the corresponding classification tasks. We trained these models to achieve decent classification performance. We then treat these two models as our target models and apply our proposed approach to establish scrutability.
We define the scrutability of an explanation method as its ability to distill generalizable insights from the model under examination. In this work, generalizable insights refer to feature importance inferences that can be generalized across all cases. Admittedly, the fidelity of our proposed solution to the target model is an important prerequisite for any generalizable insights our solution extracts. In this section, we carry out experiments to empirically evaluate the fidelity while also demonstrating the scrutability of our solution. We apply the following procedures to obtain experimentation data.

1. Construct bootstrapped samples from the training data and nullify the top important pixels identified by our approach among positive cases, while replacing the same pixels in negative cases with the mean value of those features among positive samples.

2. Apply random pixel nullification/replacement to the same bootstrapped samples used in the previous step.

3. Construct test cases that register positive properties for the top important pixels while randomly assigning values to the remaining pixels.

[Figure 1 panel labels: 0–4; T-shirt, Shirt, Sneaker, Bag, Ankle boot]

(a) Bootstrapped positive samples. (b) Bootstrapped negative samples. (c) New testing cases.
Figure 2: Results of fidelity validation. Note that PCR on the y-axis denotes positive classification rate and NFeature on the x-axis refers to the number of features. 
In the legend, B indicates selecting features through our Bayesian approach and R represents selecting features through random selection. M and FM denote the MNIST and Fashion-MNIST datasets, respectively. Due to the space limit, the results of the other categories are shown in the supplementary material.

4. Construct randomly created test cases (i.e., assigning random values to all pixels) as baseline samples for the new test cases.

We then compare the target models' classification performance among the synthetic samples crafted via the procedures mentioned above. The intuition behind this exercise is that, if the fidelity/scrutability of our proposed solution holds, we should see a significant impact on the classification accuracy. Moreover, the magnitude of the impact should significantly outweigh that observed from randomly manipulating features. In the following, we describe our experiment tactics and findings in greater detail.
Experimental Results. Figure 1 illustrates the generalizable insights (i.e., important pixels in the MNIST and Fashion-MNIST datasets) that our proposed solution distilled from the target MLP and CNNs models, respectively. To validate the faithfulness of these insights and establish the fidelity of our proposed solution, we conduct the following experiment.
First, bootstrapped samples, each containing a random draw of 30% of the original cases, are constructed from the MNIST and Fashion-MNIST datasets. For cases that are originally identified as positive for the corresponding classes by the target models (i.e., the MLP and CNNs), we nullify the top 50/75/100/125/150 important features identified by our proposed solution, while setting the value of the corresponding features in the negative samples to the mean value of those features among the positive samples. 
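This nullify/replace step can be sketched as follows; the array names and the top-k index set are illustrative stand-ins (images are assumed flattened to pixel vectors):

```python
import numpy as np

def perturb(positives, negatives, top_idx):
    """Nullify the top-important pixels in positive samples and set the same
    pixels in negative samples to their mean value among the positives."""
    pos = positives.copy()
    neg = negatives.copy()
    # mean of the important pixels, computed before nullification
    pos_mean = positives[:, top_idx].mean(axis=0)
    pos[:, top_idx] = 0.0       # nullify in positive cases
    neg[:, top_idx] = pos_mean  # replace in negative cases
    return pos, neg

# Illustrative data: 100 "positive" and 100 "negative" flattened 28x28 images
rng = np.random.default_rng(0)
positives = rng.random((100, 784))
negatives = rng.random((100, 784))
top_idx = np.arange(50)  # stand-in for the 50 most important pixels
pos_pert, neg_pert = perturb(positives, negatives, top_idx)
```

The random baseline is obtained by passing a randomly chosen index set of the same size in place of `top_idx`.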
These manipulated cases are then supplied to the target model, and we measure the proportion of cases that those models classify as positive under each condition. In addition, we apply the same perturbations to 50/75/100/125/150 randomly selected features in the same bootstrapped sample and measure the target model's positive classification rate after the manipulation as a baseline for comparison. We repeat this process 50 times for both datasets to account for the statistical uncertainty in the measured classification rate.
Figure 3a, Figure 3b and the supplementary material showcase some of the aforementioned bootstrapped samples. Figure 2a and Figure 2b summarize the experimental results obtained from the procedures mentioned above. As illustrated in both figures, the classification rates of the target models on these perturbed samples drop dramatically once we start manipulating the top 50/75 important features (i.e., around 9% of the pixels in each image) identified by our proposed solution. However, we do not observe any significant impact on the models' classification performance if we randomly perturb the same number of pixels. Non-overlapping 95% confidence intervals of the post-manipulation classification performance also reveal that the impact of these top features is significantly greater than that of features selected at random. 
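The confidence-interval comparison can be sketched as follows; the rate arrays are illustrative stand-ins for the positive classification rates measured across the 50 repetitions:

```python
import numpy as np

def normal_ci(rates, z=1.96):
    """Normal-approximation 95% CI for the mean positive classification
    rate across bootstrap repetitions (z = 1.96 for the 95% level)."""
    rates = np.asarray(rates, dtype=float)
    half = z * rates.std(ddof=1) / np.sqrt(len(rates))
    return rates.mean() - half, rates.mean() + half

def intervals_overlap(ci_a, ci_b):
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

# Illustrative rates from 50 repetitions: targeted vs. random perturbation
rng = np.random.default_rng(0)
targeted = rng.normal(0.10, 0.02, size=50)     # rate collapses under targeted edits
random_pick = rng.normal(0.85, 0.02, size=50)  # rate barely moves under random edits
ci_t, ci_r = normal_ci(targeted), normal_ci(random_pick)
```

Non-overlapping intervals, as in this illustration, are the criterion used to call the difference between targeted and random perturbation significant.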
Moreover, the fact that we start observing dramatic impact on the target models' classification performance after manipulating less than 9% of the total features attests to the faithfulness of our proposed approach to the ML models under examination.
To further validate the fidelity of the insights illustrated in Figure 1, we construct new test cases based on the top 50/75/100/125/150 pixels deemed important by our proposed solution, and measure the proportion of these test samples that are classified as positive by the target models. We also create test cases by randomly filling 50/75/100/125/150 pixels within the images and measure the positive classification rate as a baseline. The intuition behind this exercise is that, similar to the experiments described earlier, we would like to see significantly higher positive classification rates when leveraging the insights from our proposed solution than when creating cases around randomly selected pixels.

(a) Bootstrapped positive samples. (b) Bootstrapped negative samples. (c) New test case samples.
Figure 3: Samples manipulated or crafted for the scrutability evaluation.

(a) Original data samples. (b) DMM-MEN. (c) LIME. (d) SHAP.
Figure 4: Examples explaining individual predictions obtained from the MLP and CNNs. Note that, since the images in MNIST and Fashion-MNIST have black backgrounds, to better illustrate the difference we change segments of these images to grey if they are not selected.

In Figure 3c and the supplementary material, we showcase some insight-driven testing cases. As is shown in Figure 2c, insight-driven testing cases have much higher success rates than cases created around random pixels. In fact, we observe that even if we randomly fill 150 pixels (which is close to 20% of the pixels in an image), the positive classification rate remains extremely low across classes. 
By contrast, we notice that with the cases created based on the top 50 important pixels (i.e., 9% of all pixels in an image) deemed important by our solution, we can already achieve a success rate of around 50%. For some specific outcome categories, we can achieve a much higher success rate.
It is worth noting that the aforementioned experiments also unveil the vulnerabilities and sensitivities of the target MLP and CNNs models. Regardless of whether a handwritten digit or a fashion product is visually recognizable in an image, the model will classify it into the corresponding category with high confidence as long as the important features indicated in the heat map are filled with greater values (see Figure 3b). In other words, both the MLP and CNNs models evaluated in this study are very sensitive to these pixels but could also be vulnerable to pathological image samples crafted based on such insights. Figure 3a and Figure 3c are two additional examples. Even if a sample (Figure 3a) carries the right semantics, the learning model might still be blind to it if the pixels corresponding to important features are filled with smaller values. On the other hand, a very noisy sample (Figure 3c) can still be correctly classified as long as the pixels corresponding to important features are assigned sufficiently large values.

4.2 Explainability

Our proposed solution not only extracts generalizable insights from the target models but also demonstrates superior performance in explaining individual decisions. To illustrate its superiority, we compare our approach with two state-of-the-art explanation approaches, namely LIME and SHAP. 
In particular, we evaluate the explainability of these approaches by comparing the explanation feature maps and, more importantly, quantitatively measuring their relative superiority in identifying influential features in individual decisions.
We also evaluate the explainability of our proposed solution on the VGG16 model [27] trained on the ImageNet dataset [5]. Due to the ultra-high dimensionality concern, which we will discuss in the following section, we adopt the methodology in [23] to generate data for explaining individual decisions. More specifically, we create a new dataset by randomly sampling around the data sample that needs to be explained, reduce the dimensionality of the newly crafted dataset with a dimension reduction method [23], and fit the approximation model.

Table 1: Quantitative evaluation results of explainability

                 DMM-MEN                             LIME                                SHAP
Dataset          Probability (CI)          Accuracy  Probability (CI)          Accuracy  Probability (CI)         Accuracy
MNIST            99.89% (99.74%, 100%)     100%      99.84% (99.69%, 100%)     99.95%    94.01% (93.99%, 94.03%)  94.10%
Fashion-MNIST    97.59% (97.32%, 97.89%)   100%      93.49% (92.92%, 94.07%)   98.32%    86.03% (85.23%, 86.65%)  90.10%
ImageNet         69.36% (47.88%, 90.18%)   85.6%     47.46% (31.34%, 68.58%)   66.05%    7.85% (5.88%, 28.82%)    14.20%

Figure 4a and the supplementary material illustrate ten handwritten digits and ten fashion products randomly selected from each of the classes in the MNIST and Fashion-MNIST datasets, respectively. We apply our solution as well as LIME and SHAP to each of the images shown in the figure and then select and highlight the top 20 segments that each approach deems important to the decision made by the deep neural network classifiers. 
The results are presented in Figure 4b, Figure 4c, Figure 4d and the supplementary material for our approach, LIME and SHAP, respectively. As we can observe in these figures, our approach nearly perfectly highlights the contour of each digit and fashion product, whereas LIME and SHAP identify only a partial contour of each digit and product and select more background than our approach.
Figure 4a also contains two images randomly selected from the ImageNet dataset. The left image has only one object and the other has two. Figure 4b to Figure 4d show the top 10 segments pinpointed by the three explanation techniques. The results shown in these figures are consistent with those on MNIST and Fashion-MNIST. More specifically, the proposed approach precisely highlights the objects in the images, while the other approaches only partly identify the objects and even select some background noise as important features. To evaluate the fidelity of these explanation results, we feed these feature images back to VGG16 and record the prediction probabilities of the true labels (tiger cat, lion and tiger cat). Figure 4b achieves the highest probabilities on each feature map, which from left to right are 93.20%, 78.51% and 92.70%. Note that in the fourth image of Figure 4b, while identifying a lion in the image, our approach also highlights the whiskers of the cat, which seems like a wrong selection. However, if we exclude this part from the image, the probability of the object belonging to the lion class drops from 78.51% to 20.31%. This result showcases a false positive of VGG16 and indicates that we can still find weaknesses of the target model even from individual explanations.
To further quantify the relative performance in explainability, we conduct the following experiment. First, we randomly select 10,000 data samples from the aforementioned datasets. 
Then, we apply our approach as well as the two state-of-the-art solutions (i.e., LIME and SHAP) to extract the top 20 important segments (top 10 segments for the ImageNet dataset). We then manipulate these samples based on the segments identified by the three approaches. To be specific, we keep only the top important pixels intact while nullifying the remaining pixels, supply these manipulated samples to the target models, and evaluate the classification accuracy. Table 1 shows the accuracy of these feature images being classified into the corresponding ground-truth categories, as well as the means and 95% confidence intervals of the prediction probabilities. The results indicate that our approach offers better resolution and more granular explanations of individual predictions. One possible explanation is that both LIME and SHAP assume the local decision boundary of the target model to be linear, while the proposed approach performs variable selection through a non-linear approximation.
It is known that Bayesian non-parametric models are computationally expensive. However, this does not mean that the proposed approach cannot be used in real-world applications. In fact, we have recorded the latency of the proposed approach in explaining individual samples from the three datasets. The running times for MNIST, Fashion-MNIST and ImageNet are 37.5s, 44s and 139.2s, respectively. As for approximating the global decision boundary, the running times are 105 minutes on MNIST and 115 minutes on Fashion-MNIST. We believe the latency of our approach is still within the range of normal training time for complex ML models.

5 Discussion

Scalability. As shown in Section 4, our proposed solution does not impose an additional scalability challenge. We can further accelerate the algorithm to improve its scalability.
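As a flavor of such acceleration, the toy sketch below splits the data into shards, samples each shard's subposterior for a conjugate normal-mean model (so each shard can run on its own worker), and combines the draws by simple averaging. This consensus-style combination is a deliberately simplified stand-in — WASP [30] combines subposteriors via Wasserstein barycenters — and every name and parameter here is our own illustrative assumption:

```python
import numpy as np

def subposterior_draws(shard, n_draws, n_shards, rng, prior_var=100.0, noise_var=1.0):
    """Posterior draws for a normal mean from one data shard.
    The N(0, prior_var) prior is raised to the power 1/n_shards so that the
    product of all subposteriors recovers the full-data posterior."""
    prec = 1.0 / (prior_var * n_shards) + len(shard) / noise_var
    mean = (shard.sum() / noise_var) / prec
    return rng.normal(mean, np.sqrt(1.0 / prec), size=n_draws)

def combine_draws(draw_sets):
    """Consensus-style combination: average aligned draws across shards
    (a crude stand-in for the Wasserstein-barycenter combination of WASP)."""
    return np.mean(np.stack(draw_sets), axis=0)
```

Because the shards are processed independently, the per-shard sampling cost — the dominant term in MCMC — drops roughly linearly with the number of shards.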
More specifically, recent advances in Bayesian computation allow MCMC methods to be used for big-data analysis, such as adopting the bootstrap Metropolis–Hastings algorithm [19], applying divide-and-conquer approaches [30], and even taking advantage of GPU programming to speed up the computation [31].
Data Dimensionality. Our evaluation in Section 4 indicates that the proposed solution (DMM-MEN) can extract generalizable insights even from high-dimensional data (e.g., Fashion-MNIST). However, when it comes to ultra-high-dimensional data, extracting generalizable insights can still be a challenge. One obvious reason is that we do not have sufficient data to infer all the parameters. More importantly, even if we had enough data, inference would be very computationally expensive. Arguably, one solution is to reduce the dimensionality of such ultra-high-dimensional data while preserving the original data distribution. However, taking the ImageNet dataset as an example, even state-of-the-art dimensionality reduction methods (e.g., the one used in [23]) cannot satisfactorily preserve the whole data distribution. This indeed speaks to the limitation of our proposed solution in extracting generalizable insights from certain datasets. Nevertheless, it does not affect our solution's ability to precisely explain individual predictions even on ultra-high-dimensional data. As shown in Section 4, our solution significantly outperforms the state-of-the-art solutions in explaining individual decisions made on ultra-high-dimensional data samples.
Other Applications and Learning Models. While we evaluate and demonstrate the capability of our proposed technique only on image recognition with deep learning models, the proposed approach is not limited to this learning task and these models.
In fact, we also evaluated our technique on other learning tasks with various learning models and observed consistent superiority in extracting global insights and explaining individual decisions. Due to the space limit, we present those experimental results in the supplementary material submitted along with this manuscript.

6 Conclusion and Future Work

This work introduces a new technical approach to deriving generalizable insights for complicated ML models. Technically, it treats a target ML model as a black box and approximates its decision boundary through DMM-MEN. With this approach, model developers and users can approximate complex ML models with low error and obtain better explanations of individual decisions. More importantly, they can extract the generalizable insights learned by a target model and use them to scrutinize model strengths and weaknesses. While our proposed approach exhibits outstanding performance in explaining individual decisions and provides users with an ability to discover model weaknesses, its performance may not be good enough when applied to interpreting temporal learning models (e.g., recurrent neural networks). This is because our approach treats features independently, whereas time series analysis deals with temporally dependent features. As part of future work, we will therefore equip our approach with the ability to dissect temporal learning models.

Acknowledgments We gratefully acknowledge the funding from NSF grant CNS-1718459 and the support of NVIDIA Corporation with the donation of a GPU. We also would like to thank the anonymous reviewers, Kaixuan Zhang, Xinran Li and Chenxin Ma for their helpful comments.

References

[1] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and W. Samek. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE, 2015.

[2] F. Chollet et al. Keras, 2015.

[3] A. J. Cron and M.
West. Efficient classification-based relabeling in mixture models. The American Statistician, 2011.

[4] P. Dabkowski and Y. Gal. Real time image saliency for black box classifiers. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS), 2017.

[5] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In Proceedings of the 22nd Conference on Computer Vision and Pattern Recognition (CVPR), 2009.

[6] R. Fong and A. Vedaldi. Interpretable explanations of black boxes by meaningful perturbation. In Proceedings of the 16th International Conference on Computer Vision (ICCV), 2017.

[7] N. Frosst and G. Hinton. Distilling a neural network into a soft decision tree. arXiv preprint arXiv:1711.09784, 2017.

[8] C. Gan, N. Wang, Y. Yang, D.-Y. Yeung, and A. G. Hauptmann. DevNet: A deep event network for multimedia event detection and evidence recounting. In Proceedings of the 28th Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

[9] C. Hans. Elastic net regression modeling with the orthant normal prior. Journal of the American Statistical Association, 2011.

[10] C. Hennig. Methods for merging Gaussian mixture components. Advances in Data Analysis and Classification, 2010.

[11] H. Ishwaran and L. F. James. Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 2001.

[12] B. Kim, R. Khanna, and O. O. Koyejo. Examples are not enough, learn to criticize! Criticism for interpretability. In Proceedings of the 30th Conference on Neural Information Processing Systems (NIPS), 2016.

[13] W. Knight. The dark secret at the heart of AI. https://www.technologyreview.com/s/604087/, 2017.

[14] W. Knight. The financial world wants to open AI's black boxes. https://www.technologyreview.com/s/604122/, 2017.

[15] P. W. Koh and P. Liang.
Understanding black-box predictions via influence functions. In Proceedings of the 34th International Conference on Machine Learning (ICML), 2017.

[16] Y. LeCun, C. Cortes, and C. J. Burges. The MNIST database of handwritten digits, 1998.

[17] J. Li, W. Monroe, and D. Jurafsky. Understanding neural networks through representation erasure. arXiv preprint arXiv:1612.08220, 2016.

[18] Q. Li, N. Lin, et al. The Bayesian elastic net. Bayesian Analysis, 2010.

[19] F. Liang, J. Kim, and Q. Song. A bootstrap Metropolis–Hastings algorithm for Bayesian analysis of big data. Technometrics, 2016.

[20] Z. C. Lipton. The mythos of model interpretability. arXiv preprint arXiv:1606.03490, 2016.

[21] S. M. Lundberg and S.-I. Lee. A unified approach to interpreting model predictions. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS), 2017.

[22] J.-M. Marin, K. Mengersen, and C. P. Robert. Bayesian modelling and inference on mixtures of distributions. Handbook of Statistics, 2005.

[23] M. T. Ribeiro, S. Singh, and C. Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd International Conference on Knowledge Discovery and Data Mining (KDD), 2016.

[24] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. arXiv preprint arXiv:1610.02391, 2016.

[25] A. Shrikumar, P. Greenside, and A. Kundaje. Learning important features through propagating activation differences. In Proceedings of the 34th International Conference on Machine Learning (ICML), 2017.

[26] K. Simonyan, A. Vedaldi, and A. Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.

[27] K. Simonyan and A. Zisserman.
Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), 2015.

[28] A. Smith and G. Roberts. Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods. Journal of the Royal Statistical Society, Series B, pages 3–23, 1993.

[29] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller. Striving for simplicity: The all convolutional net. In Proceedings of the 3rd International Conference on Learning Representations Workshop (ICLR Workshop), 2015.

[30] S. Srivastava, V. Cevher, Q. Dinh, and D. Dunson. WASP: Scalable Bayes via barycenters of subset posteriors. In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS), 2015.

[31] M. Suchard, Q. Wang, C. Chan, F. J., A. Cron, and M. West. Understanding GPU programming for statistical computation: Studies in massively parallel massive mixtures. Journal of Computational and Graphical Statistics, 2010.

[32] M. Sundararajan, A. Taly, and Q. Yan. Gradients of counterfactuals. arXiv preprint arXiv:1611.02639, 2016.

[33] M. Wu, M. C. Hughes, S. Parbhoo, M. Zazzi, V. Roth, and F. Doshi-Velez. Beyond sparsity: Tree regularization of deep models for interpretability. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI), 2018.

[34] H. Xiao, K. Rasul, and R. Vollgraf. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.

[35] H. Yang, D. Dunson, and D. Banks. The multiple Bayesian elastic net. Submitted for publication, 2011.

[36] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In Proceedings of the 13th European Conference on Computer Vision (ECCV), 2014.

[37] L. M. Zintgraf, T. S. Cohen, T. Adel, and M. Welling.
Visualizing deep neural network decisions: Prediction difference analysis. In Proceedings of the 5th International Conference on Learning Representations (ICLR), 2017.

[38] H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2005.