{"title": "Interpreting Neural Network Judgments via Minimal, Stable, and Symbolic Corrections", "book": "Advances in Neural Information Processing Systems", "page_first": 4874, "page_last": 4885, "abstract": "We present a new algorithm to generate minimal, stable, and symbolic corrections to an input that will cause a neural network with ReLU activations to change its output. We argue that such a correction is a useful way to provide feedback to a user when the network's output is different from a desired output. Our algorithm generates such a correction by solving a series of linear constraint satisfaction problems. The technique is evaluated on three neural network models: one predicting whether an applicant will pay a mortgage, one predicting whether a first-order theorem can be proved efficiently by a solver using certain heuristics, and the final one judging whether a drawing is an accurate rendition of a canonical drawing of a cat.", "full_text": "Interpreting Neural Network Judgments via Minimal, Stable, and Symbolic Corrections\n\nXin Zhang\nCSAIL, MIT\nxzhang@csail.mit.edu\n\nArmando Solar-Lezama\nCSAIL, MIT\nasolar@csail.mit.edu\n\nRishabh Singh\nGoogle Brain\nrising@google.com\n\nAbstract\n\nWe present a new algorithm to generate minimal, stable, and symbolic corrections to an input that will cause a neural network with ReLU activations to change its output. We argue that such a correction is a useful way to provide feedback to a user when the network's output is different from a desired output. Our algorithm generates such a correction by solving a series of linear constraint satisfaction problems. 
The technique is evaluated on three neural network models: one predicting whether an applicant will pay a mortgage, one predicting whether a first-order theorem can be proved efficiently by a solver using certain heuristics, and the final one judging whether a drawing is an accurate rendition of a canonical drawing of a cat.\n\n1 Introduction\n\nWhen machine learning is used to make decisions about people in the real world, it is extremely important to be able to explain the rationale behind those decisions. Unfortunately, for systems based on deep learning, it is often not even clear what an explanation means; showing someone the sequence of operations that computed a decision provides little actionable insight. There have been some recent advances towards making deep neural networks more interpretable (e.g. [21]) using two main approaches: i) generating input prototypes that are representative of abstract concepts corresponding to different classes [23] and ii) explaining network decisions by computing relevance scores for different input features [1]. However, these explanations do not provide direct actionable insights regarding how to cause the prediction to move from an undesirable class to a desirable class.\nIn this paper, we argue that for the specific class of judgment problems, minimal, stable, and symbolic corrections are an ideal way of explaining a neural network decision. We use the term judgment to refer to a particular kind of binary decision problem where a user presents some information to an algorithm that is supposed to pass judgment on its input. The distinguishing feature of judgments relative to other kinds of decision problems is that they are asymmetric; if I apply for a loan and I get the loan, I am satisfied, and do not particularly care for an explanation; even the bank may not care as long as on aggregate the algorithm makes the bank money. 
On the other hand, I very much care if the algorithm denies my mortgage application. The same is true for a variety of problems, from college admissions, to parole, to hiring decisions. In each of these cases, the user expects a positive judgment, and would like an actionable explanation to accompany a negative judgment.\nWe argue that a correction is a useful form of feedback; what could I have done differently to elicit a positive judgment? For example, if I applied for a mortgage, knowing that I would have gotten a positive judgment if my debt to income ratio (DTI) was 10% lower is extremely useful; it is actionable information that I can use to adjust my finances. We argue, however, that the most useful corrections are those that are minimal, stable and symbolic.\nFirst, in order for a correction to be actionable, the corrected input should be as similar as possible to the original offending input. For example, knowing that a lower DTI would have given me the\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.\n\nFigure 1: Symbolic explanations generated by our approach for neural networks in different domains. (a) Mortgage Underwriting. (b) Solver Performance Prediction. (c) Drawing Tutoring.\n\nloan is useful, but knowing that a 65 year old billionaire from Nebraska would have gotten the loan is not useful. Minimality must be defined in terms of an error model which specifies which inputs are subject to change and how. For a bank loan, for example, debt, income and loan amount are subject to change within certain bounds, but I will not move to another state just to satisfy the bank.\nSecond, the suggested correction should be stable, meaning that there should be a neighborhood of points surrounding the suggested correction for which the outcome is also positive. 
For example, if the algorithm tells me that a 10% lower DTI would have gotten me the mortgage, and then six months later I come back with a DTI that is 11% lower, I expect to get the mortgage, and will be extremely disappointed if the bank says, “oh, sorry, we said 10% lower, not 11% lower”. So even though for the neural network it may be perfectly reasonable to give positive judgments to isolated points surrounded by points that get negative judgments, corrections that lead to such isolated points will not be useful.\nFinally, even if the correction is minimal and robust, it is even better if rather than a single point, the algorithm can produce a symbolic correction that provides some insight about the relationship between different variables. For example, knowing that for someone like me the bank expects a DTI of between 20% and 30% is more useful than just knowing a single value. And knowing something about how that range would change as a function of my credit score would be even more useful still.\nIn this paper, we present the first algorithm capable of computing minimal, stable, symbolic corrections. Given a neural network with ReLU activations and an input with a negative judgment, our algorithm produces a symbolic description of a space of corrections such that any correction in that space is guaranteed to change the judgment. In the limit, the algorithm will find the closest region with a volume above a given threshold. Internally, our algorithm reduces the problem to a series of linear constraint satisfaction problems, which are solved using the Gurobi linear programming (LP) solver [9]. We show that in practice, the algorithm is able to find good symbolic corrections in 12 minutes on average for small but realistic networks. While the running time is dominated by solver invocations, under 2% of it is spent on actual solving; the majority of the time is spent on creating these LP instances. 
We evaluate our approach on three neural networks: one predicting whether an applicant will pay a mortgage, one predicting whether a first-order theorem can be proved efficiently by a solver using certain heuristics, and the other judging whether a drawing is an accurate rendition of a canonical drawing of a cat.\nExplanation showcases. Figure 1 shows example explanations generated by our approach on the aforementioned networks. Figure 1(a) suggests that a mortgage applicant change DTI and interest rate in order to get their application accepted. While the red cross represents the original application, the blue triangle represents the symbolic correction (i.e. the region of points that all lead to a positive outcome). Since the user may only be able to change DTI, and interest rates often vary between applications, it is essential to provide a symbolic correction rather than a concrete correction to make the feedback actionable. Figure 1(b) suggests that a user reformulate a first-order theorem when the network predicts it as challenging to solve. Intuitively, either reducing the problem size (by decreasing average clause lengths) or providing a partial solution (by adding unit clauses) would reduce the problem complexity. Finally, Figure 1(c) shows how to add lines to a drawing so that it gets recognized by the network as a canonical cat drawing. The red lines represent the original input, while the blue boxes represent the symbolic correction and the cyan lines represent one concrete correction in it. Briefly, any concrete correction whose vertices fall into the blue boxes would make the drawing pass the network's judgment. Compared with the previous two corrections, which involve only 2 features, this correction involves 8 features (the coordinates of each vertex) and can go up to 20 features. 
This highlights our approach's ability to generate relatively complex corrections.\n\n2 Background and Problem Definition\n\nWe first introduce some notation we will use in explaining our approach. Suppose F is a neural network with ReLU activations. In the model we consider, the input to F is a (column) vector v0 of size s0. The network computes the output of each ReLU (hidden or output) layer as\n\nvi+1 = fi(vi) = ReLU(Wi vi + bi),\n\nwhere Wi is an si+1 × si matrix, bi is a vector of size si+1, and ReLU applies the rectifier function elementwise to the output of the linear operations.\nWe focus on classification problems, where the classification of input v is obtained by\n\nlF(v) ∈ argmax_i F(v)[i].\n\nWe are specifically focused on binary classification problems (that is, lF(v) ∈ {0, 1}). The judgment problem is a special binary classification problem where one label is preferable to the other. We assume 1 is preferable throughout the paper.\nThe judgment interpretation problem concerns providing feedback in the form of corrections when lF(v) = 0. A correction δ is a real vector of the same length as the input vector such that lF(v + δ) = 1. As mentioned previously, desirable feedback should be a minimal, stable, and symbolic correction.\nWe first introduce what it means for a concrete correction δ to be minimal and stable. Minimality is defined in terms of a norm ‖δ‖ on δ that measures the distance between the corrected input and the original input. For simplicity, we use the L1 norm to measure the sizes of all vectors throughout Section 2 and Section 3. We say δ is e-stable if for any δ′ such that ‖δ − δ′‖ ≤ e, we have lF(v + δ′) = 1.\nA symbolic correction ∆ is a connected set of concrete corrections. 
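The network model and judgment label defined above can be sketched in code. The following is an illustrative pure-Python toy, not the paper's implementation; the two-layer weights at the bottom are hypothetical values chosen only to make the example concrete.

```python
# Sketch of the model: each layer computes v_{i+1} = ReLU(W_i v_i + b_i),
# and the judgment label is l_F(v) = argmax_i F(v)[i].
# Vectors are plain Python lists; W is a list of rows.
def relu_layer(W, b, v):
    out = []
    for row, bias in zip(W, b):
        z = sum(w * x for w, x in zip(row, v)) + bias
        out.append(max(0.0, z))  # elementwise rectifier
    return out

def forward(layers, v):
    for W, b in layers:
        v = relu_layer(W, b, v)
    return v

def judgment(layers, v):
    scores = forward(layers, v)
    return max(range(len(scores)), key=lambda i: scores[i])

# hypothetical 2-layer network with a 2-dimensional output
layers = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
          ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0])]
```

Under this toy network, `judgment(layers, [2.0, 1.0])` yields the undesirable label 0, while `judgment(layers, [1.0, 2.0])` yields the desirable label 1, so δ = [-1.0, 1.0] is one concrete correction for v = [2.0, 1.0].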
More concretely, we will use a set of linear constraints to represent a symbolic correction. We say a symbolic correction ∆ is e-stable if there exists a correction δ ∈ ∆ such that for any δ′ where ‖δ′ − δ‖ ≤ e, we have δ′ ∈ ∆. We call such a correction a stable region center inside ∆. To define minimality, we define the distance of ∆ from the original input using the distance of a stable region center that has the smallest distance among all stable region centers. More formally:\n\ndise(∆) := min_{δ∈S} ‖δ‖, where S := {δ ∈ ∆ | ∀δ′. ‖δ′ − δ‖ ≤ e ⇒ δ′ ∈ ∆}.\n\nWhen ∆ is not e-stable, S will be empty, so we define dise(∆) := ∞.\nWe can now define the judgment interpretation problem.\nDefinition 1. (Judgment Interpretation) Given a neural network F, an input vector v such that lF(v) = 0, and a real value e, a judgment interpretation is an e-stable symbolic correction ∆ with the minimum distance among all e-stable symbolic corrections.\n\n3 Our Approach\n\nAlgorithm 1 outlines our approach to find a judgment interpretation for a given neural network F and an input vector v. Besides these two inputs, it is parameterized by a real e and an integer n. The former specifies the radius parameter in our stability definition, while the latter specifies how many features are allowed to vary to produce the judgment interpretation. We parameterize the number of features to change as high-dimension interpretations can be hard for end users to understand. 
For instance, it is very easy for a user to understand if the explanation says their mortgage would be approved as long as they change the DTI and the credit score while keeping the other features as they were. On the other hand, it is much harder to understand an interpretation that involves all features (in our experiment, there are 21 features for the mortgage underwriting domain). The output is a judgment interpretation that is expressed as a system of linear constraints of the form\n\nAx + b ≥ 0,\n\nwhere x is a vector of variables, A is a matrix, and b is a vector.\nAlgorithm 1 finds such an interpretation by iteratively invoking findProjectedInterpretation (Algorithm 2) to find an interpretation that varies a list of n features s. It returns the one with the least distance. Recall that the distance is defined as dise(∆) = min_{δ∈S} ‖δ‖, which can be evaluated by solving a sequence of linear programming problems when the L1 norm is used.\nWe next discuss findProjectedInterpretation, which is the heart of our approach.\n\nAlgorithm 1 Finding a judgment interpretation.\nINPUT A neural network F and an input vector v such that lF(v) = 0.\nOUTPUT A judgment interpretation ∆.\n1: PARAM A real value e and an integer number n.\n2: Sn := {s | s is a subarray of [1, ..., |v|] with length n}\n3: ∆ := None, d := +∞\n4: for s ∈ Sn do\n5:   ∆s := findProjectedInterpretation(F, v, s, e)\n6:   if dise(∆s) < d then\n7:     ∆ := ∆s, d := dise(∆s)\n8: return ∆\n\nAlgorithm 2 findProjectedInterpretation\nINPUT A neural network F, an input vector v, an integer vector s, and a real number e.\nOUTPUT A symbolic correction ∆s that only changes features indexed by s.\n1: PARAM An integer m, the maximum number of verified linear regions to consider.\n2: regions := ∅, workList := []\n3: δ0 := findMinimumConcreteCorrection(F, v, s)\n4: a0 := getActivations(F, δ0 + v)\n5: L0 := getRegionFromActivations(F, a0, v, s)\n6: regions := regions ∪ {L0}\n7: workList := append(workList, a0)\n8: while len(workList) != 0 do\n9:   a := popHead(workList)\n10:   for p ∈ [1, len(a)] do\n11:     if checkRegionBoundary(F, a, p, v, s) then\n12:       a′ := copy(a)\n13:       a′[p] := ¬a′[p]\n14:       L′ := getRegionFromActivations(F, a′, v, s)\n15:       if L′ ∉ regions then\n16:         regions := regions ∪ {L′}\n17:         workList := append(workList, a′)\n18:         if len(regions) = m then\n19:           workList := []\n20:           break\n21: return inferConvexCorrection(regions)\n\n1 Unless specified, all vectors in the paper are column vectors.\n\n3.1 Finding a Judgment Interpretation along Given Features\n\nIn order to find a judgment interpretation, we need to find a set of linear constraints that is minimal, stable, and verified (that is, all corrections satisfying it will make the input classified as 1). None of these properties is trivial to satisfy given the complexity of any real-world neural network.\nWe first discuss how we address these challenges at a high level, then dive into the details of the algorithm. To address minimality, we first find a single concrete correction that is minimum by leveraging an existing adversarial example generation technique [7] and then generate a symbolic correction by expanding upon it. To generate a stable and verified correction, we exploit the fact that ReLU-based neural networks are piece-wise linear functions. Briefly, all the inputs that activate the same set of neurons can be characterized by a set of linear constraints. We can further characterize the subset of inputs that are classified as 1 by adding an additional linear constraint. 
Therefore, we can use a set of linear constraints to represent a set of verified concrete corrections under certain activations. We call this set of corrections a verified linear region (or region for short). We first identify the region that the initial concrete correction belongs to, then grow the set of regions by identifying regions that are connected to existing regions. Finally, we infer a set of linear constraints whose concrete corrections are a subset of the ones enclosed by the set of discovered regions. Algorithm 2 details our approach, which we describe below.\nGenerating the initial region. We first find a minimum concrete correction δ0 by leveraging a modified version of the fast gradient sign method [7] that minimizes the L1 distance (line 3). More concretely, starting with a vector of 0s, we calculate δ0 by iteratively adding a modified gradient that takes the sign of the most significant dimension among the selected features until lF(v + δ0) = 1. For example, if the original gradient is [0.5, 1.0, 6.0, −6.0], the modified gradient would be [0, 0, 1.0, 0] or [0, 0, 0, −1.0]. Then we obtain the ReLU activations a0 for v + δ0 (by invoking getActivations on line 4), which is a Boolean vector where each Boolean value represents whether a given neuron is activated. 
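The modified gradient step can be sketched as follows. This is an illustrative stand-in, not the paper's implementation: the gradient here is estimated by finite differences on a generic score function (the actual tool differentiates the network via CleverHans), and `score`, `label`, and `step` are hypothetical names and values.

```python
# Sketch: starting from delta = 0, repeatedly take one step along the
# single allowed feature whose gradient magnitude is largest, until the
# judgment flips to 1.
def finite_diff_grad(f, x, eps=1e-4):
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        g.append((f(xp) - f(xm)) / (2 * eps))
    return g

def min_concrete_correction(label, score, v, features, step=0.1, max_iters=1000):
    delta = [0.0] * len(v)
    for _ in range(max_iters):
        x = [a + d for a, d in zip(v, delta)]
        if label(x) == 1:
            return delta
        g = finite_diff_grad(score, x)
        i = max(features, key=lambda j: abs(g[j]))  # most significant allowed feature
        delta[i] += step if g[i] > 0 else -step
    return None  # no correction found within the budget

# toy judgment: label is 1 once the score x[0] - x[1] exceeds 0.5
score = lambda x: x[0] - x[1]
label = lambda x: 1 if score(x) > 0.5 else 0
delta = min_concrete_correction(label, score, [0.0, 0.0], features=[0, 1])
```

Because each step moves only one coordinate by a fixed amount, the resulting `delta` stays sparse along the selected features, which is what makes the subsequent symbolic expansion low-dimensional.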
Finally, we obtain the initial region that δ0 falls into by invoking getRegionFromActivations (line 5), which is defined below:\n\ngetRegionFromActivations(F, a, v, s) := activationConstraints(F, a, v) ∧ classConstraints(F, a, v) ∧ featureConstraints(s),\n\nwhere\n\nactivationConstraints(F, a, v) := ⋀_{j∈[1,k]} ⋀_{m∈[1,|f_j|]} {G^a_r(x + v) ≥ 0 if a[r] = true}\n  ∧ ⋀_{j∈[1,k]} ⋀_{m∈[1,|f_j|]} {G^a_r(x + v) < 0 if a[r] = false},\nclassConstraints(F, a, v) := F^a(x + v)[1] > F^a(x + v)[0],\nfeatureConstraints(s) := ⋀_{j∉s} x[j] = 0,\n\nwhere G^a_r(x + v) := w_r · f^a_{m−1}(...f^a_1(f^a_0(x + v))) + b_r and r := Σ_{i∈[1,j−1]} |f_i| + m.\n\nIn the definition above, we use the notation f^a_i to refer to layer i with its activations \u201cfixed\u201d to a. More formally, f^a_i(v_i) = W^a_i v_i + b^a_i, where W^a_i and b^a_i have zeros in all the rows where the activation indicated that the rectifier in the original layer had produced a zero. We use k to represent the number of ReLU layers and |f_j| to represent the number of neurons in the jth layer. Integer r indexes the mth neuron in the jth layer. Vector w_r and real number b_r are the weights and the bias of neuron r, respectively. Intuitively, activationConstraints uses a set of linear constraints to encode the activation of each neuron.\nExpanding to connected regions. After generating the initial region, Algorithm 2 tries to grow the set of concrete corrections by identifying regions that are connected to existing regions (lines 6-20). How do we know efficiently whether a region is connected to another? There are 2^n regions for a network with n neurons, and checking whether two sets of linear constraints intersect can be expensive in high dimensions. Intuitively, two regions are likely connected if their activations differ by only one ReLU. 
However, this is not entirely correct, since a region is constrained not only by the activations but also by the desired classification.\nOur key insight is that, since a ReLU-based neural network is a continuous function, two regions whose activations differ by one neuron are connected if there are concrete corrections on the face of one of the corresponding convex polytopes that corresponds to the differing neuron. Intuitively, on the piece-wise linear function represented by a neural network, the sets of concrete corrections in two adjacent linear pieces are connected if there are concrete corrections on the boundary between them. Following this intuition, we define checkRegionBoundary:\n\ncheckRegionBoundary(F, a, p, v, s) := isFeasible(boundaryConstraints(F, a, p, v) ∧ classConstraints(F, a, v) ∧ featureConstraints(s)),\n\nwhere\n\nboundaryConstraints(F, a, p, v) := ⋀_{j∈[1,k]} ⋀_{m∈[1,|f_j|]} {G^a_r(x + v) = 0 if r = p}\n  ∧ ⋀_{j∈[1,k]} ⋀_{m∈[1,|f_j|]} {G^a_r(x + v) ≥ 0 if a[r] = true and r ≠ p}\n  ∧ ⋀_{j∈[1,k]} ⋀_{m∈[1,|f_j|]} {G^a_r(x + v) < 0 if a[r] = false and r ≠ p},\n\nwhere G^a_r(x + v) := w_r · f^a_{m−1}(...f^a_1(f^a_0(x + v))) + b_r and r := Σ_{i∈[1,j−1]} |f_i| + m.\n\nBy leveraging checkRegionBoundary, Algorithm 2 uses a worklist algorithm to identify regions that are connected or transitively connected to the initial region until no more such regions can be found or the number of discovered regions reaches a predefined upper bound m (lines 8-20).\nInfer the final explanation. Finally, Algorithm 2 infers a set of linear constraints whose corresponding concrete corrections are contained in the discovered regions. Moreover, to satisfy the stability constraint, we want this set to be as large as possible. 
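One concrete way to realize this maximization, sketched below for axis-aligned boxes, is to push each face of the box outward while every corner of the enlarged box still lies inside the discovered regions. This is an illustrative sketch only: `inside` is a hypothetical stand-in for membership in the union of verified regions (here simply a disk), whereas the actual procedure would issue a feasibility query against the regions' linear constraints.

```python
from itertools import product

def corners(lo, hi):
    # all 2^d corner points of the axis-aligned box [lo, hi]
    return [list(p) for p in product(*zip(lo, hi))]

def grow_box(lo, hi, inside, step=0.05, rounds=200):
    lo, hi = list(lo), list(hi)
    for _ in range(rounds):
        moved = False
        for i in range(len(lo)):
            for sign, bound in ((-1, lo), (1, hi)):
                trial = list(bound)
                trial[i] += sign * step
                cand_lo = trial if sign < 0 else lo
                cand_hi = trial if sign > 0 else hi
                if all(inside(c) for c in corners(cand_lo, cand_hi)):
                    bound[i] = trial[i]  # accept the enlarged face
                    moved = True
        if not moved:
            break  # no face can move; volume cannot be increased further
    return lo, hi

# hypothetical stand-in for the discovered regions: the unit disk
inside_disk = lambda p: p[0] ** 2 + p[1] ** 2 <= 1.0
lo, hi = grow_box([-0.1, -0.1], [0.1, 0.1], inside_disk)
```

Growing a simplex works analogously, except that vertices rather than faces are perturbed, as the following text describes.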
Intuitively, we want to find a convex hull (represented by the returned constraints) that is contained in a polytope (represented by the regions), such that the volume of the convex hull is maximized. Further, we infer constraints that represent relatively simple shapes, such as simplexes or boxes, for two reasons. First, explanations in simpler shapes are easier for the end user to understand; second, it is relatively efficient to calculate the volume of a simplex or a box.\nThe procedure inferConvexCorrection implements the above process using a greedy algorithm. In the case of simplexes, we first randomly choose a discovered region and randomly sample a simplex inside it. Then for each vertex, we move it by a very small distance in a random direction such that (1) the simplex is still contained in the set of discovered regions, and (2) the volume increases. The process stops when the volume cannot be increased further. For boxes, the procedure is similar except that we move the faces rather than the vertices.\nNote that our approach is sound but neither optimal nor complete. In other words, whenever Algorithm 1 finds a symbolic correction, the correction is verified and stable, but it is not guaranteed to be minimal. Also, when our approach fails to find a stable symbolic correction, it does not mean that such corrections do not exist. However, in practice, we find that our approach is able to find stable corrections most of the time, and the distances of the discovered corrections are small enough to be useful (as we shall see in Section 4.2).\n\n3.2 Extensions\n\nWe finish this section by discussing several extensions to our approach.\nHandling categorical features. Categorical features are typically represented using one-hot encoding, and directly applying Algorithm 2 on the embedding can result in a symbolic correction comprising invalid concrete corrections. 
To address this issue, we enumerate the embeddings representing different values of categorical features and apply Algorithm 2 to search for symbolic corrections under each of them.\nExtending to multiple classes. Our approach can be easily extended to multiple classes as long as there is only one desirable class. Concretely, we need to: 1) guide the initial concrete correction generation (to the desirable class), which has been studied in the literature on adversarial example generation; 2) extend classConstraints so that the desired class gets a higher score than any other class. Compared to the binary case, classConstraints only grows linearly with the number of classes, while the majority of the constraints are the ones encoding the activations. Thus, the running time should not grow significantly, as we shall see in Section 4.2. That said, the focus of our paper is judgment problems, which are binary classification problems.\nExtending to non-ReLU activations. Our approach applies without any change as long as the activation functions are continuous and can be approximated using piece-wise linear functions. For networks whose activations are continuous but cannot be approximated using piece-wise linear functions, we can still apply our algorithm but need constraints that are more expressive than linear constraints to represent verified regions. When activations are not continuous, our approach no longer applies, as our method of testing whether two regions are connected relies on them being continuous.\nIncorporating prior knowledge on features. 
When the user has constraints or preferences over the features, our approach can be extended to incorporate such prior knowledge in the following ways: 1) if some features cannot be changed, we can avoid searching feature combinations involving them, which also saves computation; 2) if a feature can only be changed to a value in an interval, we simply add this interval as a constraint to the LP formulation; 3) if some features are preferable to change, we can adjust the coefficients of the features in the distance function accordingly.\n\n4 Empirical Evaluation\n\nWe evaluate our approach on three neural network models from different domains.\n\n4.1 Experiment Setup\n\nImplementation. We implemented our approach in a tool called POLARIS, which is written in three thousand lines of Python code. To implement findMinimumConcreteCorrection, we used a customized version of the CleverHans library [24]. To implement isFeasible, which checks the feasibility of generated linear constraints, we applied the commercial linear programming solver Gurobi 7.5.2 [9].\nNeural networks. Table 1 summarizes the statistics of the neural networks. The mortgage underwriting network predicts whether an applicant would default on the loan. Its architecture is akin to state-of-the-art neural networks for predicting mortgage risk [28], and it has a recall of 90% and a precision of 6%. It is trained to have a high recall so as to be conservative in accepting applications. The solver performance prediction network predicts whether a first-order theorem can be solved efficiently by a solver based on static and dynamic characteristics of the instance. We chose its architecture using a grid search. 
The drawing tutoring network judges whether a drawing is an accurate rendition of a canonical drawing of a cat. A drawing is represented by a set of line segments on a 256 × 256 canvas, each of which is represented by the coordinates of its vertices. A drawing comprises up to 128 lines, which leads to 512 features.\n\nTable 1: Summary of the neural networks used in our evaluation.\n\nApplication | Network Structure | # ReLUs | Dataset (train/val./test: 50/50/25) | # features | F1 | Accuracy\nMortgage Underwriting | 5 dense layers of 200 ReLUs each | 1,000 | Applications and performance of 34 million Single-Family loans [6] | 21 | 0.118 | 0.8\nSolver Performance Prediction | 8 dense layers of 100 ReLUs each | 800 | Statistics of 6,118 first-order theorems and their solving times [13] | 51 | 0.74 | 0.792\nDrawing Tutoring | 3 1-D conv. layers (filter shape: [5,4,8]) and 1 dense layer of 1,024 ReLUs | 4,096 | 0.12 million variants of a canonical cat drawing and 0.12 million cat drawings from Google QuickDraw [8] | 512 | 0.995 | 0.995\n\nEvaluation inputs. For the first two applications, we randomly chose 100 inputs in the test sets that were rejected by the networks. For drawing tutoring, we used 100 variants of the canonical drawing and randomly removed subsets of line segments so that they get rejected by the network.\nAlgorithm configurations. Our approach is parameterized by the number of features n allowed to change simultaneously, the maximum number of regions to consider m, the stability metric, the distance metric, and the shape of the generated symbolic correction. We set n = 2 for mortgage underwriting and solver performance prediction, as corrections of higher dimensions for them are hard for end users to understand. Moreover, for each application we limit the mutable features to 5 that are plausible for the end user to change. Details of these features are described in Appendix B.1. 
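The weighted metrics referenced here have a simple form: each feature's contribution is scaled by 1/(max − min) of its range, so features measured in different units become comparable. A sketch, with hypothetical range widths rather than the actual values from Appendix B.1:

```python
# Weighted L1 (used as a distance metric) and weighted L-infinity (used
# as a stability metric) over a correction delta.
def weighted_l1(delta, weights):
    return sum(w * abs(d) for w, d in zip(weights, delta))

def weighted_linf(delta, weights):
    return max(w * abs(d) for w, d in zip(weights, delta))

# e.g. two features with hypothetical ranges of width 0.5 and 0.25;
# each weight is 1 / (max - min) for that feature
weights = [1 / 0.5, 1 / 0.25]
```

For example, a correction of [0.5, -0.25] moves each feature across its full hypothetical range, giving a weighted L1 distance of 2.0 and a weighted L-infinity of 1.0.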
As for drawing tutoring, we set n ∈ [1, 20], which allows us to add up to 5 line segments. To reduce the computational cost, we use a generative network to recommend the features to change rather than enumerating all combinations of features. The network is a variational autoencoder that completes drawing sketches [10]. We set m = 100 and discuss the effect of using different m later. For the stability metric and the distance metric, we use a weighted L∞ norm and a weighted L1 norm, respectively, for mortgage underwriting and solver performance prediction, which are described in Appendix B.1. For drawing tutoring, we measure the distance of a correction by the number of features changed (L0), which reflects how many lines are added. We say a correction is stable if it contains at least 3 pixels in each dimension. Finally, we use triangles to represent the corrections for mortgage underwriting and solver performance prediction, while we use axis-aligned boxes for drawing tutoring. The blue rectangles in Figure 1(c) are projections of a box correction onto the coordinates of added line vertices.\nExperiment environment. All the experiments were run on a Dell XPS 8900 Desktop with 16GB RAM and an Intel i7 4GHz quad-core processor running Ubuntu 16.04.\n\n4.2 Experiment Results\n\nWe first discuss how often POLARIS generates stable corrections and how far away these corrections are from the original input. We then study its efficiency. Next, we discuss the effect of varying m, the maximum number of regions to consider. We then compare against grid search. Finally, we discuss the performance of POLARIS when there are multiple classes.\nStability and minimality. For the selected 100 inputs that are rejected by each network, POLARIS successfully generated symbolic corrections for 85 inputs of mortgage underwriting, 81 inputs of solver performance prediction, and 75 inputs of drawing tutoring. 
For the remaining inputs, either the corrections found by POLARIS were discarded for being unstable, or POLARIS failed to find an initial concrete correction due to the incompleteness of the applied adversarial example generation algorithm. These results show that POLARIS is effective in finding symbolic corrections that are stable and verified.\nWe next discuss how similar these corrections are to the original input. Figure 2 lists the sorted distances of the aforementioned symbolic corrections. For mortgage underwriting and solver performance prediction, the distance is defined using a weighted L1 norm, where the weight for each feature is 1/(max − min) (see Appendix B.1). The average distances of the corrections generated on these two applications are 0.31 and 0.25, respectively. Briefly, the former would mean, for example, decreasing the DTI by 19.5% or increasing the interest rate by 3%, while the latter would mean, for example, adding 25% more unit clauses or horn clauses. Moreover, the smallest distances for these two applications are only 0.016 and 0.03. As for drawing tutoring, the distance is measured by the number of features to change (that is, the number of added lines × 4). As Figure 2(c) shows, the sizes of the corrections range from 1 line to 5 lines, with 2 lines being the average. In conclusion, the corrections found by POLARIS are often small enough to be actionable for end users.\nTo better understand these corrections qualitatively, we inspect several corrections more closely in Appendix B.2. We also include more example corrections in Appendix B.3.\n\nFigure 2: Distances of judgment interpretations generated by POLARIS. (a) Mortgage Underwriting. (b) Solver Performance Prediction. (c) Drawing Tutoring.\n\nFigure 3: Running time of POLARIS on each input. (a) Mortgage Underwriting. (b) Solver Performance Prediction. (c) Drawing Tutoring.\n\nEfficiency. 
Figure 3 shows the sorted running time of POLARIS across all inputs for our three applications. On average, POLARIS takes around 20 minutes, 2 minutes, and 13 minutes to generate corrections for each input of the three applications respectively. We first observe that POLARIS spends the least time on solver performance prediction. This is not only because solver performance prediction has the smallest network, but also because the search often terminates well before reaching the maximum number of regions to consider (m=100). On the other hand, POLARIS often reaches this limit on the other two applications. Although drawing tutoring has a larger network than mortgage underwriting, POLARIS spends less time on it. This is because, for drawing tutoring, POLARIS uses a generative network to decide which features to change, which leads to a single invocation of Algorithm 2 per input. For mortgage underwriting, by contrast, POLARIS needs to invoke Algorithm 2 multiple times per input, once for each combination of features to search under. However, a single invocation of Algorithm 2 is still faster for mortgage underwriting.
After closer inspection, we find that the running time is dominated by invocations of the LP solver. We make two observations about the invocation time. First, most of the time is spent on instance creation rather than on actual solving, due to the poor performance of Gurobi's Python bindings. For instance, in mortgage underwriting, each instance creation takes around 60ms, while the actual solving typically takes only around 1ms. As a result, POLARIS could be made even more efficient if we re-implemented it in C++ or if Gurobi improved its Python bindings. Second, the LP solver scales well as the size of the network and the number of dimensions grow. 
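Each such LP instance has the same shape: within a single linear region, the ReLU network is affine, and verifying the region amounts to minimizing the network's score over the region's polytope. The sketch below is a simplified illustration of that check; the weights, bias, box region, and the use of SciPy's linprog (standing in for the Gurobi solver used in the paper) are all illustrative assumptions, not values from any of the paper's models.

```python
import numpy as np
from scipy.optimize import linprog

# Within one ReLU activation pattern the network is affine:
# f(x) = w @ x + b on the polytope {x : A x <= c}.
# Verifying the whole region means checking min_{x in region} f(x) > 0,
# which is a single LP.  (w, b, A, c are toy values.)
w = np.array([1.0, 2.0])
b = -1.0
# Box region 1 <= x1 <= 3, 1 <= x2 <= 3 written as A x <= c.
A = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0,  0.0],
              [ 0.0, -1.0]])
c = np.array([3.0, 3.0, -1.0, -1.0])

# Minimize w @ x subject to A x <= c; the optimum is at x = (1, 1).
res = linprog(w, A_ub=A, b_ub=c, bounds=[(None, None), (None, None)])
assert res.status == 0  # LP solved to optimality

verified = res.fun + b > 0  # min f(x) = 3.0 - 1.0 = 2.0 > 0
print(verified)  # True
```

If the LP's optimum plus the bias is positive, every concrete correction inside the region is verified at once; otherwise the region is discarded or refined.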
For example, compared to the solving time (1ms) in mortgage underwriting, where the network comprises 1,000 neurons and the corrections are two-dimensional, the solving time only grows to around 7ms in drawing tutoring, where the network comprises 4,096 neurons and the corrections are up to 20-dimensional. This indicates that POLARIS has the potential to scale to even larger networks with higher input dimensions.
Varying the maximum number of regions. Table 2 shows the results of varying the maximum number of regions to consider (m) for four randomly selected inputs of mortgage underwriting. To simplify the discussion, we only study corrections generated under DTI and interest rate. As the table shows, both the volume and the running time increase roughly linearly as the number of explored regions grows.
Comparison with grid sampling. An alternative approach to generating judgment interpretations is to sample on a grid. Since there may be unviable inputs between two adjacent viable inputs, a grid with fine granularity is needed to produce a symbolic correction with high confidence. However, this is not feasible if there are continuous features or the input dimension is high. For instance, the corrections generated on drawing tutoring may involve up to 20 features. Even if we sampled only 3 values along each feature, it would require over 3 billion samples. 
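The combinatorics can be checked directly; the feature count and samples-per-feature below match the drawing-tutoring scenario just described:

```python
# Grid sampling over 20 features with only 3 sampled values each
# already yields an infeasible number of concrete inputs to check.
n_features = 20
samples_per_feature = 3
grid_size = samples_per_feature ** n_features
print(grid_size)                  # 3486784401
print(grid_size > 3_000_000_000)  # True: over 3 billion samples
```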
Our approach, on the other hand, verifies a large number of concrete corrections at once by verifying an entire linear region.

Table 2: Effect of varying the maximum number of regions to consider (each cell lists the values for the four selected inputs).

m    | # explored regions | volume                | time (in seconds)
100  | 88, 100, 100, 100  | 2.4, 10.3, 9.2, 1.29  | 102, 191, 141, 118
500  | 88, 205, 214, 500  | 2.4, 26.3, 21.9, 6.9  | 100, 374, 288, 517
1000 | 88, 205, 214, 1000 | 2.4, 26.3, 21.9, 10.2 | 100, 375, 290, 1115
2000 | 88, 205, 214, 1325 | 2.4, 26.3, 21.9, 11.2 | 101, 375, 291, 1655

Figure 4: Running time of POLARIS for multiple classes.

Scaling to multiclass classification. As discussed in Section 3.2, when extending to multiple classes, the runtime of our approach should not grow significantly compared to a similar binary-class setting. As an empirical justification, we extended the network in our solver performance prediction application to a six-class one. It predicts which of five solver heuristics can efficiently solve a problem, or concludes that none can. To enable a fair comparison, the new network has a similar structure (8 hidden dense layers, each with 100 ReLUs). We chose one heuristic as the desirable class. Figure 4 shows the sorted running time of POLARIS across the inputs. 
While the original network took 147 seconds on average to produce an explanation, the new network took 140 seconds on average.

5 Related Work

Our work is related to previous work on interpreting neural networks [21] in terms of the problem, and to work on generating adversarial examples [7] in terms of the underlying techniques.
Much work on interpretability has gone into analyzing the results produced by convolutional networks for image classification. The Activation Maximization approach and its follow-ups visualize learnt high-level features by finding inputs that maximize activations of given neurons [5, 12, 17, 31, 23]. Zeiler and Fergus [33] use deconvolution to visualize what a network has learnt. Beyond image domains, more recent works try to build interpretability into the network itself [25, 18, 30, 32, 19]. Other works explain a neural network by learning a more interpretable model [26, 16, 3]. Lundberg et al. [20] and Kindermans et al. [14] assign importance values to features for a particular prediction. Koh and Liang [15] trace a prediction back to the training data. Anchors [27] identifies features that are sufficient to preserve the current classification. Similar to our work, Dhurandhar et al. [4] infer minimum perturbations that would change the current classification. However, while they infer a single concrete correction, we infer a stable symbolic correction representing a set of perturbations. As stated in our introduction, a correction being symbolic and stable provides many benefits. In summary, the problem definition of judgment interpretation is new, and none of the existing approaches can directly solve it. 
Moreover, these approaches typically generate a single input prototype or identify relevant features, but do not produce corrections or a space of inputs that would move the prediction from an undesirable class to a desirable one.
Adversarial examples were first introduced by Szegedy et al. [29], who applied box-constrained L-BFGS to generate them. Various approaches have since been proposed. The fast gradient sign method [7] calculates an adversarial perturbation by taking the sign of the gradient. The Jacobian-based Saliency Map Attack (JSMA) [11] applies a greedy algorithm based on a saliency map that models the impact each pixel has on the resulting classification. Deepfool [22] is an untargeted attack optimized for the L2 norm. Bastani et al. [2] apply linear programming to find an adversarial example under the same activations. While these techniques are similar to ours in that they also try to find minimum corrections, the corrections they produce are concrete and correspond to individual inputs. Ours, in contrast, are symbolic and correspond to sets of inputs.

6 Conclusion

We proposed a new approach to interpret a neural network by generating minimal, stable, and symbolic corrections that would change its output. Such an interpretation is a useful way to provide feedback to a user when the neural network fails to produce a desirable output. We designed and implemented the first algorithm for generating such corrections, and demonstrated its effectiveness on three neural network models from different real-world domains.

Acknowledgments

We thank the reviewers for their insightful comments and useful suggestions. This work was funded in part by ONR PERISCOPE MURI, award N00014-17-1-2699.

References
[1] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and W. Samek. 
On pixel-wise\nexplanations for non-linear classi\ufb01er decisions by layer-wise relevance propagation. PloS one,\n10(7):e0130140, 2015.\n\n[2] O. Bastani, Y. Ioannou, L. Lampropoulos, D. Vytiniotis, A. V. Nori, and A. Criminisi. Measuring\nneural net robustness with constraints. In Advances in Neural Information Processing Systems\n29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016,\nBarcelona, Spain, pages 2613\u20132621, 2016.\n\n[3] O. Bastani, C. Kim, and H. Bastani. Interpretability via model extraction. CoRR, abs/1706.09773,\n\n2017.\n\n[4] A. Dhurandhar, P. Chen, R. Luss, C. Tu, P. Ting, K. Shanmugam, and P. Das. Explanations\nbased on the missing: Towards contrastive explanations with pertinent negatives. CoRR,\nabs/1802.07623, 2018.\n\n[5] D. Erhan, Y. Bengio, A. Courville, and P. Vincent. Visualizing higher-layer features of a deep\n\nnetwork. University of Montreal, 1341(3):1, 2009.\n\n[6] Fannie Mae. Fannie Mae single-family loan performance data. http://www.fanniemae.com/\nportal/funding-the-market/data/loan-performance-data.html, 2017. Accessed:\n2018-02-07.\n\n[7] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples.\n\nCoRR, abs/1412.6572, 2014.\n\n[8] I. Google. The Quick, Draw! Dataset. https://github.com/googlecreativelab/\n\nquickdraw-dataset, 2017. Accessed: 2018-05-13.\n\n[9] Gurobi Optimization, Inc. Gurobi optimizer reference manual. http://www.gurobi.com,\n\n2018.\n\n[10] D. Ha and D. Eck. A neural representation of sketch drawings. arXiv preprint arXiv:1704.03477,\n\n2017.\n\n[11] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In 2016\nIEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV,\nUSA, June 27-30, 2016, pages 770\u2013778, 2016.\n\n[12] G. E. Hinton. A practical guide to training restricted boltzmann machines. 
In Neural Networks: Tricks of the Trade - Second Edition, pages 599–619. Springer, 2012.

[13] S. B. H. James P Bridge and L. C. Paulson. First-order theorem proving Data Set. https://archive.ics.uci.edu/ml/datasets/First-order+theorem+proving, 2013. Accessed: 2018-05-13.

[14] P.-J. Kindermans, K. T. Schütt, M. Alber, K.-R. Müller, D. Erhan, B. Kim, and S. Dähne. Learning how to explain neural networks: Patternnet and patternattribution. In International Conference on Learning Representations, 2018.

[15] P. W. Koh and P. Liang. Understanding black-box predictions via influence functions. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, pages 1885–1894, 2017.

[16] V. Krakovna and F. Doshi-Velez. Increasing the interpretability of recurrent neural networks using hidden markov models. CoRR, abs/1606.05320, 2016.

[17] H. Lee, R. B. Grosse, R. Ranganath, and A. Y. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, Montreal, Quebec, Canada, June 14-18, 2009, pages 609–616, 2009.

[18] T. Lei, R. Barzilay, and T. S. Jaakkola. Rationalizing neural predictions. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pages 107–117, 2016.

[19] O. Li, H. Liu, C. Chen, and C. Rudin. Deep learning for case-based reasoning through prototypes: A neural network that explains its predictions. CoRR, abs/1710.04806, 2017.

[20] S. M. Lundberg and S. Lee. A unified approach to interpreting model predictions. 
In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 4768–4777, 2017.

[21] G. Montavon, W. Samek, and K. Müller. Methods for interpreting and understanding deep neural networks. CoRR, abs/1706.07979, 2017.

[22] S. Moosavi-Dezfooli, A. Fawzi, and P. Frossard. Deepfool: A simple and accurate method to fool deep neural networks. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pages 2574–2582, 2016.

[23] A. Nguyen, A. Dosovitskiy, J. Yosinski, T. Brox, and J. Clune. Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. In Advances in Neural Information Processing Systems, pages 3387–3395, 2016.

[24] N. Papernot, N. Carlini, I. Goodfellow, R. Feinman, F. Faghri, A. Matyasko, K. Hambardzumyan, Y.-L. Juang, A. Kurakin, R. Sheatsley, A. Garg, and Y.-C. Lin. cleverhans v2.0.0: an adversarial machine learning library. arXiv preprint arXiv:1610.00768, 2017.

[25] P. H. O. Pinheiro and R. Collobert. From image-level to pixel-level labeling with convolutional networks. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 1713–1721, 2015.

[26] M. T. Ribeiro, S. Singh, and C. Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, pages 1135–1144, 2016.

[27] M. T. Ribeiro, S. Singh, and C. Guestrin. Anchors: High-precision model-agnostic explanations. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, February 2-7, 2018, 2018.

[28] J. Sirignano, A. Sadhwani, and K. Giesecke. 
Deep learning for mortgage risk. arXiv preprint arXiv:1607.02470, 2016.

[29] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, and R. Fergus. Intriguing properties of neural networks. CoRR, abs/1312.6199, 2013.

[30] S. Tan, K. C. Sim, and M. J. F. Gales. Improving the interpretability of deep neural networks with stimulated learning. In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015, Scottsdale, AZ, USA, December 13-17, 2015, pages 617–623, 2015.

[31] A. van den Oord, N. Kalchbrenner, and K. Kavukcuoglu. Pixel recurrent neural networks. In Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, pages 1747–1756, 2016.

[32] C. Wu, P. Karanasou, M. J. F. Gales, and K. C. Sim. Stimulated deep neural network for speech recognition. In Interspeech 2016, 17th Annual Conference of the International Speech Communication Association, San Francisco, CA, USA, September 8-12, 2016, pages 400–404, 2016.

[33] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In Computer Vision - ECCV 2014 - 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I, pages 818–833, 2014.