NeurIPS 2020

Model Interpretability through the Lens of Computational Complexity


Review 1

Summary and Contributions: The paper formalizes notions of explanation from XAI as computational problems for FBDDs, perceptrons, and ReLU networks, and studies the complexity of these problems in terms of standard complexity classes and parameterized complexity.

Strengths: From the point of view of computational complexity theory, interpretability has not been studied much yet. Classifying the complexity of explanations is a natural step, so this paper is a welcome contribution. The big picture, namely that these problems are easy for perceptrons and decision diagrams but hard for general neural networks, is well known. I think several results of the paper are folklore, but it is nevertheless useful to have those results carefully formulated and presented. The most interesting, novel, and relevant results are the parameterized complexity results in Section 5. Parameterized complexity could be discussed in more detail, especially the W(Maj) hierarchy, which is not widely known and could have other uses in deep learning.

Weaknesses: An important concern with the paper is that the problems it introduces are closely related to problems studied under different names in the literature, and there should be more discussion of these connections. This applies especially to "sufficient reason". A certificate of an input for a Boolean function is a well-known concept, which even has a section in the textbook of Arora and Barak cited by the paper (and is closely related to Huang's sensitivity theorem, a recent sensational result on Boolean functions). A certificate is the same as a sufficient reason in the paper, and even the meanings of the two terms are not that different. This kind of duplication is often an impediment to the development of research, especially for Boolean functions, which come up in many different contexts. Minimum change problems have also been studied before (though, as far as I know, not in the same form as here).
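To make the correspondence concrete, here is a small brute-force sketch of the shared definition (my own illustration; the names f, x, and S are chosen only for exposition): a set S of features, fixed to their values in x, is a certificate/sufficient reason exactly when every completion of the remaining features preserves the output.

    from itertools import product

    def is_sufficient_reason(f, x, S):
        """Check whether fixing the features in S to their values in x
        forces f to output f(x), i.e., whether S is a certificate of x for f."""
        free = [i for i in range(len(x)) if i not in S]
        target = f(x)
        # Enumerate all completions of the free features
        # (exponential; for illustration only).
        for bits in product([0, 1], repeat=len(free)):
            y = list(x)
            for i, b in zip(free, bits):
                y[i] = b
            if f(tuple(y)) != target:
                return False
        return True

    # Example: f(x) = x0 AND (x1 OR x2); fixing {x0, x1} to 1 is a certificate.
    f = lambda z: z[0] and (z[1] or z[2])
    print(is_sufficient_reason(f, (1, 1, 0), {0, 1}))  # True
    print(is_sufficient_reason(f, (1, 1, 0), {1}))     # False: x0 may flip to 0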

Correctness: Yes.

Clarity: The paper is very clearly written.

Relation to Prior Work: Please see above. NB: the author response promises to improve the presentation in this respect.

Reproducibility: Yes

Additional Feedback: The authors addressed the comments in a satisfactory manner.


Review 2

Summary and Contributions: The paper proposes to measure the interpretability of different machine learning models in terms of the computational complexity of certain problems associated with explaining the classification of an example. It presents four concrete problems (queries) and determines their complexity for three standard ML models: FBDDs, perceptrons, and multi-layer perceptrons. For the models that are intuitively less interpretable, the problems have higher computational complexity.

Strengths: This is a very well-written paper that takes a thoughtful approach to an important problem. The theoretical results are sound. The use of parameterized complexity to distinguish between MLPs of different depths is nice.

Weaknesses:
- A number of the proofs (given in the supplementary material) are straightforward.
- The authors should make clearer which theoretical results required novel proofs and which are routine and/or follow easily from previous work.
- It is not surprising that problems involving MLPs have higher complexity than problems involving FBDDs and perceptrons.
- The paper of Darwiche and Marquis on the Knowledge Compilation Map should be discussed in related work.
- The complexity of the minimum sufficient reason question does not seem like a particularly good measure of interpretability. It is not clear why a computed sufficient reason would need to be minimum. Also, consider e.g. monotone DNF, which could be regarded as an easily interpretable representation: it is easy to compute a minimum sufficient reason for a positive output, but NP-hard for a negative output (see the sketch after this list).
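To spell out the easy direction of the monotone DNF example, here is a short sketch (my own illustration, assuming the DNF is given as a list of terms, each a set of variable indices): on a positive instance, any sufficient reason must contain some satisfied term in full, since otherwise setting all free variables to 0 falsifies every term; hence a smallest satisfied term is a minimum sufficient reason. Nothing of this sort works for negative outputs, which is where the hardness lies.

    def min_sufficient_reason_positive(terms, x):
        """Minimum sufficient reason for a positive instance of a monotone DNF.
        terms: list of sets of variable indices (each a conjunction of positive
        literals); x: list of 0/1 values on which the DNF evaluates to 1.
        Fixing the variables of a satisfied term to 1 already forces the output
        to 1, and by monotonicity no sufficient reason can avoid containing some
        term in full, so a smallest satisfied term is optimal."""
        satisfied = [t for t in terms if all(x[i] == 1 for i in t)]
        assert satisfied, "x must be a positive instance"
        return min(satisfied, key=len)

    # Example: f = (x0 AND x1) OR x2 on the instance x = (1, 1, 1).
    print(min_sufficient_reason_positive([{0, 1}, {2}], [1, 1, 1]))  # {2}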

Correctness: The results appear to be correct and carefully done (I did not check the details of all the proofs).

Clarity: Very well written.

Relation to Prior Work: Discussion of previous work is mostly good, but misses some relevant work in knowledge compilation. Darwiche and Marquis wrote a paper in 2002 entitled "A Knowledge Compilation Map" that is closely related to this paper. It covers many representations of Boolean functions, including FBDDs (not just OBDDs), and considers the complexity of a number of queries, including implication (the same as checking a sufficient reason) and counting (closely related to counting completions). In that paper, they state that implication and counting can both be done in polynomial time for FBDDs, and they justify the statement with appropriate citations.
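For concreteness, here is a sketch of the standard linear-time counting argument for FBDDs (my own illustration, not taken from either paper; the tuple encoding of nodes is an assumption for exposition): because no variable repeats on any root-to-leaf path, a uniformly random assignment makes the branch decisions independent, so the acceptance probability satisfies p(v) = (p(low) + p(high)) / 2, and the model count is p(root) * 2^n.

    from fractions import Fraction

    def count_completions(root, n):
        """Count satisfying assignments of an FBDD over n variables.
        A node is either a bool leaf or a tuple (var, low, high). The
        read-once property gives p(v) = (p(low) + p(high)) / 2 with
        p(True) = 1 and p(False) = 0; the count is p(root) * 2^n.
        One memoized pass, linear in the size of the diagram."""
        memo = {}
        def p(v):
            if isinstance(v, bool):
                return Fraction(1 if v else 0)
            if id(v) not in memo:
                _, low, high = v
                memo[id(v)] = (p(low) + p(high)) / 2
            return memo[id(v)]
        return int(p(root) * 2**n)

    # Example: f(x0, x1) = x0 OR x1 as the FBDD (0, (1, False, True), True).
    print(count_completions((0, (1, False, True), True), 2))  # 3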

Reproducibility: Yes

Additional Feedback: Shortest Implicant is Sigma_2^P-complete for Boolean formulas, so it is not necessary to reduce from Shortest Implicant Core unless you want to prove something about depth-2 MLPs.


Review 3

Summary and Contributions: The paper sets out to find a characterization of the concept of interpretability of models, and proposes one in terms of computational complexity by focusing on specific kinds of post-hoc queries.

Strengths:
-- Important goal of pinpointing an elusive concept.
-- Novel and sound complexity results.
-- Relevance to the NeurIPS audience.

Weaknesses:
-- The link between interpretability and complexity is made by definition, but not analyzed for adequacy.
-- Related to the previous point, the significance is questionable.

Correctness: All claims are sound to the extent that I checked.

Clarity: The manuscript is well written, and the authors do a good job of making the material reasonably self-contained.

Relation to Prior Work: In relation to my main concern (see below), the concept of interpretability does not seem to be linked to complexity anywhere in the literature; a broader discussion of the literature might shed some light on this matter.

Reproducibility: Yes

Additional Feedback: As mentioned above, my main concern is with the basic idea of linking, by definition, model interpretability with computational complexity. There does not seem to be a well-founded basis for doing this, since interpretability is, to the best of my knowledge, an inherently subjective concept. This does not mean that studying the relationship between the two is useless, but perhaps it would be more interesting to set out from a more neutral position of looking for a correlation between them and seeing where the results lead. In its current form, the conclusions seem to be largely theoretical (as the authors themselves state); this is of course not inherently a bad thing, but it is very difficult to evaluate their significance.

Minor comment: Footnote 1 could be avoided or made clearer by pointing out that there exists a complexity class of decision problems that "corresponds" to #P, namely PP, which allows a fair comparison.

*** AFTER REBUTTAL: I have revised my score to reflect a minor change of view after reading the authors' rebuttal and the comments by other reviewers.