NeurIPS 2020

ICAM: Interpretable Classification via Disentangled Representations and Feature Attribution Mapping

Meta Review

This paper proposes a model for simultaneous classification and feature attribution in the context of medical image classification. The model uses a GAN-based framework to learn two representations from pairs (x, y) of input images from different classes. One representation is class-relevant (z^a, a for attribution) and the other is class-irrelevant (z^c, c for content). The class-relevant representation is used for classification. Both representations are fed to a generator G, which synthesizes images to perform domain translation. While G(z^c_x, z^a_x) approximately recovers x, G(z^c_x, z^a_y) is a translation of x into the class of y (what x would look like if it belonged to the class of y). Consequently, the difference G(z^c_x, z^a_y) - x can be used as a visualization (attribution map) of the difference between the two classes in the domain of x. Empirically, the work shows strong improvements over the previous benchmarks considered on the medical datasets tested.

The paper prompted a long series of extensive discussions among the reviewers (12 long posts). One reviewer remains unconvinced about the novelty and about some technical issues, while two other reviewers are strongly supportive. Overall, the idea seems interesting and the work is solid. The fact that the paper has generated so much debate among the reviewers is a good sign: it shows that the model is worth further attention and study.
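To make the swap-and-subtract mechanism concrete, here is a minimal Python sketch. It is not the authors' implementation. The encoder, generator and classifier (`toy_encode`, `toy_generate`, `toy_classify`) are hypothetical toy stand-ins for trained networks. They use a hard-coded split of a flattened "image" into a content half (z^c) and an attribution half (z^a), which is only an illustrative assumption. The sketch shows how the class-relevant code of y is combined with the content code of x, and how the attribution map G(z^c_x, z^a_y) - x is then obtained.

```python
from typing import List, Tuple

Vector = List[float]


def toy_encode(x: Vector) -> Tuple[Vector, Vector]:
    """Hypothetical encoder pair returning (z^c, z^a).

    Assumption for illustration: the first half of the flattened image is
    class-irrelevant content z^c and the second half is class-relevant
    attribution z^a. A trained model would learn this split instead."""
    half = len(x) // 2
    return x[:half], x[half:]


def toy_generate(z_c: Vector, z_a: Vector) -> Vector:
    """Hypothetical generator G: recombines content and attribution codes."""
    return z_c + z_a


def toy_classify(z_a: Vector, threshold: float = 5.0) -> int:
    """Hypothetical classifier that only sees the class-relevant code z^a."""
    return int(sum(z_a) / len(z_a) > threshold)


def attribution_map(x: Vector, y: Vector) -> Vector:
    """Attribution map G(z^c_x, z^a_y) - x: how x would change if it
    belonged to the class of y."""
    z_c_x, _ = toy_encode(x)
    _, z_a_y = toy_encode(y)
    translated = toy_generate(z_c_x, z_a_y)  # x translated into y's class
    return [t - xi for t, xi in zip(translated, x)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]  # toy image from class 0
    y = [5.0, 6.0, 7.0, 9.0]  # toy image from class 1
    print(toy_generate(*toy_encode(x)))  # reconstruction: [1.0, 2.0, 3.0, 4.0]
    print(toy_classify(toy_encode(x)[1]), toy_classify(toy_encode(y)[1]))  # 0 1
    print(attribution_map(x, y))  # [0.0, 0.0, 4.0, 5.0]
```

With this hard-coded split, reconstruction is exact and the map is non-zero only in the class-relevant half. In the model under review, G(z^c_x, z^a_x) recovers x only approximately, and the disentanglement is learned rather than fixed.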