Model Class Reliance for Random Forests

Part of Advances in Neural Information Processing Systems 33 pre-proceedings (NeurIPS 2020)

Bibtex »Paper »Supplemental »

Bibtek download is not availble in the pre-proceeding


Authors

Gavin Smith, Roberto Mansilla, James Goulding

Abstract

<p>Variable Importance (VI) has traditionally been cast as the process of estimating each variables contribution to a predictive model's overall performance. Analysis of a single model instance, however, guarantees no insight into a variables relevance to underlying generative processes. Recent research has sought to address this concern via analysis of Rashomon sets - sets of alternative model instances that exhibit equivalent predictive performance to some reference model, but which take different functional forms. Measures such as Model Class Reliance (MCR) have been proposed, that are computed against Rashomon sets, in order to ascertain how much a variable must be relied on to make robust predictions, or whether alternatives exist. If MCR range is tight, we have no choice but to use a variable; if range is high then there exists competing, perhaps fairer models, that provide alternative explanations of the phenomena being examined. Applications are wide, from enabling construction of `fairer' models in areas such as recidivism, health analytics and ethical marketing. Tractable estimation of MCR for non-linear models is currently restricted to Kernel Regression under squared loss \cite{fisher2019all}. In this paper we introduce a new technique that extends computation of Model Class Reliance (MCR) to Random Forest classifiers and regressors. The proposed approach addresses a number of open research questions, and in contrast to prior Kernel SVM MCR estimation, runs in linearithmic rather than polynomial time. Taking a fundamentally different approach to previous work, we provide a solution for this important model class, identifying situations where irrelevant covariates do not improve predictions.</p>