Hongge Chen, Huan Zhang, Si Si, Yang Li, Duane Boning, Cho-Jui Hsieh
We study the robustness verification problem of tree-based models, including random forests (RF) and gradient boosted decision trees (GBDT). Formal robustness verification of decision tree ensembles involves finding the exact minimal adversarial perturbation or a guaranteed lower bound on it. Existing approaches cast this verification problem as a mixed integer linear programming (MILP) problem, which finds the minimal adversarial distortion in exponential time and is therefore impractical for large ensembles. Although this verification problem is NP-complete in general, we give a more precise complexity characterization. We show that there is a simple linear-time algorithm for verifying a single tree, and that for tree ensembles the verification problem can be cast as a max-clique problem on a multipartite graph whose boxicity is at most the input dimension. For low-dimensional problems, where the boxicity can be treated as a constant, this reformulation leads to a polynomial-time algorithm. For general problems, we exploit the boxicity of the graph to devise an efficient verification algorithm that gives tight lower bounds on the robustness of decision tree ensembles and allows iterative improvement and any-time termination. On RF/GBDT models trained on a variety of datasets, our lower bounds are significantly tighter than those obtained by relaxing the MILP formulation into a linear program (LP), and our method is hundreds of times faster than solving MILPs to obtain the exact minimal adversarial distortion. Our proposed method can give tight robustness verification bounds on large GBDTs with hundreds of deep trees.
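The single-tree and clique claims can be made concrete with a minimal sketch. It rests on assumptions that are not taken from the paper: binary classification by the sign of the summed leaf scores, ℓ∞ perturbations, splits of the form x[f] <= t going left, and a hypothetical `Node` class rather than any library's tree format. Each leaf of a tree covers an axis-aligned box, so for one tree the minimal perturbation is the smallest ℓ∞ distance from x to a leaf box with the opposite label, found in one traversal (O(n·d) here for n nodes and d features). For an ensemble, picking one leaf per tree whose boxes pairwise intersect is a clique in the multipartite graph; for axis-aligned boxes, pairwise intersection implies a common intersection. The brute-force function below enumerates every such choice, so it is exponential and only illustrates the reformulation; it is not the paper's efficient bounding algorithm. Because split inequalities are strict on one side, the returned distances are infima.

```python
import itertools
import math
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple

# Per-feature interval (lo, hi]; hypothetical convention: go left iff x[f] <= threshold.
Box = List[Tuple[float, float]]


@dataclass
class Node:
    """Hypothetical tree node; a node with no children is a leaf holding `value`."""
    feature: int = -1
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: float = 0.0

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def leaf_boxes(root: Node, d: int) -> Iterator[Tuple[Box, float]]:
    """One DFS pass yielding the box and score of every leaf."""
    stack = [(root, [(-math.inf, math.inf)] * d)]
    while stack:
        node, box = stack.pop()
        if node.is_leaf():
            yield box, node.value
            continue
        f, t = node.feature, node.threshold
        lo, hi = box[f]
        left_box, right_box = list(box), list(box)
        left_box[f] = (lo, min(hi, t))   # left branch: x[f] <= t
        right_box[f] = (max(lo, t), hi)  # right branch: x[f] > t
        stack.append((node.left, left_box))
        stack.append((node.right, right_box))


def nonempty(box: Box) -> bool:
    """An interval (lo, hi] is nonempty iff lo < hi."""
    return all(lo < hi for lo, hi in box)


def linf_dist(x: List[float], box: Box) -> float:
    """l_inf distance from x to the closure of the box (infimum over the box)."""
    return max(max(lo - xi, xi - hi, 0.0) for xi, (lo, hi) in zip(x, box))


def predict(trees: List[Node], x: List[float]) -> bool:
    """Assumed convention: label is True iff the summed leaf scores are positive."""
    score = 0.0
    for node in trees:
        while not node.is_leaf():
            node = node.left if x[node.feature] <= node.threshold else node.right
        score += node.value
    return score > 0


def verify_single_tree(root: Node, x: List[float]) -> float:
    """Minimal l_inf perturbation flipping one tree's label: scan each leaf once."""
    y = predict([root], x)
    dists = [linf_dist(x, box) for box, v in leaf_boxes(root, len(x))
             if nonempty(box) and (v > 0) != y]
    return min(dists, default=math.inf)


def verify_ensemble_bruteforce(trees: List[Node], x: List[float]) -> float:
    """Exact but exponential: try every choice of one leaf per tree (a K-clique)."""
    d, y = len(x), predict(trees, x)
    per_tree = [list(leaf_boxes(t, d)) for t in trees]
    best = math.inf
    for combo in itertools.product(*per_tree):
        # Intersect the chosen leaf boxes feature by feature.
        inter = [(max(b[i][0] for b, _ in combo), min(b[i][1] for b, _ in combo))
                 for i in range(d)]
        if not nonempty(inter):
            continue  # boxes do not all intersect: not a clique
        if (sum(v for _, v in combo) > 0) != y:
            best = min(best, linf_dist(x, inter))
    return best
```

For example, with two stumps splitting feature 0 and feature 1 at 0.5 (leaf scores -1 on the left, +1 on the right), the point [0.2, 0.2] is labeled negative, and only the region where both stumps go right flips the label, so the ensemble sketch returns an ℓ∞ distance of about 0.3.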