{"title": "Robustness Verification of Tree-based Models", "book": "Advances in Neural Information Processing Systems", "page_first": 12317, "page_last": 12328, "abstract": "We study the robustness verification problem of tree based models, including random forest (RF) and gradient boosted decision tree (GBDT).\nFormal robustness verification of decision tree ensembles involves finding the exact minimal adversarial perturbation or a guaranteed lower bound of it. Existing approaches cast this verification problem into a mixed integer linear programming (MILP) problem, which finds the minimal adversarial distortion in exponential time so is impractical for large ensembles. Although this verification problem is NP-complete in general, we give a more precise complexity characterization. We show that there is a simple linear time algorithm for verifying a single tree, and for tree ensembles the verification problem can be cast as a max-clique problem on a multi-partite boxicity graph. For low dimensional problems when boxicity can be viewed as constant, this reformulation leads to a polynomial time algorithm. For general problems, by exploiting the boxicity of the graph, we devise an efficient verification algorithm that can give tight lower bounds on robustness of decision tree ensembles, and allows iterative improvement and any-time termination. On RF/GBDT models trained on a variety of datasets, we significantly outperform the lower bounds obtained by relaxing the MILP formulation into a linear program (LP), and are hundreds times faster than solving MILPs to get the exact minimal adversarial distortion. 
Our proposed method is capable of giving tight robustness verification bounds on large GBDTs with hundreds of deep trees.", "full_text": "Robustness Verification of Tree-based Models

Hongge Chen*,1 Huan Zhang*,2 Si Si3 Yang Li3 Duane Boning1 Cho-Jui Hsieh2,3

1Department of EECS, MIT
2Department of Computer Science, UCLA
3Google Research

chenhg@mit.edu, huan@huan-zhang.com, sisidaisy@google.com, liyang@google.com, boning@mtl.mit.edu, chohsieh@cs.ucla.edu

*Hongge Chen and Huan Zhang contributed equally.

Abstract

We study the robustness verification problem for tree-based models, including decision trees, random forests (RFs) and gradient boosted decision trees (GBDTs). Formal robustness verification of decision tree ensembles involves finding the exact minimal adversarial perturbation or a guaranteed lower bound of it. Existing approaches find the minimal adversarial perturbation by solving a mixed integer linear programming (MILP) problem, which takes exponential time and is thus impractical for large ensembles. Although this verification problem is NP-complete in general, we give a more precise complexity characterization. We show that there is a simple linear time algorithm for verifying a single tree, and for tree ensembles the verification problem can be cast as a max-clique problem on a multi-partite graph with bounded boxicity. For low dimensional problems where boxicity can be viewed as constant, this reformulation leads to a polynomial time algorithm. For general problems, by exploiting the boxicity of the graph, we develop an efficient multi-level verification algorithm that can give tight lower bounds on the robustness of decision tree ensembles, while allowing iterative improvement and any-time termination. 
On RF/GBDT models trained on 10 datasets, our algorithm is hundreds of times faster than a previous approach that requires solving MILPs, and is able to give tight robustness verification bounds on large GBDTs with hundreds of deep trees.

1 Introduction

Recent studies have demonstrated that neural network models are vulnerable to adversarial perturbations: a small and human imperceptible input perturbation can easily change the predicted label [37, 17, 6, 15]. This has created serious security threats to many real applications, so it becomes important to formally verify the robustness of machine learning models. Usually, the robustness verification problem can be cast as finding the minimal adversarial perturbation to an input example that can change the predicted class label. A series of robustness verification algorithms have been developed for neural network models [21, 38, 43, 42, 41, 47, 16, 35], where efficient algorithms are mostly based on convex relaxations of the nonlinear activation functions of neural networks [32].

We study the robustness verification problem of tree-based models, including a single decision tree and tree ensembles such as random forests (RFs) and gradient boosted decision trees (GBDTs). These models have been widely used in practice [12, 22, 46], and recent studies have demonstrated that both RFs and GBDTs are vulnerable to adversarial perturbations [20, 13, 9]. It is thus important to develop a formal robustness verification algorithm for tree-based models. Robustness verification requires computing the minimal adversarial perturbation. [20] showed that computing the minimal adversarial perturbation for a tree ensemble is NP-complete in general, and they proposed a Mixed-Integer Linear Programming (MILP) based approach to compute the minimal adversarial perturbation. 
Although exact verification is NP-hard, in order to have an efficient verification algorithm for real applications we seek to answer the following questions:
• Do we have polynomial time algorithms for exact verification under some special circumstances?
• For general tree ensemble models with a large number of trees, can we efficiently compute meaningful lower bounds on robustness while scaling to large tree ensembles?

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

In this paper, we answer the above-mentioned questions affirmatively by formulating the verification problem of tree ensembles as a graph problem. First, we show that for a single decision tree, robustness verification can be done exactly in linear time. Then we show that for an ensemble of K trees, the verification problem is equivalent to finding the maximum cliques in a K-partite graph, and the graph is in a special form with boxicity equal to the input feature dimension. Therefore, for low-dimensional problems, verification can be done in polynomial time with maximum clique searching algorithms. Finally, for large-scale tree ensembles, we propose a multiscale verification algorithm by exploiting the boxicity of the graph, which can give tight lower bounds on robustness. Furthermore, it supports any-time termination: we can stop the algorithm at any time to obtain a reasonable lower bound given a computation time constraint. Our proposed algorithm is efficient and is scalable to large tree ensemble models. 
For instance, on a large multi-class GBDT with 200 trees robustly trained (using [9]) on the MNIST dataset, we obtained 78% verified robust accuracy on the test set with maximum ℓ∞ perturbation ε = 0.2, and the time used for verifying each test example is 12.6 seconds, whereas the MILP method uses around 10 minutes for each test example.

2 Background and Related Work

Adversarial Robustness  For simplicity, we consider a multi-class classification model f : R^d → {1, . . . , C}, where d is the input dimension and C is the number of classes. For an input example x, assuming that y0 = f(x) is the correct label, the minimal adversarial perturbation is defined by

r* = min_δ ‖δ‖∞  s.t.  f(x + δ) ≠ y0.    (1)

Note that we focus on the ℓ∞ norm in this paper, which is widely used in recent studies [25, 43, 5]. Exactly solving (1) is usually intractable. For example, if f(·) is a neural network, (1) is non-convex, and [21] showed that solving (1) is NP-complete for ReLU networks.

Adversarial attacks are algorithms developed for finding a feasible solution δ̄ of (1), where ‖δ̄‖∞ is an upper bound of r*. Many algorithms have been proposed for attacking machine learning models [17, 23, 6, 25, 10, 11, 18, 3, 13, 28, 24, 45]. Most practical attacks cannot guarantee to reach the minimal adversarial perturbation r* due to the non-convexity of (1). Therefore, attacking algorithms cannot provide any formal guarantee on model robustness [1, 40].

On the other hand, robustness verification algorithms are designed to find the exact value or a lower bound of r*. An exact verifier needs to solve (1) to the global optimum, so typically we resort to relaxed verifiers that give lower bounds. 
After a verification algorithm finds a lower bound r, it guarantees that no adversarial example exists within a ball of radius r around x. This is important for deploying machine learning algorithms in safety-critical applications such as autonomous vehicles or aircraft control systems [21, 19].

For verification, instead of solving (1) we can also solve the following decision problem of robustness verification:

Does there exist an x′ ∈ Ball(x, ε) such that f(x′) ≠ y0?    (2)

In our setting, Ball(x, ε) := {x′ : ‖x′ − x‖∞ ≤ ε}. If we can answer this decision ("yes"/"no") problem, a binary search can give us the value of r*, so the complexity of (2) is of the same order as (1). Furthermore, solving (2) with an approximation algorithm (with the answer "unknown" allowed) can lead to a lower bound of r*, which is useful for verification. The decision version is also widely used in the verification community, since "verified accuracy under ε perturbation" is an important metric, defined as the portion of test samples for which the answer to (2) is "no". Verification methods for neural networks have been studied extensively in the past few years [43, 44, 42, 47, 35, 16, 36].
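To illustrate the reduction from (1) to the decision problem (2), a binary search over ε with a sound decision oracle recovers r* to any precision. Below is a minimal Python sketch; `is_robust` is a hypothetical placeholder for any sound decision procedure for (2), not the paper's implementation:

```python
def binary_search_radius(is_robust, eps_hi=1.0, tol=1e-4):
    """Recover the minimal adversarial perturbation radius r* by binary
    search, given a decision oracle for problem (2).

    is_robust(eps) -> True if NO adversarial example exists in Ball(x, eps),
    False otherwise. Assumes monotonicity: robust at eps implies robust at
    any smaller radius, and that the model is not robust at some finite eps.
    """
    lo, hi = 0.0, eps_hi
    # Grow hi until the model is no longer robust, so that r* lies in [lo, hi].
    while is_robust(hi):
        lo, hi = hi, 2 * hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if is_robust(mid):
            lo = mid      # still robust at mid, so r* > mid
        else:
            hi = mid      # adversarial example exists within mid, so r* <= mid
    return lo             # a certified lower bound of r*, within tol of r*


# Toy oracle: a 1-D "model" whose prediction flips when the budget exceeds 0.3.
r = binary_search_radius(lambda eps: eps <= 0.3)
```

The same loop yields a valid lower bound if the oracle is replaced by any conservative verifier that may answer "unknown" (treated as "not robust"), which is exactly how relaxed verifiers are used.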
A Mixed Integer Linear Programming (MILP) algorithm was thus proposed in [20] to compute (1) in exponential time. Recently, [14] and [33] verified the robustness of tree ensembles using an SMT solver, which is also NP-complete in its natural formulation. Additionally, an approximate bound for tree ensembles was proposed recently in [39] by directly combining the bounds of each tree, which can be seen as a special case of our proposed method.

On the other hand, robustness can be empirically evaluated through adversarial attacks [27]. Some hard-label attacking algorithms for neural networks, including the boundary attack [3] and OPT-attack [13], can be applied to tree-based models since they only require function evaluations of the non-smooth (hard-label) decision function f(·). These attacks compute an upper bound of r*. In contrast, our work focuses on efficiently computing a tight lower bound of r* for tree ensembles.

3 Proposed Algorithm

The exact verification problem of tree ensembles is NP-complete by its nature, and here we propose a series of efficient verification algorithms for real applications. First, we introduce a linear time algorithm for exactly computing the minimal adversarial distortion r* for verifying a single decision tree. For an ensemble of trees, we cast the verification problem into a max-clique searching problem in K-partite graphs. For large-scale tree ensembles, we then propose an efficient multi-level algorithm for verifying an ensemble of decision trees.

3.1 Exactly Verifying a Single Tree in Linear Time

Although computing r* for a tree ensemble is NP-complete [20], we show that a linear time algorithm exists for finding the minimum adversarial perturbation and computing r* for a single decision tree. We assume the decision tree has n nodes and the root node is indexed as 0. For a given example x = [x1, . . .
, xd] with d features, starting from the root, x traverses the decision tree until reaching a leaf node. Each internal node, say node i, has two children and a univariate feature-threshold pair (t_i, η_i) that determines the traversal direction: x is passed to the left child if x_{t_i} ≤ η_i and to the right child otherwise. Each leaf node has a value v_i corresponding to the predicted class label for a classification tree, or a real value for a regression tree.

Conceptually, the main idea of our single tree verification algorithm is to compute a d-dimensional box for each leaf node such that any example in this box will fall into this leaf. Mathematically, node i's box is defined as the Cartesian product B_i = (l^i_1, r^i_1] × ··· × (l^i_d, r^i_d] of d intervals on the real line. By definition, the root node has box [−∞, ∞] × ··· × [−∞, ∞], and given the box of an internal node i, its children's boxes can be obtained by changing only one interval of the box based on the split condition (t_i, η_i). 
More specifically, if p and q are node i's left and right child respectively, then their boxes B_p = (l^p_1, r^p_1] × ··· × (l^p_d, r^p_d] and B_q = (l^q_1, r^q_1] × ··· × (l^q_d, r^q_d] are obtained by setting

(l^p_t, r^p_t] = { (l^i_t, r^i_t]             if t ≠ t_i
                { (l^i_t, min{r^i_t, η_i}]    if t = t_i,

(l^q_t, r^q_t] = { (l^i_t, r^i_t]             if t ≠ t_i
                { (max{l^i_t, η_i}, r^i_t]    if t = t_i.    (3)

After computing the boxes for internal nodes, we can also obtain the boxes for leaf nodes using (3). Therefore, computing the boxes for all the leaf nodes of a decision tree can be done by a depth-first traversal of the tree with time complexity O(nd).

With the boxes computed for each leaf node, the minimum perturbation required to make x reach leaf node i can be written as a vector Δ(x, B_i) ∈ R^d defined as

Δ(x, B_i)_t := { 0             if x_t ∈ (l^i_t, r^i_t]
              { x_t − r^i_t    if x_t > r^i_t
              { l^i_t − x_t    if x_t ≤ l^i_t.    (4)

Then the minimal distortion can be computed as r* = min_{i : v_i ≠ y0} ‖Δ(x, B_i)‖∞, where y0 is the original label of x and v_i is the label of leaf node i. To find r*, we check B_i for all leaves and choose the smallest perturbation. This is a linear-time algorithm for exactly verifying the robustness of a single decision tree. In fact, this O(nd) time algorithm is used to illustrate the concept of "boxes" that will be used later for the tree ensemble case. If our final goal is to verify a single tree, we can have a more efficient algorithm by combining the distance computation (4) with the tree traversal procedure, and the resulting algorithm takes only O(n) time. 
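To make the box computation in (3) and the distance computation in (4) concrete, here is a minimal Python sketch of the O(nd) single-tree procedure; the `Node` structure and all names are illustrative assumptions, not the paper's code:

```python
import math

class Node:
    # Internal node: feature index t, threshold eta, children left/right.
    # Leaf node: left = right = None, value v (the predicted label).
    def __init__(self, t=None, eta=None, left=None, right=None, v=None):
        self.t, self.eta, self.left, self.right, self.v = t, eta, left, right, v

def leaf_boxes(node, box):
    """DFS computing the box (a list of (l, r] intervals) of every leaf, per (3)."""
    if node.left is None:                    # reached a leaf
        yield node, box
        return
    lbox, rbox = list(box), list(box)
    l, r = box[node.t]
    lbox[node.t] = (l, min(r, node.eta))     # left child:  x_t <= eta
    rbox[node.t] = (max(l, node.eta), r)     # right child: x_t > eta
    yield from leaf_boxes(node.left, lbox)
    yield from leaf_boxes(node.right, rbox)

def min_adv_perturbation(root, x, y0):
    """r* = min over leaves with label != y0 of ||Delta(x, B_i)||_inf, per (4)."""
    d = len(x)
    r_star = math.inf
    for leaf, box in leaf_boxes(root, [(-math.inf, math.inf)] * d):
        if leaf.v == y0:
            continue
        dist = 0.0                           # ||Delta(x, B)||_inf for this leaf
        for t in range(d):
            l, r = box[t]
            if x[t] > r:
                dist = max(dist, x[t] - r)
            elif x[t] <= l:
                dist = max(dist, l - x[t])
        r_star = min(r_star, dist)
    return r_star

# A depth-1 stump on feature 0: predict 0 if x_0 <= 0.5, else 1.
tree = Node(t=0, eta=0.5, left=Node(v=0), right=Node(v=1))
r = min_adv_perturbation(tree, [0.2], y0=0)  # must push x_0 above 0.5
```

This version enumerates all leaf boxes first; as noted above, fusing the distance computation into the traversal removes the factor of d and gives the O(n) variant.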
The O(n) algorithm is presented as Algorithm 3 in the Appendix.

3.2 Verifying Tree Ensembles by Max-clique Enumeration

Now we discuss robustness verification for tree ensembles. Assuming the tree ensemble has K decision trees, we use S^(k) to denote the set of leaf nodes of tree k and m^(k)(x) to denote the function that maps the input example x to a leaf node of tree k according to its traversal rule. Given an input example x, the tree ensemble passes x to each of these K trees independently, and x reaches K leaf nodes i^(k) = m^(k)(x) for k = 1, . . . , K. Each leaf node assigns a prediction value v_{i^(k)}.

For simplicity we start with the binary classification case, with x's original label being y0 = −1, and we want to turn it into +1. For binary classification the prediction of the tree ensemble is computed by sign(Σ_k v_{i^(k)}), which covers both GBDTs and random forests, two widely used tree ensemble models. Assume x has label y0 = −1, which means sign(Σ_k v_{i^(k)}) < 0 for x; our task is to verify whether the sign of the summation can be flipped within Ball(x, ε).

We consider the decision problem of robustness verification (2). A naive analysis would need to check all the points in Ball(x, ε), which is uncountably infinite. To reduce the search space to a finite one, we start by defining some notation: let C = {(i^(1), . . . , i^(K)) | i^(k) ∈ S^(k), ∀k = 1, . . . , K} be all the possible tuples of leaf nodes, and let C(x) = [m^(1)(x), . . . , m^(K)(x)] be the function that maps x to the corresponding leaf nodes. Therefore, a tuple C ∈ C directly determines the model prediction Σ_k v_{i^(k)}. Now we define a valid tuple for robustness verification:

Definition 1. A tuple C = (i^(1), . . .
, i^(K)) is valid if and only if there exists an x′ ∈ Ball(x, ε) such that C = C(x′).

The decision problem of robustness verification (2) can then be written as:

Does there exist a valid tuple C such that Σ v_C > 0,  where Σ v_C := Σ_k v_{i^(k)}?

Next, we show how to model the set of valid tuples. We have two observations. First, if a tuple contains any node i with inf_{x′ ∈ B_i} ‖x − x′‖∞ > ε, then it is invalid. Second, there exists an x such that C = C(x) if and only if B_{i^(1)} ∩ ··· ∩ B_{i^(K)} ≠ ∅, or equivalently:

(l^{i^(1)}_t, r^{i^(1)}_t] ∩ ··· ∩ (l^{i^(K)}_t, r^{i^(K)}_t] ≠ ∅,  ∀t = 1, . . . , d.

We show that the set of valid tuples can be represented as cliques in a graph G = (V, E), where V := {i | B_i ∩ Ball(x, ε) ≠ ∅} and E := {(i, j) | B_i ∩ B_j ≠ ∅}. In this graph, nodes are the leaves of all trees, and we remove every leaf that has an empty intersection with Ball(x, ε). There is an edge (i, j) between nodes i and j if and only if their boxes intersect. The graph is then a K-partite graph, since there cannot be any edge between nodes from the same tree, and thus maximum cliques in this graph have K nodes. We define each part of the K-partite graph as V_k. Here a "part" means a disjoint and independent set in the K-partite graph. The following lemma shows that intersections of boxes have very nice properties:

Lemma 1. For boxes B_1, . . . , B_K, if B_i ∩ B_j ≠ ∅ for all i, j ∈ [K], let B̄ = B_1 ∩ B_2 ∩ ··· ∩ B_K be their intersection. Then B̄ is also a box and B̄ ≠ ∅.

The proof can be found in the Appendix. 
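Because each box is a product of per-feature intervals, testing an edge of G reduces to d independent interval checks, and the intersection of intersecting boxes is again a box, as Lemma 1 states. A minimal sketch (the (l, r] list representation is an assumption for illustration, not the paper's code):

```python
def intersect_boxes(a, b):
    """Intersect two boxes, each a list of d half-open intervals (l, r].
    Returns the intersection box, or None if it is empty."""
    out = []
    for (la, ra), (lb, rb) in zip(a, b):
        l, r = max(la, lb), min(ra, rb)
        if l >= r:          # the interval (l, r] is empty
            return None
        out.append((l, r))
    return out

# Two overlapping 2-D boxes: their intersection is again a box (Lemma 1).
box1 = [(0.0, 2.0), (0.0, 1.0)]
box2 = [(1.0, 3.0), (-1.0, 0.5)]
both = intersect_boxes(box1, box2)                    # [(1.0, 2.0), (0.0, 0.5)]
# Disjoint along feature 0 -> no edge between these two leaves in G.
none = intersect_boxes(box1, [(2.0, 3.0), (0.0, 1.0)])
```

The same routine also prunes the vertex set V: a leaf survives only if intersecting its box with the box form of Ball(x, ε) returns a nonempty result.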
Based on the above lemma, each K-clique (fully connected subgraph with K nodes) in G can be viewed as a set of leaf nodes whose boxes have nonempty pairwise intersections and also nonempty intersection with Ball(x, ε); the intersection of those K boxes and Ball(x, ε) is thus a nonempty box, which implies that each K-clique corresponds to a valid tuple of leaf nodes:

Lemma 2. A tuple C = (i^(1), . . . , i^(K)) is valid if and only if the nodes i^(1), . . . , i^(K) form a K-clique (maximum clique) in the graph G constructed above.

Therefore the robustness verification problem can be formulated as

Is there a maximum clique C in G such that Σ v_C > 0?    (5)

This reformulation indicates that the tree ensemble verification problem can be solved by an efficient maximum clique enumeration algorithm. Some standard maximum clique searching algorithms can be applied here to perform verification:
• Finding K-cliques in K-partite graphs: Any algorithm for finding all the maximum cliques in G can be used. The classic B-K backtracking algorithm [4] takes O(3^{m/3}) time to find all the maximum cliques, where m is the number of nodes in G. Furthermore, since our graph is a K-partite graph, we can apply specialized algorithms designed for finding all the K-cliques in K-partite graphs [26, 29, 34].
• Polynomial time algorithms exist for low-dimensional problems: Another important property of graph G is that each node in G is a d-dimensional box and each edge indicates the intersection of two boxes. This implies that our graph G has "boxicity d" (see [7] for details). [7] proved that the number of maximum cliques is only O((2m)^d), and the maximum weight clique can be found in O((2m)^d) time. 
Therefore, for problems with a very small d, the time complexity of verification is actually polynomial.

We can thus exactly solve the tree ensemble verification problem using algorithms for maximum clique searching in K-partite graphs, and its time complexity is as follows:

Theorem 1. Exactly verifying the robustness of a K-tree ensemble with at most n leaves per tree and d-dimensional features takes min{O(n^K), O((2Kn)^d)} time.

This is a direct consequence of the fact that the number of K-cliques in a K-partite graph with n vertices per part is bounded by O(n^K), and the number of maximum cliques in a graph with a total of m nodes and boxicity d is O((2m)^d). For a general graph, since K and d can be in O(n) and O(m) [31], this can still be exponential. But the theorem gives a more precise characterization of the complexity of the verification problem for tree ensembles. Based on the nice properties of the maximum clique searching problem, we propose a simple and elegant algorithm that enumerates all K-cliques on a K-partite graph with a known boxicity d in Algorithm 1, and we can use this algorithm for tree ensemble verification when the number of trees or the dimension of features is small.

For a K-partite graph G, we define the set Ṽ := {V_1, V_2, ···, V_K}, which is a set of independent sets ("parts") in G. The algorithm first looks at the first two parts V_1 and V_2 of the graph and enumerates all 2-cliques in O(|V_1||V_2|) time. Then, each 2-clique found is converted into a "pseudo node" (this is possible due to Lemma 1), and all 2-cliques form a new part V′_2 of the graph. Then we replace V_1 and V_2 with V′_2, and continue to enumerate all 2-cliques between V′_2 and V_3 to form V′_3. A 2-clique between V′_2 and V_3 represents a 3-clique in V_1, V_2 and V_3 due to boxicity. 
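The pairwise merging step just described can be sketched in Python: each clique is carried as a (node set, box) pair, so, thanks to boxicity, growing a clique only requires one box intersection test. All names and the interval representation are illustrative assumptions, not the paper's code:

```python
def intersect(a, b):
    """Intersection of two boxes given as lists of (l, r] intervals; None if empty."""
    out = [(max(la, lb), min(ra, rb)) for (la, ra), (lb, rb) in zip(a, b)]
    return None if any(l >= r for l, r in out) else out

def enumerate_k_cliques(parts):
    """Merge parts pairwise, in the spirit of Algorithm 1.

    parts: a list of K lists of (leaf_id, box) pairs, one list per part.
    Returns all K-cliques as (set_of_leaf_ids, box) pairs, where the box is
    the intersection of the member leaves' boxes (a box, by Lemma 1).
    """
    old = [({i}, box) for i, box in parts[0]]
    for part in parts[1:]:
        new = []
        for ids, box in old:
            for j, jbox in part:
                meet = intersect(box, jbox)
                if meet is not None:          # a larger clique is found
                    new.append((ids | {j}, meet))
        old = new
    return old

# Two tiny "trees" over one feature, leaves identified by their boxes:
t1 = [("t1:left", [(-1.0, 0.5)]), ("t1:right", [(0.5, 2.0)])]
t2 = [("t2:left", [(-1.0, 1.0)]), ("t2:right", [(1.0, 2.0)])]
cliques = enumerate_k_cliques([t1, t2])
```

In this toy example the pair (t1:left, t2:right) is pruned because their boxes are disjoint, so only three of the four leaf tuples survive as valid 2-cliques.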
Note that enumerating all 3-cliques in a general 3-partite graph takes O(|V_1||V_2||V_3|) time; thanks to boxicity, our algorithm takes O(|V′_2||V_3|) time, which equals O(|V_1||V_2||V_3|) only when V_1 and V_2 form a complete bipartite graph, which is unlikely in common cases. This process continues recursively until we have processed all K parts and only V′_K is left, where each vertex in V′_K represents a K-clique in the original graph. After obtaining all K-cliques, we can check their prediction values to compute a verification bound.

Algorithm 1: Enumerating all K-cliques on a K-partite graph with a known boxicity d
input: V_1, V_2, . . . , V_K, the K independent sets ("parts") of a K-partite graph
1  for k ← 1, 2, . . . , K do
2      U_k ← {(A_i, B_{i^(k)}) | i^(k) ∈ V_k, A_i = {i^(k)}};
       /* U is a set of tuples (A, B), which stores a set of cliques and their corresponding boxes: A is the set of nodes in one clique and B is the corresponding box of this clique. Initially, each node in V_k forms a 1-clique by itself. */
3  end
4  CliqueEnumerate(U_1, U_2, . . . , U_K);
5  Function CliqueEnumerate(U_1, U_2, . . . , U_K)
6      Û_old ← U_1;
7      for k ← 2, 3, . . . , K do
8          Û_new ← ∅;
9          for (Â, B̂) ∈ Û_old do
10             for (A, B) ∈ U_k do
11                 if B ∩ B̂ ≠ ∅ then
12                     Û_new ← Û_new ∪ {(A ∪ Â, B ∩ B̂)};
                       /* A k-clique is found; add it as a pseudo node with the intersection of the two boxes. */
13                 end
14             end
15         end
16         Û_old ← Û_new;
17     end
18     return Û_new;

Figure 1: The proposed multi-level verification algorithm. 
Lines between leaf node i on tree t_1 and leaf node j on tree t_2 indicate that their ℓ∞ feature boxes intersect (i.e., there exists an input such that tree 1 predicts v_i and tree 2 predicts v_j).

3.3 An Efficient Multi-level Algorithm for Verifying the Robustness of a Tree Ensemble

Practical tree ensembles usually have tens or hundreds of trees with large feature dimensions, so Algorithm 1 will take exponential time and will be too slow. We thus develop an efficient multi-level algorithm for computing verification bounds by further exploiting the boxicity of the graph.

Figure 1 illustrates the graph and how our multi-level algorithm runs. There are four trees and each tree has four leaf nodes. A node is colored if it has nonempty intersection with Ball(x, ε); uncolored nodes are discarded. To answer question (5), we need to compute the maximum Σ v_C among all K-cliques, denoted by v*. As mentioned before, for robustness verification we only need to compute an upper bound of v* in order to get a lower bound on the minimal adversarial perturbation. In the following, we first discuss algorithms for computing an upper bound at the top level, and then show how our multi-level algorithm iteratively refines this bound until reaching the exact solution v*.

Bounds for a single level.  To compute an upper bound of v*, a naive approach is to assume that the graph is fully connected between independent sets (a fully connected K-partite graph); in this case the maximum sum of node values is the sum of the maximum value of each independent set:

Σ_{k=1}^{|Ṽ|} max_{i ∈ V_k} v_i ≥ v*.    (6)

Here we abuse the notation v_i by assuming that each node i in V_k has been assigned a "pseudo prediction value", which will be used in the multi-level setting. In the simplest case, each independent set represents a single tree, V_k = S^(k), and v_i is the prediction of a leaf. 
One can easily show that this is an upper bound of v*, since any K-clique in the graph is still considered when we add more edges to the graph, and eventually the graph becomes a fully connected K-partite graph.

Another slightly better approach is to exploit the edge information, but only between trees t and t + 1. If we search over all length-K paths [i^(1), . . . , i^(K)] from the first to the last part and define the value of a path to be Σ_k v_{i^(k)}, then the maximum valued path is an upper bound of v*. This can be computed in linear time using dynamic programming. We scan nodes from tree 1 to tree K, and for each node we store a value d_i, which is the maximum value of a path from tree 1 to this node. At tree k and node i, the value d_i can be computed by

d_i = v_i + max_{j ∈ V_{k−1}, (j,i) ∈ E} d_j.    (7)

Then we take the maximum d value in the last tree. This produces an upper bound of v*, since the maximum valued path found by dynamic programming is not necessarily a K-clique. Again, V_{k−1} = S^(k−1) at the first level, but this will be generalized below.

Merging T independent sets  To refine the relatively loose single-level bound, we partition the graph into K/T subgraphs, each with T independent sets. Within each subgraph, we find all the T-cliques and use a new "pseudo node" to represent each T-clique. T-cliques in a subgraph can be enumerated efficiently if we choose T to be a relatively small number (e.g., 2 or 3 in the experiments). Now we exploit the boxicity property to form a new graph among these T-cliques (illustrated as the second level nodes in Figure 1). 
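The single-level dynamic-programming bound (7) can be sketched as follows; nodes carry their (pseudo) prediction values and `edge` is a hypothetical predicate testing box intersection between nodes of consecutive parts (all names are illustrative assumptions):

```python
def path_upper_bound(parts, edge):
    """Dynamic-programming upper bound on v*, per equation (7).

    parts: a list of K lists of node values v_i, one list per part/tree.
    edge(k, j, i): True if node j of part k-1 and node i of part k are
                   connected (i.e., their boxes intersect).
    Returns the maximum value of a length-K path, an upper bound of v*.
    """
    d_prev = list(parts[0])                 # d_i = v_i for the first part
    for k in range(1, len(parts)):
        d_cur = []
        for i, v in enumerate(parts[k]):
            best = max((d_prev[j] for j in range(len(parts[k - 1]))
                        if edge(k, j, i)), default=float("-inf"))
            d_cur.append(v + best)          # equation (7)
        d_prev = d_cur
    return max(d_prev)                      # maximum d value in the last part

# Two parts, fully connected: the bound is max of part 1 plus max of part 2,
# i.e. the naive bound (6) is recovered when every edge is present.
ub = path_upper_bound([[0.3, -0.1], [0.2, 0.5]], lambda k, j, i: True)
```

As the comment notes, with all edges present this reduces to the naive bound (6); with real edge information the path bound is never looser.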
By Lemma 1, we know that the intersection of T boxes is still a box, so each T-clique is still a box and can be represented as a pseudo node in the level-2 graph. Also, because each pseudo node is still a box, we can easily form edges between pseudo nodes to indicate nonempty overlap between them, and this will be a (K/T)-partite boxicity graph, since no edge can be formed between cliques within the same subgraph. Thus we get the level-2 graph. With the level-2 graph, we can again run the single-level algorithm to compute an upper bound on v*, which yields a lower bound on r* in (1); but unlike in the level-1 graph, we have now already considered all the within-subgraph edges, so the bounds we get will be tighter.

The overall multi-level framework  We can run the algorithm level by level until merging all the subgraphs into one; at the final level the pseudo nodes correspond to the K-cliques in the original graph, and the maximum value is exactly v*. Therefore, our algorithm can be viewed as an anytime algorithm that refines the upper bound level-by-level until reaching the maximum value. Although getting to the final level still requires exponential time, in practice we can stop at any level (denoted as L) and get a reasonable bound. In the experiments, we show that by merging only a few trees we already get a bound very close to the final solution. Algorithm 2 gives the complete procedure.

Algorithm 2: Multi-level verification framework
input: The set of leaf nodes of each tree, S^(1), S^(2), . . .
, S^(K); the maximum number of independent sets in a subgraph (denoted as T); the maximum number of levels (denoted as L), L ≤ ⌈log_T(K)⌉
1  for k ← 1, 2, . . . , K do
2      U^(0)_k ← {(A_i, B_{i^(k)}) | i^(k) ∈ S^(k), A_i = {i^(k)}};
       /* U is defined the same as in Algorithm 1. At level 0, each V_k forms a 1-clique by itself. */
3  end
4  for l ← 1, 2, . . . , L do
       /* Enumerate all cliques in each subgraph at this level. There are ⌈K/T^l⌉ subgraphs in total. */
5      for k ← 1, 2, . . . , ⌈K/T^l⌉ do
6          U^(l)_k ← CliqueEnumerate(U^(l−1)_{(k−1)T+1}, U^(l−1)_{(k−1)T+2}, . . . , U^(l−1)_{kT});
7      end
8  end
9  for k ← 1, 2, . . . , ⌈K/T^L⌉ do
       /* Define an independent set V′_k for each U^(L)_k. In each V′_k, we create "pseudo nodes" which combine multiple nodes from lower levels, and assign "pseudo prediction values" to them. */
10     V′_k ← {A | (A, B) ∈ U^(L)_k};
       /* V′_k is a set of sets; each element of V′_k represents a clique. */
11     For all A ∈ V′_k, v_A ← Σ_{i ∈ A} v_i;
       /* Construct the "pseudo prediction value" of each element of V′_k by summing up all prediction values in the corresponding clique. */
12 end
13 v̄ ← an upper bound of v* using (6) or (7), given Ṽ = {V′_1, ···, V′_{⌈K/T^L⌉}};
   /* If ⌈K/T^L⌉ = 1, only one independent set is left and each pseudo node represents a K-clique; (6) or (7) then has a trivial solution where v* is the maximum v_A in U^(L)_1. */

Handling multi-class tree ensembles  For a multiclass classification problem, say a C-class classification 
problem, C groups of tree ensembles (each with K trees) are built for the classification task; for the k-th tree in group c, the prediction outcome is denoted as i(k, c) = m(k, c)(x), where m(k, c)(x) is the function that maps the input example x to a leaf node of tree k in group c. The final prediction is given by argmax_c Σ_k v_i(k,c). Given an input example x with ground-truth class c and an attack target class c′, we extract the 2K trees for class c and class c′, and flip the sign of all prediction values for trees in group c′, such that initially Σ_t v_i(t,c) + Σ_t v_i(t,c′) < 0 for a correctly classified example. Then we are back to the binary classification case with 2K trees, and we can still apply our multi-level framework to obtain a lower bound r(c,c′) of r*(c,c′) for this targeted attack pair (c, c′). Robustness against an untargeted attack can be evaluated by taking r = min_{c′≠c} r(c,c′).

3.4 Verification Problems Beyond Ordinary Robustness
The discussion above focuses on the decision problem of ℓ∞ robustness verification (2). In fact, our approach works for a more general verification problem, posed for any d-dimensional box B:

    Is there any x′ ∈ B such that f(x′) ≠ y0?    (8)

In typical robustness verification settings B is defined to be Ball(x, ε), but in fact our algorithm allows any box. For a general B, Lemma 1 still holds, so all of our algorithms and analysis go through. The only change is to compute the intersection between B and the box of each leaf node at the first level in Figure 1 and eliminate nodes that have an empty intersection with B; robustness verification is then just the special case where we remove all the nodes with empty intersection with Ball(x, ε).
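For a single tree, the box query (8) can be answered exactly in linear time by scanning the leaves whose boxes intersect B and checking whether they all agree with y0. A minimal runnable sketch under our own toy encoding (a leaf is a `(box, label)` pair; this is not the paper's implementation), together with the single-feature test for unimportant variables:

```python
import math

def overlaps(leaf_box, query_box):
    """Closed-interval overlap test between a leaf's box and a query box B;
    features missing from either box are unconstrained."""
    for f, (lo, hi) in query_box.items():
        l_lo, l_hi = leaf_box.get(f, (-math.inf, math.inf))
        if hi < l_lo or l_hi < lo:
            return False
    return True

def verify_box(leaves, query_box, y0):
    """Toy exact verifier for a SINGLE tree: the prediction cannot change
    inside B iff every leaf reachable from B predicts y0."""
    return all(label == y0 for box, label in leaves if overlaps(box, query_box))

def unimportant_features(leaves, x, y0, n_features, domain=(0.0, 1.0)):
    """Features whose individual change (over the whole input domain)
    can never flip the prediction at x: B_i = domain, B_j = {x_j} for j != i."""
    out = []
    for i in range(n_features):
        box = {j: (x[j], x[j]) for j in range(n_features) if j != i}
        box[i] = domain
        if verify_box(leaves, box, y0):
            out.append(i)
    return out

# A stump splitting on feature 0 at threshold 0.5; feature 1 is never used,
# so it is certified unimportant for any input.
leaves = [({0: (-math.inf, 0.5)}, +1), ({0: (0.5, math.inf)}, -1)]
unimportant = unimportant_features(leaves, [0.2, 0.9], +1, n_features=2)  # -> [1]
```

For an ensemble, `verify_box` would be replaced by the multi-level bound above; the per-feature loop stays the same.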
For example, we can identify a set of unimportant variables, where any individual feature change within this set cannot alter the prediction for a given sample x. For each feature i, we choose B with B_i = [−∞, ∞] (or the entire input domain, e.g., [0, 1] for image data) and B_j = {x_j} for all j ≠ i. If the model is robust to such a single-feature perturbation, then feature i is added to the unimportant set. Similarly, we can find a set of anchor features (similar to [30]) such that once this set of features is fixed, no perturbation outside the set can change the prediction.
4 Experiments
We evaluate our proposed method for robustness verification of tree ensembles on two tasks: binary and multiclass classification, using 9 public datasets that include both small and large scale data. Our code (XGBoost compatible) is available at https://github.com/chenhongge/treeVerification. We run our experiments on Intel Xeon Platinum 8160 CPUs. The datasets other than MNIST and Fashion-MNIST are from LIBSVM [8]; dataset statistics are shown in Appendix A. As defined in Section 2, r* is the radius of the minimum adversarial perturbation, which reflects the true model robustness but is hard to obtain; our method finds r, a lower bound of r*, which guarantees that no adversarial example exists within radius r. A high-quality lower bound r should be close to r*. We include the following algorithms in our comparisons:
• Cheng's attack [13] provides results of adversarial attacks on these models, which gives an upper bound of the model robustness r*. We denote it as r̄, with r̄ ≥ r*.
• MILP: an MILP (Mixed Integer Linear Programming) based method [20] that gives the exact r*. It can be very slow when the number of trees or the feature dimension increases.
• LP relaxation: the MILP formulation relaxed into a Linear Program (LP) by directly changing all binary variables to continuous ones.
Since the binary constraints are removed, solving the relaxed minimization problem gives a lower bound of robustness, r_LP, which serves as a baseline.
• Ours: the proposed multi-level verification framework in Section 3.3 (with pseudocode as Algorithm 2 in the appendix), which computes the robustness lower bound r_our for tree ensemble verification.
In Tables 1 and 2 we show empirical comparisons on 9 datasets. We consider ℓ∞ robustness and normalize our datasets to [0, 1] so that perturbations on different datasets are comparable. We use (6) to obtain single-level bounds; results using dynamic programming (7) are provided in Appendix B. We include both standard (naturally trained) GBDT models (Table 1) and robust GBDT models [9] (Table 2). The robust GBDTs were trained by considering model performance under the worst-case perturbation, which leads to a max-min saddle point problem when finding the optimal split at each node [9]. All GBDTs are trained using the XGBoost framework [12]; the number of trees and the training parameters for each dataset are shown in Table 3 in the appendix. Because we solve the decision problem of robustness verification, we use a 10-step binary search to find the largest r in all experiments, and the reported time is the total time including all binary search trials. We present the average of r or r* over 500 examples. The MILP based method from [20] is exact but very slow; for results marked with a star ("*") in the tables, the running time is very long and we therefore evaluate only 50 examples instead of 500.

| Dataset | avg. r̄ (Cheng [13]) | avg. time | avg. r* (MILP [20]) | avg. time | avg. r_LP (LP) | avg. time | T | L | avg. r_our (Ours) | avg. time | r_our/r* | speedup |
| breast-cancer | .221 | 2.18s | .210 | .012s | .064 | .009s | 2 | 1 | .208 | .001s | .99 | 12X |
| covtype | .058 | 4.76s | .028* | 355*s | .005* | 154*s | 2 | 3 | .022 | 3.39s | .79 | 105X |
| diabetes | .064 | 1.70s | .049 | .061s | .015 | .026s | 3 | 2 | .042 | .018s | .86 | 3.4X |
| Fashion-MNIST | .048 | 12.2s | .014* | 1150*s | .003* | 898*s | 2 | 1 | .012 | 11.8s | .86 | 97X |
| HIGGS | .015 | 3.80s | .0028* | 68*min | .00035* | 50*min | 4 | 1 | .0022 | 1.29s | .79 | 3163X |
| ijcnn1 | .047 | 2.72s | .030 | 4.64s | .008 | 2.67s | 2 | 2 | .026 | .101s | .87 | 4.6X |
| MNIST | .070 | 11.1s | .011* | 367*s | .003* | 332*s | 2 | 2 | .011 | 5.14s | 1.00 | 71X |
| webspam | .027 | 5.83s | .00076 | 47.2s | .0002 | 39.7s | 2 | 1 | .0005 | .404s | .66 | 117X |
| MNIST 2 vs. 6 | .152 | 12.0s | .057 | 23.0s | .016 | 11.6s | 4 | 1 | .046 | .585s | .81 | 39X |

Table 1: Average ℓ∞ distortion over 500 examples and average verification time per example for three verification methods, evaluated on standard (natural) GBDT models. Results marked with a star ("*") are averages over 50 examples due to long running time. T is the number of independent sets and L is the number of levels of clique search used in our algorithm. A ratio r_our/r* close to 1 indicates better lower bound quality. Dynamic programming (7) is not applied; results using dynamic programming are provided in Appendix B.

From Tables 1 and 2 we can see that our method gives a tight lower bound r compared to r* from MILP, while achieving up to ∼3000X speedup on large models. The running time of the baseline LP relaxation, however, is on the same order of magnitude as the MILP method, while the results are much worse, with r_LP ≪ r*.

| Dataset | avg. r̄ (Cheng [13]) | avg. time | avg. r* (MILP [20]) | avg. time | avg. r_LP (LP) | avg. time | T | L | avg. r_our (Ours) | avg. time | r_our/r* | speedup |
| breast-cancer | .404 | 1.96s | .400 | .009s | .078 | .008s | 2 | 1 | .399 | .001s | 1.00 | 9X |
| covtype | .079 | .481s | .046* | 305*s | .0053* | 159*s | 2 | 3 | .032 | 4.84s | .70 | 63X |
| diabetes | .137 | 1.52s | .112 | .034s | .035 | .013s | 3 | 2 | .109 | .006s | .97 | 5.7X |
| Fashion-MNIST | .153 | 13.9s | .091* | 41*min | .009* | 34*min | 2 | 1 | .071 | 18.0s | .78 | 137X |
| HIGGS | .023 | 3.58s | .0084* | 59*min | .00031* | 54*min | 4 | 1 | .0063 | 1.41s | .75 | 2511X |
| ijcnn1 | .054 | 2.63s | .036 | 2.52s | .009 | 1.26s | 2 | 2 | .032 | 0.58s | .89 | 4.3X |
| MNIST | .367 | 1.41s | .264* | 615*s | .019* | 515*s | 2 | 2 | .253 | 12.6s | .96 | 49X |
| webspam | .048 | 4.97s | .015 | 83.7s | .0024 | 60.4s | 2 | 1 | .011 | .345s | .73 | 243X |
| MNIST 2 vs. 6 | .397 | 17.2s | .313 | 91.5s | .039 | 40.0s | 4 | 1 | .308 | 3.68s | .98 | 25X |

Table 2: Verification bounds and running time for the robustly trained GBDT models introduced in [9]. The settings for each method are similar to the settings in Table 1.

Figure 2 shows how the tightness of our robustness verification lower bounds changes with the clique size per level (T) and the number of levels (L). We test a 20-tree standard GBDT model on the diabetes dataset, and also show the exact bound r* from the MILP method. Our verification bound converges to the MILP bound as more levels of clique enumeration are used; moreover, using larger cliques at each level makes the bound tighter. To show the scalability of our method, we vary the number of trees in the GBDTs and compare per-example running time with the MILP method on the ijcnn1 dataset in Figure 3.
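The 10-step binary search used to turn the yes/no verification procedure into a numeric bound can be sketched as follows (`is_robust_at` is a hypothetical stand-in for the verifier, which must answer soundly that no adversarial example exists within the given radius):

```python
def certified_radius(is_robust_at, lo=0.0, hi=1.0, steps=10):
    """Largest radius (up to tolerance (hi - lo) / 2**steps) at which
    verification succeeds; any returned value is a valid certified bound
    because is_robust_at is sound."""
    r = lo
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if is_robust_at(mid):
            r, lo = mid, mid   # certified at mid: search larger radii
        else:
            hi = mid           # possibly vulnerable: search smaller radii
    return r

# Toy oracle: suppose the model is exactly robust up to radius 0.37; the
# search recovers that threshold to within 1/1024 from below.
r = certified_radius(lambda eps: eps <= 0.37)
```

Because every accepted midpoint is itself certified, stopping after any number of steps still yields a valid lower bound, matching the any-time character of the verification algorithm.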
We see that our multi-level method spends much less time on each example than the MILP method, and our running time grows more slowly than MILP's as the number of trees increases.

Figure 2: Robustness bounds obtained with different parameters (T = {2, 3, 4}, L = {1, ..., 6}) on a 20-tree standard GBDT model trained on the diabetes dataset (left) and a 20-tree robust GBDT model trained on the ijcnn1 dataset (right). r_our converges to r* as L increases.

Figure 3: Running time of MILP and our method on robust GBDTs with different numbers of trees (ijcnn1 dataset).

In Section 3.4, we showed that our algorithm works for more general verification problems such as identifying unimportant features, where any change to one of those features alone cannot alter the prediction. We use MNIST to demonstrate pixel importance: we perturb each pixel individually by ±ε while keeping the other pixels unchanged, and obtain the largest ε such that the prediction is unchanged. In Figure 4, yellow pixels cannot change the prediction under any perturbation, and a darker pixel represents a smaller lower bound r of the perturbation needed to change the model output using that pixel. The standard naturally trained model has some very dark pixels compared to the robust model. A discussion of the connection between this score and other feature importance scores is in Section C.

Figure 4: MNIST pixel importance. For each 3-image group, left: digit image; middle: results on the standard DT model; right: results on the robust DT model. Changing any one yellow pixel (r = 1.0) to any valid value between 0 and 1 cannot alter the model prediction; pixels in darker colors (smaller r) tend to affect the model prediction more than pixels in lighter colors (larger r).

Acknowledgement. Chen and Boning acknowledge the support of SenseTime.
Hsieh acknowledges the support of NSF IIS-1719097 and an Intel faculty award.

References
[1] Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In ICML, 2018.
[2] Osbert Bastani, Yewen Pu, and Armando Solar-Lezama. Verifiable reinforcement learning via policy extraction. In Advances in Neural Information Processing Systems, pages 2494–2504, 2018.
[3] Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In ICLR, 2018.
[4] Coen Bron and Joep Kerbosch. Algorithm 457: finding all cliques of an undirected graph. Communications of the ACM, 16(9):575–577, 1973.
[5] Rudy R Bunel, Ilker Turkaslan, Philip Torr, Pushmeet Kohli, and Pawan K Mudigonda. A unified view of piecewise linear neural network verification. In Advances in Neural Information Processing Systems, pages 4790–4799, 2018.
[6] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.
[7] L Sunil Chandran, Mathew C Francis, and Naveen Sivadasan. Geometric representation of graphs in low dimension using axis parallel boxes. Algorithmica, 56(2):129, 2010.
[8] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines.
ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[9] Hongge Chen, Huan Zhang, Duane Boning, and Cho-Jui Hsieh. Robust decision trees against adversarial examples. In ICML, 2019.
[10] Pin-Yu Chen, Yash Sharma, Huan Zhang, Jinfeng Yi, and Cho-Jui Hsieh. EAD: Elastic-net attacks to deep neural networks via adversarial examples. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[11] Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 15–26. ACM, 2017.
[12] Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794. ACM, 2016.
[13] Minhao Cheng, Thong Le, Pin-Yu Chen, Jinfeng Yi, Huan Zhang, and Cho-Jui Hsieh. Query-efficient hard-label black-box attack: An optimization-based approach. In ICLR, 2019.
[14] Gil Einziger, Maayan Goldstein, Yaniv Sa'ar, and Itai Segall. Verifying robustness of gradient boosted models. In AAAI, 2019.
[15] Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. Robust physical-world attacks on deep learning models. arXiv preprint arXiv:1707.08945, 2017.
[16] Timon Gehr, Matthew Mirman, Dana Drachsler-Cohen, Petar Tsankov, Swarat Chaudhuri, and Martin Vechev. AI2: Safety and robustness certification of neural networks with abstract interpretation. In 2018 IEEE Symposium on Security and Privacy (SP), pages 3–18. IEEE, 2018.
[17] Ian Goodfellow, Jonathon Shlens, and Christian Szegedy.
Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015.
[18] Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. Black-box adversarial attacks with limited queries and information. In International Conference on Machine Learning, pages 2142–2151, 2018.
[19] Kyle D Julian, Shivam Sharma, Jean-Baptiste Jeannin, and Mykel J Kochenderfer. Verifying aircraft collision avoidance neural networks through linear approximations of safe regions. arXiv preprint arXiv:1903.00762, 2019.
[20] Alex Kantchelian, JD Tygar, and Anthony Joseph. Evasion and hardening of tree ensemble classifiers. In International Conference on Machine Learning, pages 2387–2396, 2016.
[21] Guy Katz, Clark Barrett, David L Dill, Kyle Julian, and Mykel J Kochenderfer. Reluplex: An efficient SMT solver for verifying deep neural networks. In International Conference on Computer Aided Verification, pages 97–117. Springer, 2017.
[22] Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems, pages 3146–3154, 2017.
[23] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236, 2016.
[24] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770, 2016.
[25] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
[26] Mohammad Mirghorbani and P Krokhmal. On finding k-cliques in k-partite graphs. Optimization Letters, 7(6):1155–1165, 2013.
[27] Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow.
Transferability in machine learning: from\n\nphenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277, 2016.\n\n[28] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami.\nPractical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia conference\non computer and communications security, pages 506\u2013519. ACM, 2017.\n\n[29] Charles A Phillips, Kai Wang, Erich J Baker, Jason A Bubier, Elissa J Chesler, and Michael A Langston.\nOn \ufb01nding and enumerating maximal and maximum k-partite cliques in k-partite graphs. Algorithms,\n12(1):23, 2019.\n\n[30] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Anchors: High-precision model-agnostic\n\nexplanations. In Thirty-Second AAAI Conference on Arti\ufb01cial Intelligence, 2018.\n\n[31] Fred S Roberts. On the boxicity and cubicity of a graph. Recent Progresses in Combinatorics, pages\n\n301\u2013310, 1969.\n\n[32] Hadi Salman, Greg Yang, Huan Zhang, Cho-Jui Hsieh, and Pengchuan Zhang. A convex relaxation barrier\n\nto tight robustness veri\ufb01cation of neural networks. arXiv preprint arXiv:1902.08722, 2019.\n\n[33] Naoto Sato, Hironobu Kuruma, Yuichiroh Nakagawa, and Hideto Ogawa. Formal veri\ufb01cation of decision-\ntree ensemble model and detection of its violating-input-value ranges. arXiv preprint arXiv:1904.11753,\n2019.\n\n[34] Markus Schneider and Burkhard Wulfhorst. Cliques in k-partite graphs and their application in textile\n\nengineering. 2002.\n\n[35] Gagandeep Singh, Timon Gehr, Matthew Mirman, Markus P\u00fcschel, and Martin Vechev. Fast and effective\nrobustness certi\ufb01cation. In Advances in Neural Information Processing Systems, pages 10802\u201310813,\n2018.\n\n[36] Gagandeep Singh, Timon Gehr, Markus P\u00fcschel, and Martin Vechev. An abstract domain for certifying\n\nneural networks. 
Proceedings of the ACM on Programming Languages, 3(POPL):41, 2019.
[37] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
[38] Vincent Tjeng, Kai Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. arXiv preprint arXiv:1711.07356, 2017.
[39] John Törnblom and Simin Nadjm-Tehrani. Formal verification of input-output mappings of tree ensembles. arXiv preprint arXiv:1905.04194, 2019.
[40] Jonathan Uesato, Brendan O'Donoghue, Aaron van den Oord, and Pushmeet Kohli. Adversarial risk and the dangers of evaluating against weak attacks. arXiv preprint arXiv:1802.05666, 2018.
[41] Shiqi Wang, Yizheng Chen, Ahmed Abdou, and Suman Jana. MixTrain: Scalable training of formally robust neural networks. arXiv preprint arXiv:1811.02625, 2018.
[42] Tsui-Wei Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Luca Daniel, Duane Boning, and Inderjit Dhillon. Towards fast computation of certified robustness for ReLU networks. In International Conference on Machine Learning, pages 5273–5282, 2018.
[43] Eric Wong and J Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning, 2018.
[44] Eric Wong, Frank Schmidt, Jan Hendrik Metzen, and J Zico Kolter. Scaling provable adversarial defenses. In Advances in Neural Information Processing Systems, pages 8400–8409, 2018.
[45] Kaidi Xu, Sijia Liu, Pu Zhao, Pin-Yu Chen, Huan Zhang, Quanfu Fan, Deniz Erdogmus, Yanzhi Wang, and Xue Lin. Structured adversarial attack: Towards general implementation and better interpretability. ICLR, 2019.
[46] Huan Zhang, Si Si, and Cho-Jui Hsieh. GPU-acceleration for large-scale tree boosting.
SysML Conference, 2018.
[47] Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems, pages 4939–4948, 2018.