{"title": "Learning to Search in Branch and Bound Algorithms", "book": "Advances in Neural Information Processing Systems", "page_first": 3293, "page_last": 3301, "abstract": "Branch-and-bound is a widely used method in combinatorial optimization, including mixed integer programming, structured prediction and MAP inference. While most work has been focused on developing problem-specific techniques, little is known about how to systematically design the node searching strategy on a branch-and-bound tree. We address the key challenge of learning an adaptive node searching order for any class of problem solvable by branch-and-bound. Our strategies are learned by imitation learning. We apply our algorithm to linear programming based branch-and-bound for solving mixed integer programs (MIP). We compare our method with one of the fastest open-source solvers, SCIP; and a very efficient commercial solver, Gurobi. We demonstrate that our approach achieves better solutions faster on four MIP libraries.", "full_text": "Learning to Search in Branch-and-Bound Algorithms\u21e4\n\nHe He Hal Daum\u00b4e III\n\nDepartment of Computer Science\n\nUniversity of Maryland\nCollege Park, MD 20740\n{hhe,hal}@cs.umd.edu\n\nJason Eisner\n\nDepartment of Computer Science\n\nJohns Hopkins University\n\nBaltimore, MD 21218\njason@cs.jhu.edu\n\nAbstract\n\nBranch-and-bound is a widely used method in combinatorial optimization, in-\ncluding mixed integer programming, structured prediction and MAP inference.\nWhile most work has been focused on developing problem-speci\ufb01c techniques,\nlittle is known about how to systematically design the node searching strategy\non a branch-and-bound tree. We address the key challenge of learning an adap-\ntive node searching order for any class of problem solvable by branch-and-bound.\nOur strategies are learned by imitation learning. 
We apply our algorithm to linear programming based branch-and-bound for solving mixed integer programs (MIP). We compare our method with one of the fastest open-source solvers, SCIP, and a very efficient commercial solver, Gurobi. We demonstrate that our approach achieves better solutions faster on four MIP libraries.

1 Introduction

Branch-and-bound (B&B) [1] is a systematic enumerative method for global optimization of nonconvex and combinatorial problems. In the machine learning community, B&B has been used as an inference tool in MAP estimation [2, 3]. In applied domains, it has been used in the "inference" stage of structured prediction problems (e.g., dependency parsing [4, 5], scene understanding [6], ancestral sequence reconstruction [7]). B&B recursively divides the feasible set of a problem into disjoint subsets, organized in a tree structure, where each node represents a subproblem that searches only the subset at that node. If computing bounds on a subproblem does not rule out the possibility that its subset contains the optimal solution, the subset can be further partitioned ("branched") as needed. A crucial question in B&B is how to specify the order in which nodes are considered. An effective node ordering strategy guides the search to promising areas in the tree and improves the chance of quickly finding a good incumbent solution, which can be used to rule out other nodes. Unfortunately, no theoretically guaranteed general solution for node ordering is currently known. Instead of designing node ordering heuristics manually for each problem type, we propose to speed up B&B search by automatically learning search heuristics that are adapted to a family of problems.

• Non-problem-dependent learning. While our approach learns problem-specific policies, it can be applied to any family of problems solvable by the B&B framework.
We use imitation learning to automatically learn the heuristics, free of the trial-and-error tuning and rule design by domain experts found in most B&B algorithms.

• Dynamic decision-making. Our decision-making process is adaptive on three scales. First, it learns different strategies for different problem types. Second, within a problem type, it can evaluate the hardness of a problem instance based on features describing the solving progress. Third, within a problem instance, it adapts the searching strategy to different levels of the B&B tree and makes decisions based on node-specific features.

*This material is based upon work supported by the National Science Foundation under Grant No. 0964681.

[Figure 1: Using branch-and-bound to solve an integer linear programming minimization. The example problem, shown in the lower right corner of the figure, is: min −2x − y subject to 3x − 5y ≤ 0, 3x + 5y ≤ 15, x ≥ 0, y ≥ 0, x, y ∈ Z. Each node shows its local upper bound (upper half of the circle) and local lower bound (lower half); the figure also marks the node expansion order, the global lower and upper bounds after each expansion, the optimal nodes, the fathomed nodes, and, on the left, the training examples collected along the oracle trajectory.]

• Easy incorporation of heuristics. Most hand-designed strategies handle only a few heuristics, and they set weights on different heuristics by domain knowledge or manual experimentation.
In our model, multiple heuristics can simply be plugged in as state features for the policy, allowing a hybrid "heuristic" to be learned effectively.

We assume that a small set of solved problems is given at training time and that the problems to be solved at test time are of the same type. We learn a node selection policy and a node pruning policy from solving the training problems. The node selection policy repeatedly picks a node from the queue of all unexplored nodes, and the node pruning policy decides if the popped node is worth expanding. We formulate B&B search as a sequential decision-making process. We design a simple oracle that knows the optimal solution in advance and only expands nodes containing the optimal solution. We then use imitation learning to learn policies that mimic the oracle's behavior without perfect information; these policies must even mimic how the oracle would act in states that the oracle would not itself reach, as such states may be encountered at test time. We apply our approach to linear programming (LP) based B&B for solving mixed integer linear programming (MILP) problems, and achieve better solutions faster on 4 MILP problem libraries than Gurobi, a recent fast commercial solver competitive with Cplex, and SCIP, one of the fastest open-source solvers [8].

2 The Branch-and-Bound Framework: An Application in Mixed Integer Linear Programming

Consider an optimization problem of minimizing f over a feasible set F, where F is usually discrete. B&B uses a divide-and-conquer strategy: F is recursively divided into subsets F1, F2, ..., Fp such that F = F1 ∪ F2 ∪ ... ∪ Fp. The recursion tree is an enumeration tree of all feasible solutions, whose nodes are subproblems and whose edges are the partition conditions. Slightly abusing notation, we will use Fi to refer to both the subset and its corresponding B&B node from now on.
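To make the recursion concrete, here is a minimal, hypothetical Python sketch of a generic B&B loop. The paper's setting is LP-based MILP; to keep this sketch dependency-free we use a 0/1 knapsack instance as a stand-in for "minimize f over F", with a greedy fractional bound playing the role of the convex relaxation. The instance data and the best-bound-first queue order are illustrative assumptions, not details from the paper.

```python
import heapq

# Toy stand-in instance: 0/1 knapsack (maximize value under a weight cap;
# maximizing value is minimizing -value, so the B&B mechanics are identical).
values = [60, 100, 120]
weights = [10, 20, 30]
capacity = 50

def fractional_bound(level, value, weight):
    """Upper bound for a subtree: greedy fractional relaxation, playing
    the role of the LP relaxation bound computed at each B&B node."""
    bound, w = value, weight
    for i in range(level, len(values)):
        if w + weights[i] <= capacity:
            w += weights[i]
            bound += values[i]
        else:
            bound += values[i] * (capacity - w) / weights[i]
            break
    return bound

def branch_and_bound():
    best = 0  # incumbent objective value
    # Priority queue of active nodes, best bound first.
    # Node tuple: (-bound, level, value, weight).
    queue = [(-fractional_bound(0, 0, 0), 0, 0, 0)]
    while queue:
        neg_bound, level, value, weight = heapq.heappop(queue)
        if -neg_bound <= best:
            continue          # fathom: bound cannot beat the incumbent
        if level == len(values):
            best = max(best, value)
            continue
        # Branch on item `level`: include it (if feasible), or exclude it.
        if weight + weights[level] <= capacity:
            v, w = value + values[level], weight + weights[level]
            best = max(best, v)   # feasible solution found: update incumbent
            heapq.heappush(queue, (-fractional_bound(level + 1, v, w),
                                   level + 1, v, w))
        heapq.heappush(queue, (-fractional_bound(level + 1, value, weight),
                               level + 1, value, weight))
    return best

print(branch_and_bound())  # -> 220
```

The incumbent (`best`) is what lets bounding fathom nodes early, exactly the role the global upper bound plays in the MILP example below.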
A (convex) relaxation of each subproblem is solved to provide an upper/lower bound for that node and its descendants. We denote the upper and lower bound at node i by ℓub(Fi) and ℓlb(Fi) respectively, where ℓub and ℓlb are bounding functions.

A common setting where B&B is ubiquitously applied is MILP. A MILP optimization problem has a linear objective and linear constraints, and also requires specified variables to be integer. We assume from now on that we are minimizing the objective function in MILP. At each node, we drop the integrality constraints and solve the node's LP relaxation. We present a concrete example in Figure 1. The optimization problem is shown in the lower right corner. At node i, a local lower bound (shown in the lower half of each circle) is found by the LP solver. A local upper bound (shown in the upper part of the circle) is available if a feasible solution is found at this node. We automatically get an upper bound if the LP solution happens to be integer feasible, or we may obtain one by heuristics.

B&B maintains a queue L of active nodes, starting with a single root node on it. At each step, we pop a node Fi from L using a node selection strategy, and compute its bounds. A node Fi

[Figure 2, left panel: flowchart of the policy-guided B&B search. Starting from the root problem, nodes are ranked and popped from the queue, tested by "fathom?" and "prune?" decisions, and surviving nodes have their children pushed back onto the queue, until the queue is empty and a solution is returned.]

Algorithm 1 Policy Learning (π*S, π*P)
  π(1)S ← π*S, π(1)P ← π*P, DS ← {}, DP ← {}
  for k = 1 to N do
    for Q in problem set Q do
      D(Q)S, D(Q)P ← CollectExample(Q, π(k)S, π(k)P)
      DS ← DS ∪ D(Q)S, DP ← DP ∪ D(Q)P
    π(k+1)S, π(k+1)P ← train classifiers using DS and DP
  return best π(k)S, π(k)P on dev set

Figure 2: Our method at runtime (left) and the policy learning algorithm (right). Left: our policy-guided branch-and-bound search.
Procedures in the rounded rectangles (shown in blue) are executed by policies. Right: the DAgger learning algorithm. We start by using oracle policies π*S and π*P to solve problems in Q and collect examples along oracle trajectories. In each iteration, we retrain our policies on all examples collected so far (training sets DS and DP), then collect additional examples by running the newly learned policies. The CollectExample procedure is described in Algorithm 2.

is fathomed (i.e., no further exploration in its subtree) if one of the following cases is true: (a) ℓlb(Fi) is larger than the current global upper bound, which means no solution in its subtree can possibly be better than the incumbent; (b) ℓlb(Fi) = ℓub(Fi); at this point, B&B has found the best solution in the current subtree; (c) the subproblem is infeasible. In Figure 1, fathomed nodes are shown in double circles and infeasible nodes are labeled "INF".

If a node Fi is not fathomed, it is branched into children that are pushed onto L. Branching conditions are shown next to each edge in Figure 1. The algorithm terminates when L is empty or the gap between the global upper bound and lower bound reaches a specified tolerance level. In the example in Figure 1, we follow a DFS order. Starting from the root node, the blue arrows point to the next node popped from L to be branched. The updated global lower and upper bounds after each node expansion are shown beneath the branched node.

3 Learning Control Policies for Branch-and-Bound

A good search strategy should find a good incumbent solution early and identify non-promising nodes before they are expanded. However, naively applying a single heuristic throughout the whole process ignores the dynamic structure of the B&B tree. For example, DFS should only be used at nodes that promise to lead to a good feasible solution that may replace the incumbent.
Best-bound-first search can quickly discard unpromising nodes, but should not be used frequently at the top levels of the tree, since the bound estimates there are not yet accurate enough. Therefore, we propose to learn policies that adapt to different problem types and different solving stages.

There are two goals in a B&B search: finding the optimal solution and proving its optimality. There is a trade-off between the two goals: we may be able to return the optimal solution faster if we do not invest the time to prove that all other solutions are worse. Thus, we will aim only to search for a "good" (possibly optimal) solution without a rigorous proof of optimality. This allows us to prune unpromising portions of the search tree more aggressively. In addition, obtaining a certificate of optimality is usually of secondary priority for practical purposes.

We assume the branching strategy and the bounding functions are given. We guide search on the enumeration tree by two policies. Recall that B&B maintains a priority queue of all nodes to be expanded. The node selection policy determines the priorities used. Once the highest-priority node is popped, the node pruning policy decides whether to discard or expand it given the current progress of the solver. This process continues iteratively until the tree is empty or the gap reaches some specified tolerance. All other techniques used during usual branch-and-bound search can still be applied with our method. The process is shown in Figure 2 (left).

Oracle. Imitation learning requires an oracle at training time to demonstrate the desired behavior. Our ideal oracle would expand nodes in an order that minimized the number of node expansions subject to finding the optimal solution. In real branch-and-bound systems, however, the optimal sequence of expanded nodes cannot be obtained without substantial computation.
After all, the effect of expanding one node depends not only on local information such as the local bounds it obtains, but also on how many pruned nodes it may lead to, as well as on other interacting strategies such as branching variable selection. Therefore, given our single goal of finding a good solution quickly, we design an oracle that finds the optimal solution without a proof of optimality. We assume optimal solutions are given for training problems.¹ Our node selection oracle π*S will always expand the node whose feasible set contains the optimal solution. We call such a node an optimal node. For example, in Figure 1, the oracle knows beforehand that the optimal solution is x = 1, y = 2; thus it will only search along the edges y ≥ 2 and x ≤ 1, and the optimal nodes are shown in red circles. All other non-optimal nodes are fathomed by the node pruning oracle π*P, if not already fathomed by the standard rules discussed in Section 2. We denote the optimal node at depth d by F*d, where d ∈ [0, D] and F*0 is the root node.

Imitation Learning. We formulate the above approach as a sequential decision-making process, defined by a state space S, an action space A and a policy space Π. A trajectory consists of a sequence of states s1, s2, ..., sT and actions a1, a2, ..., aT. A policy π ∈ Π maps a state to an action: π(st) = at. In our B&B setting, a state st is the whole tree of nodes visited so far, together with the bounds computed at these nodes. The node selection policy πS has an action space {select node Fi : Fi ∈ queue of active nodes}, which depends on the current state st. The node pruning policy πP is a binary classifier that predicts a class in {prune, expand}, given st and the most recently selected node (the policy is only applied when this node was not fathomed). At training time, the oracle provides an optimal action a* for any possible state s ∈ S.
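The node selection oracle's test is simple once the optimal solution x* is known: a node is optimal iff x* satisfies all branching conditions on the path from the root. A minimal sketch, assuming nodes carry per-variable (lo, hi) intervals accumulated from branching; this box representation is our illustrative choice, not a data structure prescribed by the paper:

```python
def is_optimal_node(node_bounds, x_star):
    """Oracle test: does this node's feasible box contain the known
    optimal solution?  `node_bounds` maps a variable name to the
    (lo, hi) interval implied by the branching decisions on the path
    from the root; `x_star` maps variable names to optimal values."""
    return all(lo <= x_star[var] <= hi
               for var, (lo, hi) in node_bounds.items())

# The example of Figure 1: optimal solution x = 1, y = 2.
x_star = {"x": 1, "y": 2}
root = {"x": (0, float("inf")), "y": (0, float("inf"))}
left = {"x": (0, float("inf")), "y": (2, float("inf"))}   # branch y >= 2
bad  = {"x": (0, float("inf")), "y": (0, 1)}              # branch y <= 1

print(is_optimal_node(root, x_star))  # True: the root always contains x*
print(is_optimal_node(left, x_star))  # True: the oracle expands this child
print(is_optimal_node(bad, x_star))   # False: fathomed by the pruning oracle
```

Note that this test also implies the invariant used later: since the child boxes partition the parent box, at most one child of an optimal node can contain x*, so the queue holds at most one optimal node at a time.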
Our goal is to learn a policy that mimics the oracle's actions along the trajectory of states encountered by the policy. Let φ: Fi → R^p and ψ: Fi → R^q be feature maps for πS and πP respectively. The imitation problem can be reduced to supervised learning [9, 10, 11]: the policy (classifier/regressor) takes a feature-vector description of the state st and attempts to predict the oracle action a*t.

A generic node selection policy assigns a score to each active node and pops the highest-scoring one. For example, DFS uses a node's depth as its score; best-bound-first search uses a node's lower bound as its score. Following this scheme, we define the score of node i as w^T φ(Fi) and πS(st) = select node argmax_{Fi ∈ L} w^T φ(Fi), where w is a learned weight vector and L is the queue of active nodes. We obtain w by learning a linear ranking function that defines a total order on the set of nodes on the priority queue: w^T (φ(Fi) − φ(Fi')) > 0 if Fi is ranked above Fi'. During training, we only specify the order between optimal nodes and non-optimal nodes. At test time, however, a total order is obtained by the classifier's automatic generalization: non-optimal nodes close to optimal nodes in the feature space will be ranked higher.

DAgger is an iterative imitation learning algorithm. It repeatedly retrains the policy to make decisions that agree better with the oracle's decisions in those situations that were encountered when running past versions of the policy. Thus, it learns to deal well with a realistic distribution of situations that may actually arise at test time. Our training algorithm is shown in Algorithm 1. Algorithm 2 illustrates how we collect examples during B&B.
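A learned selection policy of this form drops straight into a best-first loop: score each node with w^T φ(Fi) when it is pushed, and keep the queue ordered by that score. A hypothetical sketch follows; the two features used here (depth and LP lower bound, the same quantities DFS and best-bound-first score by) are stand-ins for the paper's fuller feature set, and the weight vector is made up for illustration, not learned:

```python
import heapq

# Hypothetical weight vector, e.g. output of the linear ranking learner.
# Feature map (a stand-in): phi(node) = (depth, lower_bound).
w = (0.3, -1.0)

def phi(node):
    return (node["depth"], node["lb"])

def score(node):
    # w^T phi(F_i)
    return sum(wi * fi for wi, fi in zip(w, phi(node)))

class LearnedNodeQueue:
    """Priority queue popping the highest-scoring node first, i.e.
    pi_S(s) = select node argmax_{F_i in L} w^T phi(F_i)."""
    def __init__(self):
        self._heap = []
        self._count = 0              # tie-breaker: stable FIFO on equal scores
    def push(self, node):
        heapq.heappush(self._heap, (-score(node), self._count, node))
        self._count += 1
    def pop(self):
        return heapq.heappop(self._heap)[2]
    def __len__(self):
        return len(self._heap)

q = LearnedNodeQueue()
q.push({"name": "shallow", "depth": 1, "lb": -13 / 3})
q.push({"name": "deep",    "depth": 3, "lb": -16 / 3})
print(q.pop()["name"])  # -> deep (higher depth and lower LP bound both help)
```

With w = (1, 0) this queue degenerates to DFS, and with w = (0, -1) to best-bound-first, which is why hand-designed heuristics can simply be plugged in as features.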
In words, when pushing an optimal node to the queue, we want it ranked higher than all nodes currently on the queue; when pushing a non-optimal node, we want it ranked lower than the optimal node on the queue if there is one (note that at any time there can be at most one optimal node on the queue); and when popping a node from the queue, we want it pruned if it is not optimal. In the left part of Figure 1, we show training examples collected from the oracle policy.

4 Analysis

We show that our method has the following upper bound on the expected number of branches.

Theorem 1. Given a node selection policy which ranks some non-optimal node higher than an optimal node with probability ε, and a node pruning policy which expands a non-optimal node with probability ε1 and prunes an optimal node with probability ε2, assuming ε, ε1, ε2 ∈ [0, 0.5] under the policy's state distribution, we have

expected number of branches ≤ κ(ε, ε1, ε2) Σ_{d=0}^{D} (1 − ε2)^d + (1 − ε2)^{D+1} ((1 − ε)ε1 / (1 − 2ε1) + 1) D,

where κ(ε, ε1, ε2) = ((1 − ε2)/(1 − 2εε1) + ε2/(1 − 2ε1)) εε1.

¹For prediction tasks, the optimal solutions usually come for free in the training set; otherwise, an off-the-shelf solver can be used.

Algorithm 2 Running B&B policies and collecting examples for problem Q
  procedure CollectExample(Q, πS, πP)
    L ← {F(Q)0}, D(Q)S ← {}, D(Q)P ← {}, i ← 0
    while L ≠ ∅ do
      F(Q)k ← node popped from L by πS
      if F(Q)k is optimal then D(Q)P ← D(Q)P ∪ {(ψ(F(Q)k), expand)}
      else D(Q)P ← D(Q)P ∪ {(ψ(F(Q)k), prune)}
      if F(Q)k is not fathomed and πP(F(Q)k) = expand then
        F(Q)i+1, F(Q)i+2 ← expand F(Q)k; L ← L ∪ {F(Q)i+1, F(Q)i+2}; i ← i + 2
      if an optimal node F*(Q)d ∈ L then
        D(Q)S ← D(Q)S ∪ {(φ(F*(Q)d) − φ(F(Q)i'), 1) : F(Q)i' ∈ L and F(Q)i' ≠ F*(Q)d}
    return D(Q)S, D(Q)P

Let the optimal node at depth d be F*d. Note that at each push step, there is at most one optimal node on the queue. Consider a queue having one optimal node F*d and m non-optimal nodes ranked before the optimal one. The following lemma is useful in our proof:

Lemma 1. The average number of pops before we get to F*d is m/(1 − 2εε1), among which the number of branches is NB(m, opt) = mε1/(1 − 2εε1), and the number of non-optimal nodes pushed after F*d is Npush(m, opt) = (mε1/(1 − 2εε1)) [2(1 − ε)^2 + 2ε(1 − ε)] = 2mε1(1 − ε)/(1 − 2εε1), where opt indicates the situation where one optimal node is on the queue.

Consider a queue having no optimal node and m non-optimal nodes, which means an optimal internal node has been pruned or the optimal leaf has been found. We have

Lemma 2. The average number of pops to empty the queue is m/(1 − 2ε1), among which the number of branches is NB(m, ¬opt) = mε1/(1 − 2ε1), where ¬opt indicates the situation where no optimal node is on the queue.

Proofs of the above two lemmas are given in Appendix A.

Let T(Md, F*d) denote the number of branches until the queue is empty, after pushing F*d onto a queue with Md nodes. The total number of branches during the B&B process is T(0, F*0). When pushing F*d, we compare it with all Md nodes on the queue, and the number of non-optimal nodes ranked before it follows a binomial distribution md ∼ Bin(ε, Md).
We then have the following two cases: (a) F*d is pruned, with probability ε2: the expected number of branches is NB(Md, ¬opt); (b) F*d is not pruned, with probability 1 − ε2: we first pop all nodes before F*d, resulting in Npush(md, opt) new nodes after it; we then expand F*d, obtain F*d+1, and push it onto a queue with Md+1 = Npush(md, opt) + Md − md + 1 nodes. Thus the total expected number of branches in this case is NB(md, opt) + 1 + T(Md+1, F*d+1).

The recursion equation is

T(Md, F*d) = E_{md ∼ Bin(ε, Md)} [(1 − ε2)(NB(md, opt) + 1 + T(Md+1, F*d+1)) + ε2 NB(Md, ¬opt)].

At termination, we have

T(MD, F*D) = E_{mD ∼ Bin(ε, MD)} [(1 − ε2)(NB(mD, opt) + NB(MD − mD, ¬opt)) + ε2 NB(MD, ¬opt)].

Note that we ignore node fathoming in this recursion. The path of optimal nodes may stop at F*d where d