{"title": "ARA*: Anytime A* with Provable Bounds on Sub-Optimality", "book": "Advances in Neural Information Processing Systems", "page_first": 767, "page_last": 774, "abstract": "", "full_text": "ARA*: Anytime A* with Provable Bounds on Sub-Optimality

Maxim Likhachev, Geoff Gordon and Sebastian Thrun
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
{maxim+, ggordon, thrun}@cs.cmu.edu

Abstract

In real world planning problems, time for deliberation is often limited. Anytime planners are well suited for these problems: they find a feasible solution quickly and then continually work on improving it until time runs out. In this paper we propose an anytime heuristic search, ARA*, which tunes its performance bound based on available search time. It starts by finding a suboptimal solution quickly using a loose bound, then tightens the bound progressively as time allows. Given enough time it finds a provably optimal solution. While improving its bound, ARA* reuses previous search efforts and, as a result, is significantly more efficient than other anytime search methods. In addition to our theoretical analysis, we demonstrate the practical utility of ARA* with experiments on a simulated robot kinematic arm and a dynamic path planning problem for an outdoor rover.

1 Introduction

Optimal search is often infeasible for real world problems, as we are given a limited amount of time for deliberation and want to find the best solution given the time provided. In these conditions anytime algorithms [9, 2] prove to be useful, as they usually find a first, possibly highly suboptimal, solution very fast and then continually work on improving the solution until the allocated time expires. Unfortunately, they can rarely provide bounds on the sub-optimality of their solutions unless the cost of an optimal solution is already known. 
Even less often can these algorithms control their sub-optimality. Providing sub-optimality bounds is valuable, though: it allows one to judge the quality of the current plan, decide whether to continue or preempt search based on the current sub-optimality, and evaluate the quality of past planning episodes and allocate time for future planning episodes accordingly. Control over the sub-optimality bounds helps in adjusting the tradeoff between computation and plan quality.

A* search with inflated heuristics (actual heuristic values are multiplied by an inflation factor ε > 1) is sub-optimal but proves to be fast for many domains [1, 5, 8] and also provides a bound on the sub-optimality, namely, the factor ε by which the heuristic is inflated [7]. To construct an anytime algorithm with sub-optimality bounds one could run a succession of these A* searches with decreasing inflation factors. This naive approach results in a series of solutions, each with a sub-optimality factor equal to the corresponding inflation factor. It has control over the sub-optimality bound, but wastes a lot of computation since each search iteration duplicates most of the effort of the previous searches. One could try to employ incremental heuristic searches (e.g., [4]), but the sub-optimality bounds for each search iteration would no longer be guaranteed.

To this end we propose the ARA* (Anytime Repairing A*) algorithm, an efficient anytime heuristic search that also runs A* with inflated heuristics in succession but reuses search efforts from previous executions in such a way that the sub-optimality bounds are still satisfied. As a result, a substantial speedup is achieved by not re-computing the state values that were correctly computed in the previous iterations. We show the efficiency of ARA* on two different domains. 
An evaluation of ARA* on a simulated robot kinematic arm with six degrees of freedom shows up to a 6-fold speedup over the succession of A* searches. We also demonstrate ARA* on the problem of planning a path for a mobile robot that takes into account the robot's dynamics.

The only other anytime heuristic search known to us is Anytime A*, described in [8]. It also first executes an A* search with inflated heuristics and then continues to improve the solution. However, the algorithm has no control over its sub-optimality bound, except by selecting the inflation factor of the first search. Our experiments show that ARA* is able to decrease its bound much more gradually and, moreover, does so significantly faster. Another advantage of ARA* is that it is guaranteed to examine each state at most once during its first search, unlike the algorithm of [8]. This property is important because it provides a bound on the amount of time before ARA* produces its first plan. Nevertheless, as mentioned later, [8] describes a number of very interesting ideas that are also applicable to ARA*.

2 The ARA* Algorithm

2.1 A* with Weighted Heuristic

Normally, A* takes as input a heuristic h(s) which must be consistent. That is, h(s) ≤ c(s, s') + h(s') for any successor s' of s if s ≠ sgoal, and h(s) = 0 if s = sgoal. Here c(s, s') denotes the cost of an edge from s to s' and has to be positive. Consistency, in its turn, guarantees that the heuristic is admissible: h(s) is never larger than the true cost of reaching the goal from s. Inflating the heuristic (that is, using ε * h(s) for ε > 1) often results in far fewer state expansions and consequently faster searches. However, inflating the heuristic may also violate the admissibility property, and as a result, a solution is no longer guaranteed to be optimal. 
The pseudocode of A* with an inflated heuristic is given in Figure 1 for easy comparison with our algorithm, ARA*, presented later. A* maintains two functions from states to real numbers: g(s) is the cost of the current path from the start node to s (it is assumed to be ∞ if no path to s has been found yet), and f(s) = g(s) + ε * h(s) is an estimate of the total distance from start to goal going through s. A* also maintains a priority queue, OPEN, of states which it plans to expand. The OPEN queue is sorted by f(s), so that A* always expands next the state which appears to be on the shortest path from start to goal. A* initializes the OPEN list with the start state, sstart (line 02). Each time it expands a state s (lines 04-11), it removes s from OPEN. It then updates the g-values of all of s's neighbors; whenever it decreases g(s'), it inserts s' into OPEN. A* terminates as soon as the goal state is expanded.

01 g(sstart) = 0; OPEN = ∅;
02 insert sstart into OPEN with f(sstart) = ε * h(sstart);
03 while (sgoal is not expanded)
04   remove s with the smallest f-value from OPEN;
05   for each successor s' of s
06     if s' was not visited before then
07       f(s') = g(s') = ∞;
08     if g(s') > g(s) + c(s, s')
09       g(s') = g(s) + c(s, s');
10       f(s') = g(s') + ε * h(s');
11       insert s' into OPEN with f(s');

Figure 1: A* with heuristic weighted by ε ≥ 1

Figure 2: Left three columns: A* searches with decreasing ε (ε = 2.5, ε = 1.5, ε = 1.0). Right three columns: the corresponding ARA* search iterations.

Setting ε to 1 results in standard A* with an uninflated heuristic; the resulting solution is guaranteed to be optimal. 
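As a concrete sketch of Figure 1, the following Python rendering (ours, not the authors' code) implements weighted A*; the grid at the bottom is a hypothetical example in the style of the Figure 2 grid world:

```python
import heapq

def weighted_astar(start, goal, successors, h, eps=2.5):
    """A* with the inflated heuristic f(s) = g(s) + eps*h(s) (Figure 1).
    successors(s) yields (s', cost) pairs; h must be consistent.
    Returns (path, cost); the cost is at most eps times optimal."""
    g = {start: 0.0}
    parent = {start: None}
    heap = [(eps * h(start), start)]            # line 02
    while heap:                                 # line 03
        f, s = heapq.heappop(heap)              # line 04
        if s == goal:                           # goal expanded: done
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return path[::-1], g[goal]
        if f > g[s] + eps * h(s):
            continue                            # stale heap entry
        for s2, c in successors(s):             # lines 05-11
            if g[s] + c < g.get(s2, float("inf")):
                g[s2] = g[s] + c
                parent[s2] = s
                heapq.heappush(heap, (g[s2] + eps * h(s2), s2))
    return None, float("inf")

# Hypothetical example: 8-connected 5x5 grid, unit move costs, and the
# heuristic of Figure 2 (the larger of the x and y distances to the goal).
OBSTACLES = {(2, 1), (2, 2), (2, 3)}

def grid_succ(s):
    x, y = s
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            n = (x + dx, y + dy)
            if (dx, dy) != (0, 0) and 0 <= n[0] < 5 and 0 <= n[1] < 5 \
                    and n not in OBSTACLES:
                yield n, 1.0

chebyshev = lambda s: max(abs(s[0] - 4), abs(s[1] - 4))
```

With eps = 1 this is plain A* and finds the optimal 6-move path around the wall; with eps = 2.5 it typically expands far fewer cells while staying within the 2.5x cost bound.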
For ε > 1 a solution can be sub-optimal, but the sub-optimality is bounded by a factor of ε: the cost of the found solution is no larger than ε times the cost of an optimal solution [7].

The left three columns in Figure 2 show the operation of the A* algorithm with a heuristic inflated by ε = 2.5, ε = 1.5, and ε = 1 (no inflation) on a simple grid world. In this example we use an eight-connected grid with black cells being obstacles. S denotes the start state, while G denotes the goal state. The cost of moving from one cell to its neighbor is one. The heuristic is the larger of the x and y distances from the cell to the goal. The cells which were expanded are shown in grey. (A* can stop the search as soon as it is about to expand a goal state, without actually expanding it. Thus, the goal state is not shown in grey.) The paths found by these searches are shown with grey arrows. The A* searches with inflated heuristics expand substantially fewer cells than A* with ε = 1, but their solutions are sub-optimal.

2.2 ARA*: Reuse of Search Results

ARA* works by executing A* multiple times, starting with a large ε and decreasing ε prior to each execution until ε = 1. As a result, after each search a solution is guaranteed to be within a factor ε of optimal. Running the A* search from scratch every time we decrease ε, however, would be very expensive. We will now explain how ARA* reuses the results of the previous searches to save computation. We first explain the ImprovePath function (left column in Figure 3) that recomputes a path for a given ε. In the next section we explain the Main function of ARA* (right column in Figure 3) that repetitively calls the ImprovePath function with a series of decreasing ε values.

Let us first introduce the notion of local inconsistency (we borrow this term from [4]). 
A state is called locally inconsistent every time its g-value is decreased (line 09, Figure 1) and until the next time the state is expanded. That is, suppose that state s is the best predecessor for some state s': that is, g(s') = min_{s''∈pred(s')}(g(s'') + c(s'', s')) = g(s) + c(s, s'). Then, if g(s) decreases we get g(s') > min_{s''∈pred(s')}(g(s'') + c(s'', s')). In other words, the decrease in g(s) introduces a local inconsistency between the g-value of s and the g-values of its successors. Whenever s is expanded, on the other hand, the inconsistency of s is corrected by re-evaluating the g-values of the successors of s (lines 08-09, Figure 1). This in turn makes the successors of s locally inconsistent. In this way the local inconsistency is propagated to the children of s via a series of expansions. Eventually the children no longer rely on s, none of their g-values are lowered, and none of them are inserted into the OPEN list. Given this definition of local inconsistency it is clear that the OPEN list consists of exactly the locally inconsistent states: every time a g-value is lowered the state is inserted into OPEN, and every time a state is expanded it is removed from OPEN until the next time its g-value is lowered. Thus, the OPEN list can be viewed as the set of states from which we need to propagate local inconsistency.

A* with a consistent heuristic is guaranteed not to expand any state more than once. Setting ε > 1, however, may violate consistency, and as a result A* search may re-expand states multiple times. It turns out that if we restrict each state to be expanded no more than once, then the sub-optimality bound of ε still holds. To implement this restriction we check any state whose g-value is lowered and insert it into OPEN only if it has not been previously expanded (line 10, Figure 3). 
The set of expanded states is maintained in the CLOSED variable.

procedure fvalue(s)
01 return g(s) + ε * h(s);

procedure ImprovePath()
02 while (fvalue(sgoal) > min_{s∈OPEN}(fvalue(s)))
03   remove s with the smallest fvalue(s) from OPEN;
04   CLOSED = CLOSED ∪ {s};
05   for each successor s' of s
06     if s' was not visited before then
07       g(s') = ∞;
08     if g(s') > g(s) + c(s, s')
09       g(s') = g(s) + c(s, s');
10       if s' ∉ CLOSED
11         insert s' into OPEN with fvalue(s');
12       else
13         insert s' into INCONS;

procedure Main()
01' g(sgoal) = ∞; g(sstart) = 0;
02' OPEN = CLOSED = INCONS = ∅;
03' insert sstart into OPEN with fvalue(sstart);
04' ImprovePath();
05' ε' = min(ε, g(sgoal) / min_{s∈OPEN∪INCONS}(g(s) + h(s)));
06' publish current ε'-suboptimal solution;
07' while ε' > 1
08'   decrease ε;
09'   move states from INCONS into OPEN;
10'   update the priorities for all s ∈ OPEN according to fvalue(s);
11'   CLOSED = ∅;
12'   ImprovePath();
13'   ε' = min(ε, g(sgoal) / min_{s∈OPEN∪INCONS}(g(s) + h(s)));
14'   publish current ε'-suboptimal solution;

Figure 3: ARA*

With this restriction we will expand each state at most once, but OPEN may no longer contain all the locally inconsistent states. In fact, it will only contain the locally inconsistent states that have not yet been expanded. It is important, however, to keep track of all the locally inconsistent states, as they will be the starting points for inconsistency propagation in the future search iterations. We do this by maintaining the set INCONS of all the locally inconsistent states that are not in OPEN (lines 12-13, Figure 3). 
Thus, the union of INCONS and OPEN is exactly the set of all locally inconsistent states, and can be used as a starting point for inconsistency propagation before each new search iteration.

The only other difference between the ImprovePath function and A* is the termination condition. Since the ImprovePath function reuses search efforts from the previous executions, sgoal may never become locally inconsistent and thus may never be inserted into OPEN. As a result, the termination condition of A* becomes invalid. A* search, however, can also stop as soon as f(sgoal) is equal to the minimal f-value among all the states on the OPEN list. This is the condition that we use in the ImprovePath function (line 02, Figure 3). It also allows us to avoid expanding sgoal as well as possibly some other states with the same f-value. (Note that ARA* no longer maintains f-values as variables: in between calls to the ImprovePath function ε is changed, and it would be prohibitively expensive to update the f-values of all the states. Instead, the fvalue(s) function is called to compute and return the f-values only for the states in OPEN and for sgoal.)

2.3 ARA*: Iterative Execution of Searches

We now introduce the main function of ARA* (right column in Figure 3), which performs a series of search iterations. It does initialization and then repetitively calls the ImprovePath function with a series of decreasing ε values. Before each call to the ImprovePath function a new OPEN list is constructed by moving into it the contents of the set INCONS. Since the OPEN list has to be sorted by the current f-values of states, it is also re-ordered (lines 09'-10', Figure 3). 
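The ImprovePath and Main routines of Figure 3 might be sketched in Python as follows; this is our illustrative rendering, not the authors' implementation, and it assumes succ(s) yields (successor, cost) pairs and h is consistent with h(goal) = 0. The grid at the bottom is a hypothetical example:

```python
import heapq
from math import inf

class ARAStar:
    """Sketch of ARA* (Figure 3). succ(s) yields (s', cost) pairs."""

    def __init__(self, start, goal, succ, h, eps0=2.5, step=0.5):
        self.goal, self.succ, self.h = goal, succ, h
        self.eps, self.step = eps0, step
        self.g = {start: 0.0, goal: inf}
        self.open = {start}    # locally inconsistent, not yet expanded
        self.incons = set()    # locally inconsistent, already expanded
        self.closed = set()

    def fvalue(self, s):                       # line 01
        return self.g[s] + self.eps * self.h(s)

    def improve_path(self):                    # left column of Figure 3
        heap = [(self.fvalue(s), s) for s in self.open]
        heapq.heapify(heap)                    # reorder (line 10')
        while heap and self.fvalue(self.goal) > heap[0][0]:  # line 02
            f, s = heapq.heappop(heap)
            if s not in self.open or f > self.fvalue(s):
                continue                       # stale heap entry
            self.open.discard(s)
            self.closed.add(s)                 # line 04
            for s2, c in self.succ(s):         # lines 05-13
                if self.g.get(s2, inf) > self.g[s] + c:
                    self.g[s2] = self.g[s] + c
                    if s2 not in self.closed:  # line 10
                        self.open.add(s2)
                        heapq.heappush(heap, (self.fvalue(s2), s2))
                    else:
                        self.incons.add(s2)    # line 13

    def bound(self):                           # lines 05'/13'
        frontier = self.open | self.incons
        if not frontier:
            return 1.0
        lower = min(self.g[s] + self.h(s) for s in frontier)
        ratio = self.g[self.goal] / lower if lower > 0 else inf
        return max(1.0, min(self.eps, ratio))

    def solutions(self):
        """Yield (eps', cost) for a series of decreasing eps (Main)."""
        self.improve_path()
        yield self.bound(), self.g[self.goal]
        while self.bound() > 1.0:              # line 07'
            self.eps = max(1.0, self.eps - self.step)  # line 08'
            self.open |= self.incons           # line 09'
            self.incons.clear()
            self.closed.clear()                # line 11'
            self.improve_path()
            yield self.bound(), self.g[self.goal]

# Hypothetical example: 8-connected 5x5 grid, unit costs, and the
# Figure 2 heuristic (larger of x and y distances to the goal).
OBSTACLES = {(2, 1), (2, 2), (2, 3)}

def grid_succ(s):
    x, y = s
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            n = (x + dx, y + dy)
            if (dx, dy) != (0, 0) and 0 <= n[0] < 5 and 0 <= n[1] < 5 \
                    and n not in OBSTACLES:
                yield n, 1.0
```

On such a grid the successive calls re-expand very few states; the guaranteed properties are that the reported costs never increase, each cost is within its published bound of optimal, and the final bound is 1.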
Thus, after each call to the ImprovePath function we get a solution that is sub-optimal by at most a factor of ε.

As suggested in [8], a sub-optimality bound can also be computed as the ratio between g(sgoal), which gives an upper bound on the cost of an optimal solution, and the minimum un-weighted f-value of a locally inconsistent state, which gives a lower bound on the cost of an optimal solution. (This is a valid sub-optimality bound as long as the ratio is larger than or equal to one. Otherwise, g(sgoal) is already equal to the cost of an optimal solution.) Thus, the actual sub-optimality bound for ARA* is computed as the minimum between ε and this ratio (lines 05' and 13', Figure 3). At first, one might also think of using this actual sub-optimality bound in deciding how to decrease ε between search iterations (e.g., setting ε to ε' minus a small delta). Experiments, however, seem to suggest that decreasing ε in small steps is still more beneficial. The reason is that a small decrease in ε often results in an improvement of the solution, despite the fact that the actual sub-optimality bound of the previous solution was already substantially less than the value of ε. A large decrease in ε, on the other hand, may often result in the expansion of too many states during the next search. (Another useful suggestion from [8], which we have not implemented in ARA*, is to prune OPEN so that it never contains a state whose un-weighted f-value is larger than or equal to g(sgoal).)

Within each execution of the ImprovePath function we mainly save computation by not re-expanding the states which were locally consistent and whose g-values were already correct before the call to ImprovePath (Theorem 2 states this more precisely). 
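Numerically, the bound computation on lines 05' and 13' amounts to the following (all values below are hypothetical, for illustration only):

```python
eps = 1.5                       # current inflation factor
g_goal = 10.0                   # cost of the best solution found: an upper bound
frontier = [7.2, 8.0, 9.5]      # g(s) + h(s) over the OPEN and INCONS states
lower_bound = min(frontier)     # admissible lower bound on the optimal cost
eps_prime = max(1.0, min(eps, g_goal / lower_bound))
print(eps_prime)                # min(1.5, 10/7.2), about 1.3889
```

Here the ratio 10/7.2 is tighter than the inflation factor 1.5, so the published bound eps' improves on eps even before eps is decreased.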
For example, the right three columns in Figure 2 show a series of calls to the ImprovePath function. States that are locally inconsistent at the end of an iteration are shown with an asterisk. While the first call (ε = 2.5) is identical to the A* call with the same ε, the second call to the ImprovePath function (ε = 1.5) expands only 1 cell. This is in contrast to the 15 cells expanded by the A* search with the same ε. For both searches the sub-optimality factor ε decreases from 2.5 to 1.5. Finally, the third call to the ImprovePath function with ε set to 1 expands only 9 cells. The solution is now optimal, and the total number of expansions is 23. Only 2 cells are expanded more than once across all three calls to the ImprovePath function. Even a single optimal search from scratch expands 20 cells.

2.4 Theoretical Properties of the Algorithm

We now present some of the theoretical properties of ARA*. For the proofs of these and other properties of the algorithm please refer to [6]. We use g*(s) to denote the cost of an optimal path from sstart to s. Let us also define a greedy path from sstart to s as a path that is computed by tracing it backward as follows: start at s, and at any state s_i pick the state s_{i-1} = argmin_{s'∈pred(s_i)}(g(s') + c(s', s_i)) until s_{i-1} = sstart.

Theorem 1 Whenever the ImprovePath function exits, for any state s with f(s) ≤ min_{s'∈OPEN}(f(s')), we have g*(s) ≤ g(s) ≤ ε * g*(s), and the cost of a greedy path from sstart to s is no larger than g(s).

The correctness of ARA* follows from this theorem: each execution of the ImprovePath function terminates when f(sgoal) is no larger than the minimum f-value in OPEN, which means that the greedy path from start to goal that we have found is within a factor ε of optimal. 
Since before each iteration ε is decreased, and ε, in its turn, is an upper bound on ε', ARA* gradually decreases the sub-optimality bound and finds new solutions that satisfy the bound.

Theorem 2 Within each call to ImprovePath() a state is expanded at most once, and only if it was locally inconsistent before the call to ImprovePath() or its g-value was lowered during the current execution of ImprovePath().

The second theorem formalizes where the computational savings of ARA* search come from. Unlike A* search with an inflated heuristic, each search iteration in ARA* is guaranteed not to expand states more than once. Moreover, it also does not expand states whose g-values before a call to the ImprovePath function were already correctly computed by some previous search iteration, unless they are in the set of locally inconsistent states already and thus need to update their neighbors (propagate local inconsistency).

3 Experimental Study

3.1 Robotic Arm

We first evaluate the performance of ARA* on simulated 6 and 20 degree of freedom (DOF) robotic arms (Figure 4). The base of the arm is fixed, and the task is to move its end-effector to the goal while navigating around obstacles (indicated by grey rectangles). An action is defined as a change of the global angle of any particular joint (i.e., the next joint further along the arm rotates in the opposite direction to maintain the global angle of the remaining joints). We discretize the workspace into 50 by 50 cells and compute a distance from each cell to the cell containing the goal, taking into account that some cells are occupied by obstacles. This distance is our heuristic. In order for the heuristic not to overestimate true costs, joint angles are discretized so as to never move the end-effector by more than one cell in a single action. 
The resulting state-space is over 3 billion states for the 6 DOF robot arm and over 10^26 states for the 20 DOF robot arm; memory for states is allocated on demand.

Figure 4: Top row: 6D robot arm experiments: (a) 6D arm trajectory for ε = 3, (b) uniform costs, (c) non-uniform costs. Bottom row: 20D robot arm experiments (the trajectories shown are downsampled by 6): (d) both Anytime A* and A* after 90 secs, cost = 682, ε' = 15.5, (e) ARA* after 90 secs, cost = 657, ε' = 14.9, (f) non-uniform costs. Anytime A* is the algorithm in [8].

Figure 4a shows the planned trajectory of the robot arm after the initial search of ARA* with ε = 3.0. This search takes about 0.05 secs. (By comparison, a search for an optimal trajectory is infeasible, as it runs out of memory very quickly.) The plot in Figure 4b shows that ARA* improves both the quality of the solution and the bound on its sub-optimality faster and in a more gradual manner than either a succession of A* searches or Anytime A* [8]. In this experiment ε is initially set to 3.0 for all three algorithms. For all the experiments in this section ε is decreased in steps of 0.02 (2% sub-optimality) for ARA* and the succession of A* searches. Anytime A* does not control ε, and in this experiment it apparently performs a lot of computation that results in a large decrease of its sub-optimality bound at the end. On the other hand, it does reach the optimal solution first this way. To evaluate the expense of the anytime property of ARA* we also ran ARA* and an optimal A* search in a slightly simpler environment (for the optimal search to be feasible). 
Optimal A* search required about 5.3 mins (2,202,666 states expanded) to find an optimal solution, while ARA* required about 5.5 mins (2,207,178 states expanded) to decrease ε in steps of 0.02 from 3.0 until a provably optimal solution was found (about 4% overhead).

While in the experiment of Figure 4b all the actions have the same cost, in the experiment of Figure 4c actions have non-uniform costs: changing a joint angle closer to the base is more expensive than changing a higher joint angle. As a result of the non-uniform costs our heuristic becomes less informative, and so the search is much more expensive. In this experiment we start with ε = 10 and run all algorithms for 30 minutes. At the end, ARA* achieves a solution with a substantially smaller cost (200 vs. 220 for the succession of A* searches and 223 for Anytime A*) and a better sub-optimality bound (3.92 vs. 4.46 for both the succession of A* searches and Anytime A*). Also, since ARA* controls ε, it decreases the cost of the solution gradually. Reading the graph differently, ARA* reaches a sub-optimality bound ε' = 4.5 after about 59 thousand expansions and 11.7 secs, while the succession of A* searches reaches the same bound after 12.5 million expansions and 27.4 minutes (about a 140-fold speedup by ARA*), and Anytime A* reaches it after over 4 million expansions and 8.8 minutes (over a 44-fold speedup by ARA*). Similar results hold when comparing the amount of work each of the algorithms spends on obtaining a solution of cost 225. While Figure 4 shows execution time, the comparison in terms of states expanded (not shown) is almost identical. 
Additionally, to demonstrate the advantage of ARA* expanding each state no more than once per search iteration, we compare the first searches of ARA* and Anytime A*: the first search of ARA* performed 6,378 expansions, while Anytime A* performed 8,994 expansions, mainly because some of the states were expanded up to seven times before a first solution was found.

Figure 5: Outdoor robot navigation experiment (the cross shows the position of the robot): (a) robot with laser scanner, (b) 3D map, (c) optimal 2D search, (d) optimal 4D search with A* after 25 secs, (e) 4D search with ARA* after 0.6 secs (ε = 2.5), (f) 4D search with ARA* after 25 secs (ε = 1.0).

Figures 4d-f show the results of experiments done on a 20 DOF robot arm, with actions that have non-uniform costs. All three algorithms start with ε = 30. Figures 4d and 4e show that in 90 seconds of planning the cost of the trajectory found by ARA* and the sub-optimality bound it can guarantee are substantially smaller than for the other algorithms. For example, the trajectory in Figure 4d contains more steps and also makes one extra change in the angle of the third joint from the base of the arm (despite the fact that changing lower joint angles is very expensive) in comparison to the trajectory in Figure 4e. The graph in Figure 4f compares the performance of the three algorithms on twenty randomized environments similar to the environment in Figure 4d. The environments had random goal locations, and the obstacles were slid to random locations along the outside walls. The graph shows the additional time the other algorithms require to achieve the same sub-optimality bound that ARA* does. To make the results from different environments comparable we normalize the bound by dividing it by the maximum of the best bounds that the algorithms achieve before they run out of memory. 
Averaging over all environments, the time for ARA* to achieve the best bound was 10.1 secs. Thus, the difference of 40 seconds at the end of the Anytime A* graph corresponds to an overhead of about a factor of 4.

3.2 Outdoor Robot Navigation

For us the motivation for this work was efficient path-planning for mobile robots in large outdoor environments, where optimal trajectories involve fast motion and sweeping turns at speed. In such environments it is particularly important to take advantage of the robot's momentum and find dynamic rather than static plans. We use a 4D state space: xy position, orientation, and velocity. The high dimensionality and large environments result in very large state-spaces for the planner and make it computationally infeasible for the robot to plan optimally every time it discovers new obstacles or modelling errors. To solve this problem we built a two-level planner: a 4D planner that uses ARA*, and a fast 2D (x, y) planner that uses A* search and whose results serve as the heuristic for the 4D planner.¹

¹To interleave search with the execution of the best plan so far we perform the 4D search backward. That is, the start of the search, sstart, is the actual goal state of the robot, while the goal of the search, sgoal, is the current state of the robot. Thus, sstart does not change as the robot moves, and the search tree remains valid in between search iterations. Since the heuristics estimate distances to sgoal (the robot position), we have to recompute them during the reorder operation (line 10', Figure 3).

In Figure 5 we show the robot we used for navigation and a 3D laser scan [3], constructed by the robot, of the environment we tested our system in. The scan is converted into a map of the environment (Figure 5c, obstacles shown in black). 
The size of the environment is 91.2 by 94.4 meters, and the map is discretized into cells of 0.4 by 0.4 meters. Thus, the 2D state-space consists of 53,808 states. The 4D state-space has over 20 million states. The robot's initial state is the upper circle, while its goal is the lower circle. To ensure safe operation we created a buffer zone with high costs around each obstacle. The squares in the upper-right corners of the figures show a magnified fragment of the map with grayscale proportional to cost. The 2D plan (Figure 5c) makes sharp 45-degree turns when going around the obstacles, requiring the robot to come to complete stops. The optimal 4D plan results in a wider turn, and the velocity of the robot remains high throughout the whole trajectory. In the first plan computed by ARA* starting at ε = 2.5 (Figure 5e) the trajectory is much better than the 2D plan, but somewhat worse than the optimal 4D plan.

The time required by the optimal 4D planner was 11.196 secs, whereas the time for the 4D ARA* planner to generate the plan in Figure 5e was 556 ms. As a result, the robot that runs ARA* can start executing its plan much earlier. A robot running the optimal 4D planner would still be near the beginning of its path 25 seconds after receiving a goal location (Figure 5d). In contrast, in the same amount of time the robot running ARA* has advanced much further (Figure 5f), and its plan has by now converged to the optimal one (ε has decreased to 1).

4 Conclusions

We have presented the first anytime heuristic search that works by continually decreasing a sub-optimality bound on its solution and finding new solutions that satisfy the bound along the way. It executes a series of searches with decreasing sub-optimality bounds, and each search tries to reuse as much as possible of the results from previous searches. 
The experiments show that our algorithm is much more efficient than any of the previous anytime searches, and can successfully solve large robotic planning problems.

Acknowledgments

This work was supported by AFRL contract F30602-01-C-0219, DARPA's MICA program.

References

[1] B. Bonet and H. Geffner. Planning as heuristic search. Artificial Intelligence, 129(1-2):5-33, 2001.
[2] T. L. Dean and M. Boddy. An analysis of time-dependent planning. In Proc. of the National Conference on Artificial Intelligence (AAAI), 1988.
[3] D. Haehnel. Personal communication, 2003.
[4] S. Koenig and M. Likhachev. Incremental A*. In Advances in Neural Information Processing Systems (NIPS) 14. Cambridge, MA: MIT Press, 2002.
[5] R. E. Korf. Linear-space best-first search. Artificial Intelligence, 62:41-78, 1993.
[6] M. Likhachev, G. Gordon, and S. Thrun. ARA*: Formal Analysis. Tech. Rep. CMU-CS-03-148, Carnegie Mellon University, Pittsburgh, PA, 2003.
[7] J. Pearl. Heuristics: Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley, 1984.
[8] R. Zhou and E. A. Hansen. Multiple sequence alignment using A*. In Proc. of the National Conference on Artificial Intelligence (AAAI), 2002. Student abstract.
[9] S. Zilberstein and S. Russell. Approximate reasoning using anytime algorithms. In Imprecise and Approximate Computation. Kluwer Academic Publishers, 1995.
", "award": [], "sourceid": 2382, "authors": [{"given_name": "Maxim", "family_name": "Likhachev", "institution": null}, {"given_name": "Geoffrey", "family_name": "Gordon", "institution": null}, {"given_name": "Sebastian", "family_name": "Thrun", "institution": null}]}