{"title": "Applying Metric-Trees to Belief-Point POMDPs", "book": "Advances in Neural Information Processing Systems", "page_first": 759, "page_last": 766, "abstract": "", "full_text": "Applying Metric-Trees to Belief-Point POMDPs\n\nJoelle Pineau, Geoffrey Gordon\n\nSchool of Computer Science\nCarnegie Mellon University\n\nPittsburgh, PA 15213\n\nSebastian Thrun\n\nComputer Science Department\n\nStanford University\nStanford, CA 94305\n\nfjpineau,ggordong@cs.cmu.edu\n\nthrun@stanford.edu\n\nAbstract\n\nRecent developments in grid-based and point-based approximation algo-\nrithms for POMDPs have greatly improved the tractability of POMDP\nplanning. These approaches operate on sets of belief points by individ-\nually learning a value function for each point. In reality, belief points\nexist in a highly-structured metric simplex, but current POMDP algo-\nrithms do not exploit this property. This paper presents a new metric-tree\nalgorithm which can be used in the context of POMDP planning to sort\nbelief points spatially, and then perform fast value function updates over\ngroups of points. We present results showing that this approach can re-\nduce computation in point-based POMDP algorithms for a wide range of\nproblems.\n\n1\n\nIntroduction\n\nPlanning under uncertainty is a central problem in the \ufb01eld of robotics as well as many\nother AI applications. In terms of representational effectiveness, the Partially Observable\nMarkov Decision Process (POMDP) is among the most promising frameworks for this\nproblem. However the practical use of POMDPs has been severely limited by the computa-\ntional requirement of planning in such a rich representation. POMDP planning is dif\ufb01cult\nbecause it involves learning action selection strategies contingent on all possible types of\nstate uncertainty. 
This means that whenever the robot's world state cannot be observed, the planner must maintain a belief (namely, a probability distribution over possible states) to summarize the robot's recent history of actions taken and observations received. The POMDP planner then learns an optimal future action selection for each possible belief. As the planning horizon grows (linearly), so does the number of possible beliefs (exponentially), which causes the computational intractability of exact POMDP planning.

In recent years, a number of approximate algorithms have been proposed which overcome this issue by simply refusing to consider all possible beliefs, and instead selecting (and planning for) a small set of representative belief points. During execution, should the robot encounter a belief for which it has no plan, it finds the nearest known belief point and follows its plan. Such approaches, often known as grid-based [1, 4, 13] or point-based [8, 9] algorithms, have had significant success with increasingly large planning domains. They formulate the plan optimization problem as a value iteration procedure, and estimate the cost/reward of applying a sequence of actions from a given belief point. The value of each action sequence can be expressed as an α-vector, and a key step in many algorithms consists of evaluating many candidate α-vectors (set Γ) at each belief point (set B). These B × Γ (point-to-vector) comparisons—which are typically the main bottleneck in scaling point-based algorithms—are reminiscent of many M × N comparison problems that arise in statistical learning tasks, such as kNN, mixture models, kernel regression, etc. Recent work has shown that for these problems, one can significantly reduce the number of necessary comparisons by using appropriate metric data structures, such as KD-trees and ball-trees [3, 6, 12].
Given this insight, we extend the metric-tree approach to POMDP planning, with the specific goal of reducing the number of B × Γ comparisons. This paper describes our algorithm for building and searching a metric-tree over belief points.

In addition to improving the scalability of POMDP planning, this approach features a number of interesting ideas for generalizing metric-tree algorithms. For example, when using trees for POMDPs, we move away from the point-to-point search procedures for which the trees are typically used, and leverage metric constraints to prune point-to-vector comparisons. We show how it is often possible to evaluate the usefulness of an α-vector over an entire sub-region of the belief simplex without explicitly evaluating it at each belief point in that sub-region. While our new metric-tree approach offers significant potential for all point-based approaches, in this paper we apply it in the context of the PBVI algorithm [8], and show that it can effectively reduce computation without compromising plan quality.

2 Partially Observable Markov Decision Processes

We adopt the standard POMDP formulation [5], defining a problem by the n-tuple {S, A, Z, T, O, R, γ, b_0}, where S is a set of (discrete) world states describing the problem domain, A is a set of possible actions, and Z is a set of possible observations providing (possibly noisy and/or partial) state information. The distribution T(s, a, s') describes state-to-state transition probabilities; the distribution O(s, a, z) describes observation emission probabilities; the function R(s, a) represents the reward received for applying action a in state s; γ represents the discount factor; and b_0 specifies the initial belief distribution. An |S|-dimensional vector, b_t, represents the agent's belief about the state of the world at time t, and is expressed as a probability distribution over states.
This belief is updated after each time step—to reflect the latest pair (a_{t-1}, z_t)—using a Bayesian filter: b_t(s') := c O(s', a_{t-1}, z_t) Σ_{s∈S} T(s, a_{t-1}, s') b_{t-1}(s), where c is a normalizing constant. The goal of POMDP planning is to find a sequence of actions maximizing the expected sum of rewards E[Σ_t γ^t R(s_t, a_t)], for all beliefs. The corresponding value function can be formulated as a Bellman equation: V(b) = max_{a∈A} [R(b, a) + γ Σ_{b'∈B} T(b, a, b') V(b')].

By definition there exist an infinite number of belief points. However, when optimized exactly, the value function is always piecewise linear and convex in the belief (Fig. 1a). After n value iterations, the solution consists of a finite set of α-vectors: V_n = {α_0, α_1, ..., α_m}. Each α-vector represents an |S|-dimensional hyper-plane, and defines the value function over a bounded region of the belief: V_n(b) = max_{α∈V_n} Σ_{s∈S} α(s) b(s). When performing exact value updates, the set of α-vectors can (and often does) grow exponentially with the planning horizon. Therefore exact algorithms tend to be impractical for all but the smallest problems. We leave out a full discussion of exact POMDP planning (see [5] for more) and focus instead on the much more tractable point-based approximate algorithm.

3 Point-based value iteration for POMDPs

The main motivation behind the point-based algorithm is to exploit the fact that most beliefs are never, or very rarely, encountered, and thus resources are better spent planning for those beliefs that are most likely to be reached. Many classical POMDP algorithms do not exploit this insight. Point-based value iteration algorithms, on the other hand, apply value backups only to a finite set of pre-selected (and likely to be encountered) belief points B = {b_0, b_1, ..., b_q}.
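As a concrete illustration of the Section 2 equations, the Bayesian filter update and the point-wise evaluation of the piecewise-linear convex value function can be sketched as follows. This is a minimal sketch, not from the paper: the array layout of T and O and the function names are illustrative assumptions.

```python
import numpy as np

def belief_update(b, a, z, T, O):
    """Bayesian filter: b'(s') = c * O(s',a,z) * sum_s T(s,a,s') b(s).

    Assumed layout (illustrative): T[a] is an |S|x|S| matrix with
    T[a][s, s'] = Pr(s'|s,a); O[a] is |S|x|Z| with O[a][s', z] = Pr(z|s',a).
    """
    b_next = O[a][:, z] * (T[a].T @ b)   # unnormalized new belief
    return b_next / b_next.sum()         # division by c, the normalizer

def value_at(b, V):
    """Evaluate the piecewise-linear convex value function:
    V_n(b) = max over alpha-vectors in V of alpha . b."""
    return max(np.dot(alpha, b) for alpha in V)
```

For a 2-state problem with a uniform prior, `belief_update` shifts probability mass toward states that both the transition and observation models favor, and `value_at` simply takes the upper surface of the α-vectors at b.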
They initialize a separate α-vector for each selected point, and repeatedly update the value of that α-vector. As shown in Figure 1b, by maintaining a full α-vector for each belief point, we can preserve the piecewise linearity and convexity of the value function, and define a value function over the entire belief simplex. This is an approximation, as some vectors may be missed, but by appropriately selecting points, we can bound the approximation error (see [8] for details).

Figure 1: (a) Value iteration with exact updates (V = {α_0, α_1, α_2, α_3}). (b) Value iteration with point-based updates (V = {α_0, α_1, α_3} over belief points b_0, b_1, b_2, b_3).

There are generally two phases to point-based algorithms. First, a set of belief points is selected, and second, a series of backup operations are applied over α-vectors for that set of points. In practice, steps of value iteration and steps of belief set expansion can be repeatedly interleaved to produce an anytime algorithm that can gradually trade off computation time and solution quality. The question of how to best select belief points is somewhat orthogonal to the ideas in this paper and is discussed in detail in [8].
We therefore focus on describing how to do point-based value backups, before showing how this step can be significantly accelerated by the use of appropriate metric data structures.

The traditional value iteration POMDP backup operation is formulated as a dynamic program, where we build the n-th horizon value function V from the previous solution V':

V(b) = max_{a∈A} [ Σ_{s∈S} R(s, a) b(s) + γ Σ_{z∈Z} max_{α'∈V'} Σ_{s∈S} Σ_{s'∈S} T(s, a, s') O(z, s', a) α'(s') b(s) ]   (1)
     = max_{a∈A} [ Σ_{z∈Z} max_{α'∈V'} [ Σ_{s∈S} (R(s, a) / |Z|) b(s) + γ Σ_{s∈S} Σ_{s'∈S} T(s, a, s') O(z, s', a) α'(s') b(s) ] ]

To plan for a finite set of belief points B, we can modify this operation such that only one α-vector per belief point is maintained, and therefore we only consider V(b) at points b ∈ B. This is implemented using three steps. First, we take each vector in V' and project it backward (according to the model) for a given action, observation pair. In doing so, we generate intermediate sets Γ^{a,z}, ∀a ∈ A, ∀z ∈ Z:

Γ^{a,z} ← α_i^{a,z}(s) = R(s, a) / |Z| + γ Σ_{s'∈S} T(s, a, s') O(z, s', a) α'_i(s'), ∀α'_i ∈ V'   (Step 1)   (2)

Second, for each b ∈ B, we construct Γ^a_b (∀a ∈ A). This sum over observations¹ includes the maximum α^{a,z} (at a given b) from each Γ^{a,z}:

Γ^a_b = Σ_{z∈Z} argmax_{α∈Γ^{a,z}} (α · b)   (Step 2)   (3)

¹In exact updates, this step requires taking a cross-sum over observations, which is O(|S| |A| |V'|^|Z|).
By operating over a finite set of points, the cross-sum reduces to a simple sum, which is the main reason behind the computational speed-up obtained in point-based algorithms.

Finally, we find the best action for each belief point:

V ← argmax_{Γ^a_b, ∀a∈A} (Γ^a_b · b), ∀b ∈ B   (Step 3)   (4)

The main bottleneck in applying point-based algorithms to larger POMDPs is in step 2, where we perform a B × Γ comparison²: for every b ∈ B, we must find the best vector from a given set Γ^{a,z}. This is usually implemented as a sequential search, exhaustively comparing α · b for every b ∈ B and every α ∈ Γ^{a,z}, in order to find the best α at each b (with overall time-complexity O(|A| |Z| |S| |B| |V'|)). While this is not entirely unreasonable, it is by far the slowest step. It also completely ignores the highly structured nature of the belief space.

Belief points exist in a metric space and there is much to be gained from exploiting this property. For example, given the piecewise linearity and convexity of the value function, it is more likely that two nearby points will share similar values (and policies) than points that are far away. Consequently it could be much more efficient to evaluate an α-vector over sets of nearby points, rather than by exhaustively looking at all the points separately. In the next section, we describe a new type of metric-tree which structures data points based on a distance metric over the belief simplex. We then show how this kind of tree can be used to efficiently evaluate α-vectors over sets of belief points (or belief regions).

4 Metric-trees for belief spaces

Metric data structures offer a way to organize large sets of data points according to distances between the points.
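Taken together, Steps 1-3 and the exhaustive Step-2 search they contain can be sketched as follows. This is a naive, tree-free sketch; the array layout and variable names are illustrative assumptions, not from the paper.

```python
import numpy as np

def point_based_backup(B, V_prev, T, O, R, gamma, A, Z):
    """One point-based backup over belief set B (naive exhaustive version).

    Assumed layout (illustrative): T[a][s, s'] = Pr(s'|s,a),
    O[a][s', z] = Pr(z|s',a), R[a][s] = R(s,a); B is a list of belief
    vectors and V_prev a list of alpha-vectors from the previous horizon.
    """
    # Step 1: project every alpha' in V_prev backward for each (a, z) pair.
    Gamma = {(a, z): [R[a] / len(Z) + gamma * (T[a] @ (O[a][:, z] * alpha))
                      for alpha in V_prev]
             for a in A for z in Z}
    V_new = []
    for b in B:
        best, best_val = None, -np.inf
        for a in A:
            # Step 2: the exhaustive B x Gamma search (the bottleneck the
            # metric-tree is designed to prune): for each z, keep the
            # projected vector maximizing alpha . b, then sum over z.
            g = sum(max(Gamma[(a, z)], key=lambda alpha: alpha @ b) for z in Z)
            # Step 3: keep the best action's vector for this belief point.
            if g @ b > best_val:
                best, best_val = g, g @ b
        V_new.append(best)
    return V_new
```

The nested loop over every b ∈ B and every α ∈ Γ^{a,z} is exactly the O(|A||Z||S||B||V'|) sequential search described above; Section 4 replaces the inner `max` with a tree traversal over groups of belief points.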
By organizing the data appropriately, it is possible to satisfy many different statistical queries over the elements of the set, without explicitly considering all points. Instances of metric data structures such as KD-trees, ball-trees and metric-trees have been shown to be useful for a wide range of learning tasks (e.g. nearest-neighbor, kernel regression, mixture modeling), including some with high-dimensional and non-Euclidean spaces. The metric-tree [12] in particular offers a very general approach to the problem of structural data partitioning. It consists of a hierarchical tree built by recursively splitting the set of points into spatially tighter subsets, assuming only that the distance between points is a metric.

4.1 Building a metric-tree from belief points

Each node η in a metric-tree is represented by its center η_c, its radius η_r, and a set of points η_B that fall within its radius. To recursively construct the tree—starting with node η and building children nodes η1 and η2—we first pick two candidate centers (one per child) at the extremes of η's region: η1_c = argmax_{b∈η_B} D(η_c, b), and η2_c = argmax_{b∈η_B} D(η1_c, b). In a single-step approximation to k-nearest-neighbor (k=2), we then re-allocate each point in η_B to the child with the closest center (ties are broken randomly):

η1_B ← b   if D(η1_c, b) < D(η2_c, b)
η2_B ← b   if D(η1_c, b) > D(η2_c, b)   (5)

Finally we update the centers and calculate the radius for each child:

η1_c = Center{η1_B}        η2_c = Center{η2_B}   (6)
η1_r = max_{b∈η1_B} D(η1_c, b)        η2_r = max_{b∈η2_B} D(η2_c, b)   (7)

²Step 1 projects all vectors α ∈ V' for any (a, z) pair.
In the worst case, this has time-complexity O(|A| |Z| |S|² |V'|); however, most problems have very sparse transition matrices and this is typically much closer to O(|A| |Z| |S| |V'|). Step 3 is also relatively efficient at O(|A| |Z| |S| |B|).

The general metric-tree algorithm allows a variety of ways to calculate centers and distances. For the centers, the most common choice is the centroid of the points, and this is what we use when building a tree over belief points. We have tried other options, but with negligible impact. For the distance metric, we select the max-norm: D(η_c, b) = ||η_c − b||_∞, which allows for fast searching as described in the next section. While the radius determines the size of the region enclosed by each node, the choice of distance metric determines its shape (e.g. with Euclidean distance, we would get hyper-balls of radius η_r). In the case of the max-norm, each node defines an |S|-dimensional hyper-cube of side length 2η_r. Figure 2 shows how the first two levels of a tree are built, assuming a 3-state problem.

Figure 2: (a) Belief points. (b) Top node. (c) Level-1 left and right nodes. (d) Corresponding tree.

While we need to compute the center and radius for each node to build the tree, there are additional statistics which we also store about each node. These are specific to using trees in the context of belief-state planning, and are necessary to evaluate α-vectors over regions of the belief simplex.
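The construction in Section 4.1 can be sketched as follows. It is a simplified recursive sketch under stated assumptions: centroid centers and max-norm distance as in the text, while the leaf-size cutoff and deterministic tie-breaking are illustrative choices of our own.

```python
import numpy as np

def max_norm(x, y):
    """D(x, y) = ||x - y||_inf, the max-norm distance from the text."""
    return np.abs(x - y).max()

class Node:
    """A metric-tree node over an ndarray of belief points (|B| x |S|)."""
    def __init__(self, points, leaf_size=2):
        self.B = points
        self.center = points.mean(axis=0)              # centroid center
        self.radius = max(max_norm(self.center, b) for b in points)
        self.bmin = points.min(axis=0)                 # eta_min, per dimension
        self.bmax = points.max(axis=0)                 # eta_max, per dimension
        self.children = []
        if len(points) > leaf_size:
            # candidate centers at the extremes of the node's region
            c1 = max(points, key=lambda b: max_norm(self.center, b))
            c2 = max(points, key=lambda b: max_norm(c1, b))
            # re-allocate each point to the closest candidate center
            # (ties go to the second child here; the paper breaks them randomly)
            mask = np.array([max_norm(c1, b) < max_norm(c2, b) for b in points])
            if mask.any() and (~mask).any():
                self.children = [Node(points[mask], leaf_size),
                                 Node(points[~mask], leaf_size)]
```

Note that the per-dimension `bmin`/`bmax` statistics are stored at every node; these are the extra quantities the dominance tests of Section 4.2 consult.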
For a given node η containing data points η_B, we compute η_min and η_max, the vectors containing respectively the min and max belief in each dimension:

η_min(s) = min_{b∈η_B} b(s), ∀s ∈ S        η_max(s) = max_{b∈η_B} b(s), ∀s ∈ S   (8)

4.2 Searching over sub-regions of the simplex

Once the tree is built, it can be used for fast statistical queries. In our case, the goal is to compute argmax_{α∈Γ^{a,z}} (α · b) for all belief points. To do this, we consider the α-vectors one at a time, and decide whether a new candidate α_i is better than any of the previous vectors {α_0 ... α_{i-1}}. With the belief points organized in a tree, we can often assess this over sets of points by consulting a high-level node η, rather than by assessing this for each belief point separately.

We start at the root node of the tree. There are four different situations we can encounter as we traverse the tree: first, there might be no single previous α-vector that is best for all belief points below the current node (Fig. 3a). In this case we proceed to the children of the current node without performing any tests. In the other three cases there is a single dominant α-vector at the current node; the cases are that the newest vector α_i dominates it (Fig. 3b), is dominated by it (Fig. 3c), or neither (Fig. 3d). If we can prove that α_i dominates or is dominated by the previous one, we can prune the search and avoid checking the current node's children; otherwise we must check the children recursively.

We seek an efficient test to determine whether one vector, α_i, dominates another, α_j, over the belief points contained within a node.
The test must be conservative: it must never erroneously say that one vector dominates another. It is acceptable for the test to miss some pruning opportunities—the consequence is an increase in run-time as we check more nodes than necessary—but this is best avoided if possible. The most thorough test would check whether Δ · b is positive or negative at every belief sample b under the current node (where Δ = α_i − α_j). All positive would mean that α_i dominates α_j, all negative the reverse, and mixed positive and negative would mean that neither dominates the other. Of course, this test renders the tree useless, since all points are checked individually. Instead, we test whether Δ · b is positive or negative over a convex region R which includes all of the belief samples that belong to the current node. The smaller the region, the more accurate our test will be; on the other hand, if the region is too complicated we won't be able to carry out the test efficiently. (Note that we can always test some region R by solving one linear program to find l = min_{b∈R} b · Δ, another to find h = max_{b∈R} b · Δ, and testing whether l < 0 < h. But this is expensive and we prefer a more efficient test.)

Figure 3: Possible scenarios when evaluating a new vector α_i at a node η, assuming a 2-state domain. (a) η is a split node. (b) α_i is dominant. (c) α_i is dominated. (d) α_i is partially dominant.

Figure 4: Several possible convex regions over subsets of belief points, assuming a 3-state domain.

We tested several types of region.
The simplest type is an axis-parallel bounding box (Fig. 4a): η_min ≤ b ≤ η_max for the vectors η_min and η_max (as defined in Eq. 8). We also tested the simplex defined by b ≥ η_min and Σ_{s∈S} b(s) = 1 (Fig. 4b), as well as the simplex defined by b ≤ η_max and Σ_{s∈S} b(s) = 1 (Fig. 4c). The most effective test we discovered assumes R is the intersection of the bounding box η_min ≤ b ≤ η_max with the plane Σ_{s∈S} b(s) = 1 (Fig. 4d). For each of these shapes, minimizing or maximizing b · Δ takes time O(d) (where d = #states): for the box (Fig. 4a) we check each dimension independently, and for the simplices (Figs. 4b, 4c) we check each corner exhaustively. For the last shape (Fig. 4d), maximizing with respect to b is the same as computing δ s.t. b(s) = η_min(s) if Δ(s) < δ and b(s) = η_max(s) if Δ(s) > δ. We can find δ in expected time O(d) using a modification of the quick-median algorithm. In practice, not all O(d) algorithms are equivalent. Empirical results show that checking the corners of regions (b) and (c) and taking the tightest bounds provides the fastest algorithm. This is what we used for the results presented below.

5 Results and Discussion

We have conducted a set of experiments to test the effectiveness of the tree structure in reducing computations. While still preliminary, these results illustrate a few interesting properties of metric-trees when used in conjunction with point-based POMDP planning. Figure 5 presents results for six well-known POMDP problems, ranging in size from 4 to 870 states (for problem descriptions see [2], except for Coffee [10] and Tag [8]).
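The Fig. 4d test from Section 4.2, which optimizes b · Δ over the intersection of the node's bounding box with the probability simplex, can be sketched as follows. For simplicity this sketch finds the threshold by sorting (O(d log d)) rather than with the expected-O(d) quick-median variant the text describes, and it assumes the region is non-empty.

```python
import numpy as np

def max_delta_dot_b(delta, bmin, bmax):
    """Maximize delta . b over {b : bmin <= b <= bmax, sum(b) = 1}.

    Greedy: start from bmin and spend the remaining probability mass on
    the dimensions with the largest delta(s) first; the point where we
    stop raising coordinates plays the role of the threshold delta-value
    from the text.
    """
    b = bmin.astype(float).copy()
    budget = 1.0 - b.sum()                # mass left to distribute
    for s in np.argsort(-delta):          # largest delta(s) first
        grow = min(bmax[s] - b[s], budget)
        b[s] += grow
        budget -= grow
        if budget <= 0:
            break
    return float(delta @ b)

def dominates(alpha_i, alpha_j, bmin, bmax):
    """Conservative check: alpha_i . b >= alpha_j . b for ALL b in the
    region iff the maximum of (alpha_j - alpha_i) . b is non-positive."""
    return max_delta_dot_b(alpha_j - alpha_i, bmin, bmax) <= 0.0
```

Minimizing b · Δ is the same routine applied to −Δ, so one pass of this test decides whether α_i dominates, is dominated, or neither over the whole node.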
While all these problems have been successfully solved by previous approaches, it is interesting to observe the level of speed-up that can be obtained by leveraging metric-tree data structures. In Fig. 5(a)-(f) we show the number of B × Γ (point-to-vector) comparisons required, with and without a tree, for different numbers of belief points. In Fig. 5(g)-(h) we show the computation time (as a function of the number of belief points) required for two of the problems. The No-Tree results were generated by applying the original PBVI algorithm (Section 3, [8]). The Tree results (which count comparisons on both internal and leaf nodes) were generated by embedding the tree-searching procedure described in Section 4.2 within the same point-based POMDP algorithm. For some of the problems, we also show performance using an ε-tree, where the test for vector dominance can reject (i.e. declare α_i is dominated, Fig. 3c) a new vector that is within ε of the current best vector.

Figure 5: Results of PBVI algorithm with and without metric-tree (No Tree, Tree, Epsilon-Tree). Panels (a)-(f) plot the number of comparisons versus the number of belief points: (a) Hanks, |S|=4; (b) SACI, |S|=12; (c) Coffee, |S|=32; (d) Tiger-grid, |S|=36; (e) Hallway, |S|=60; (f) Tag, |S|=870. Panels (g)-(h) plot computation time (secs) versus the number of belief points: (g) SACI, |S|=12; (h) Tag, |S|=870.

These early results show that, in various proportions, the tree can cut down on the number of comparisons. This illustrates how the use of metric-trees can effectively reduce POMDP computational load. The ε-tree is particularly effective at reducing the number of comparisons in some domains (e.g. SACI, Tag). The much smaller effect shown in the other problems may be attributed to a poorly tuned ε (we used ε = 0.01 in all experiments). The question of how to set ε such that we most reduce computation, while maintaining good control performance, tends to be highly problem-dependent.

In keeping with other metric-tree applications, our results show that computational savings increase with the number of belief points. What is more surprising is to see the trees paying off with so few data points (most applications of KD-trees start seeing benefits with 1000+ data points). This may be partially attributed to the compactness of our convex test region (Fig. 4d), and to the fact that we do not search on split nodes (Fig. 3a); however, it is most likely due to the nature of our search problem: many α-vectors are accepted/rejected before visiting any leaf nodes, which is different from typical metric-tree applications.
We are particularly encouraged to see trees having a noticeable effect with very few data points because, in some domains, good control policies can also be extracted with few data points.

We notice that the effect of using trees is negligible in some larger problems (e.g. Tiger-grid), while still pronounced in others of equal or larger size (e.g. Coffee, Tag). This is likely due to the intrinsic dimensionality of each problem.³ Metric-trees often perform well on high-dimensional datasets with low intrinsic dimensionality; this also appears to be true of metric-trees applied to vector sorting. While this suggests that our current algorithm is not as effective in problems with high intrinsic dimensionality, a slightly different tree structure or search procedure may well help in those cases. Recent work has proposed new kinds of metric-trees that can better handle point-based searches in high dimensions [7], and some of this may be applicable to the POMDP α-vector sorting problem.

6 Conclusion

We have described a new type of metric-tree which can be used for sorting belief points and accelerating value updates in POMDPs. Early experiments indicate that the tree structure, by appropriately pruning unnecessary α-vectors over large regions of the belief, can accelerate planning for a range of problems. The promising performance of the approach on the Tag domain opens the door to larger experiments.

Acknowledgments

This research was supported by DARPA (MARS program) and NSF (ITR initiative).

References

[1] R. I. Brafman. A heuristic variable grid solution method for POMDPs. In Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI), pages 727–733, 1997.

[2] A. Cassandra. http://www.cs.brown.edu/research/ai/pomdp/examples/index.html.

[3] J. H. Friedman, J. L. Bentley, and R. A. Finkel. An algorithm for finding best matches in logarithmic expected time.
ACM Transactions on Mathematical Software, 3(3):209–226, 1977.

[4] M. Hauskrecht. Value-function approximations for partially observable Markov decision processes. Journal of Artificial Intelligence Research, 13:33–94, 2000.

[5] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101:99–134, 1998.

[6] A. W. Moore. Very fast EM-based mixture model clustering using multiresolution KD-trees. In Advances in Neural Information Processing Systems (NIPS), volume 11, 1999.

[7] A. W. Moore. The anchors hierarchy: Using the triangle inequality to survive high dimensional data. Technical Report CMU-RI-TR-00-05, Carnegie Mellon, 2000.

[8] J. Pineau, G. Gordon, and S. Thrun. Point-based value iteration: An anytime algorithm for POMDPs. In International Joint Conference on Artificial Intelligence (IJCAI), 2003.

[9] K.-M. Poon. A fast heuristic algorithm for decision-theoretic planning. Master's thesis, The Hong-Kong University of Science and Technology, 2001.

[10] P. Poupart and C. Boutilier. Value-directed compression of POMDPs. In Advances in Neural Information Processing Systems (NIPS), volume 15, 2003.

[11] N. Roy and G. Gordon. Exponential family PCA for belief compression in POMDPs. In Advances in Neural Information Processing Systems (NIPS), volume 15, 2003.

[12] J. K. Uhlmann. Satisfying general proximity/similarity queries with metric trees. Information Processing Letters, 40:175–179, 1991.

[13] R. Zhou and E. A. Hansen. An improved grid-based approximation algorithm for POMDPs. In Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI), 2001.

³The coffee domain is known to have an intrinsic dimensionality of 7 [10].
We do not know the intrinsic dimensionality of the Tag domain, but many robot applications produce belief points that exist in sub-dimensional manifolds [11].
", "award": [], "sourceid": 2437, "authors": [{"given_name": "Joelle", "family_name": "Pineau", "institution": null}, {"given_name": "Geoffrey", "family_name": "Gordon", "institution": null}, {"given_name": "Sebastian", "family_name": "Thrun", "institution": null}]}