{"title": "Coastal Navigation with Mobile Robots", "book": "Advances in Neural Information Processing Systems", "page_first": 1043, "page_last": 1049, "abstract": null, "full_text": "Coastal Navigation with Mobile Robots \n\nNicholas Roy and Sebastian Thrun \n\nSchool of Computer Science \nCarnegie Mellon University \n\nPittsburgh, PA 15213 \n\n{ nicholas. roy I sebastian. thrun } @cs.cmu.edu \n\nAbstract \n\nThe problem that we address in this paper is how a mobile robot can plan in order \nto arrive at its goal with minimum uncertainty. Traditional motion planning algo(cid:173)\nrithms often assume that a mobile robot can track its position reliably, however, in real \nworld situations, reliable localization may not always be feasible. Partially Observable \nMarkov Decision Processes (POMDPs) provide one way to maximize the certainty of \nreaching the goal state, but at the cost of computational intractability for large state \nspaces. \nThe method we propose explicitly models the uncertainty of the robot's position as \na state variable, and generates trajectories through the augmented pose-uncertainty \nspace. By minimizing the positional uncertainty at the goal, the robot reduces the \nlikelihood it becomes lost. We demonstrate experimentally that coastal navigation \nreduces the uncertainty at the goal, especially with degraded localization. \n\n1 Introduction \n\nFor an operational mobile robot, it is essential to prevent becoming lost. Early motion \nplanners assumed that a robot would never be lost - that a robot could always know its \nposition via dead reckoning without error [7]. This assumption proved to be untenable due \nto the small and inevitable inconsistencies in actual robot motion; robots that rely solely on \ndead reckoning for their position estimates lose their position quickly. Mobile robots now \nperform position tracking using a combination of sensor data and odometry [2, 10, 5]. 
\nHowever, the robot's ability to track its position can vary considerably with the robot's position in the environment. Some parts of the environment may lack good features for localization [11]. Other parts of the environment can have a large number of dynamic features (for example, people) that can mislead the localization system. Motion planners rarely, if ever, take the robot's position tracking ability into consideration. As the robot's localization suffers, the likelihood that the robot becomes lost increases, and as a consequence, the robot is less likely to complete the given trajectory. \nMost localization systems therefore compensate by adding environment-specific knowledge to the localization system, or by adding additional sensing capabilities to the robot, to guarantee that the robot can complete every possible path. In general, however, such alterations to the position tracking abilities of the robot have limitations, and an alternative scheme must be used to ensure that the robot can navigate with maximum reliability. Conventional planners represent one end of a spectrum of approaches (figure 1), in that a plan can be computed easily, but at the cost of not modelling localization performance. At the opposite end of the spectrum is the Partially Observable Markov Decision Process (POMDP). \n\n[Figure 1 diagram: Conventional Path Planner (tractable, not robust) --- POMDP (intractable, robust)] \n\nFigure 1: The continuum of possible approaches to motion planning, from the robust but intractable POMDP, to the potentially failure-prone but real-time conventional planners. Coastal navigation lies in the middle of this spectrum. \n\nPOMDPs are, in a sense, the brass ring of planning with uncertainty; a POMDP policy will make exactly the right kind of compromise between conventional optimality considerations and certainty of achieving the goal state. 
Many people have examined the use of POMDPs for mobile robot navigation [5, 6, 8]. However, computing a POMDP solution is computationally intractable (PSPACE-hard) for large state spaces - a mobile robot operating in the real world often has millions of possible states. As a result, many of the mobile robot POMDP solutions have made simplifying assumptions about the world in order to reduce the state space size. Many of these assumptions do not scale to larger environments or robots. In contrast, our hypothesis is that only a small number of the dimensions of the uncertainty matter, and that we can augment the state with these dimensions to approximate a solution to the POMDP. \nThe coastal navigation model developed in this paper represents a tradeoff between robust trajectories and computational tractability, and is inspired by traditional navigation of ships. Ships often use the coasts of continents for navigation in the absence of better tools such as GPS, since being close to the land allows sailors to determine with high accuracy where they are. The success of this method results from coastlines containing enough information in their structure for accurate localization. By navigating sufficiently close to areas of the map that have high information content, the likelihood of getting lost can be minimized. \n\n2 Modelling Uncertainty \n\nThe problem that we address in this paper is how a mobile robot can plan in order to arrive at its goal with minimum uncertainty. Throughout this discussion, we will assume a known map of the environment [9]. The position x of the robot is given as the location (x, y) and orientation θ, defined over a space X = (X, Y, Θ). Our localization method is a grid-based implementation of Markov localization [3, 5]. 
This method represents the robot's belief in its current position using a 3-dimensional grid over X = (X, Y, Θ), which allows for a discrete approximation of arbitrary probability distributions. The probability that the robot has a particular pose x is given by the probability p(x). \n\nState Augmentation We can extend the state of the robot from the 3-dimensional pose space to an augmented pose-uncertainty space. We can represent the uncertainty of the robot's positional distribution as the entropy, \n\nH(p_x) = - ∫_X p(x) log(p(x)) dx    (1) \n\nWe therefore represent the state space of the robot as the tuple \n\nS = (x, y, θ, H(x, y, θ)) = (x, H(x)) \n\nState Transitions In order to construct a plan between two points in the environment, we need to be able to represent the effect of the robot's sensing and moving actions. The implementation of Markov localization provides the following equations for tracking the robot's pose from x to x': \n\np(x'|u) = ∫_X p(x'|x, u) p(x) dx    (2) \np(x'|z) = α p(z|x') p(x')    (3) \n\nThese equations are taken from [3, 12], where equation (2) gives the prediction phase of localization (after motion u), and equation (3) gives the update phase of localization (after receiving observation z). α is a normalizing constant. We extend these equations to the fourth dimension as follows: \n\np(s|u) = (p(x|u), H(p(x|u)))    (4) \np(s|z) = (p(x|z), H(p(x|z)))    (5) \n\n3 Planning \n\nEquations (4) and (5) provide a mechanism for tracking the robot's state, and in fact contain redundant information, since the extra state variable H(x) is also contained in the probability distribution p(x). However, in order to make the planning problem tractable, we cannot in fact maintain the full probabilistic sensing model. To do so would put the planning problem firmly in the domain of POMDPs, with the associated computational intractability. 
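The grid-based tracking in equations (1)-(3) can be sketched concretely. The following toy example uses a hypothetical 1-D pose grid with made-up motion and sensor models (not the museum implementation): the prediction step convolves the belief with the motion model, the update step applies Bayes' rule, and the entropy statistic is the one used to augment the state.

```python
import numpy as np

def predict(belief, motion_kernel):
    # Equation (2): push the belief through the motion model p(x'|x, u).
    return motion_kernel @ belief

def update(belief, likelihood):
    # Equation (3): multiply by p(z|x) and renormalize (the constant alpha).
    posterior = likelihood * belief
    return posterior / posterior.sum()

def entropy(belief):
    # Equation (1), discretized: H = -sum_x p(x) log p(x).
    p = belief[belief > 0]
    return float(-(p * np.log(p)).sum())

# Toy 1-D pose grid with 10 cells, initially uniform (maximum entropy).
belief = np.full(10, 0.1)

# Hypothetical motion model: move right one cell with probability 0.8.
K = np.zeros((10, 10))
for i in range(10):
    K[(i + 1) % 10, i] = 0.8
    K[i, i] = 0.2
belief = predict(belief, K)

# Hypothetical sensor likelihood p(z|x), peaked at cell 3.
like = np.full(10, 0.05)
like[3] = 0.6
belief = update(belief, like)
print(entropy(belief))  # lower than the uniform entropy log(10)
```

The entropy after the informative update is what the augmented state (x, H(x)) records, compressing the whole grid into a single scalar.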
Instead, we make a simplifying assumption: that the positional probability distribution of the robot can be represented at all times by a Gaussian centered at the mean x. This allows us to approximate the positional distribution with a single statistic, the entropy. In POMDP terms, we use the assumption of Gaussian distributions to compress the belief space to a single dimension. We can now represent the positional probability distribution completely with the vector s, since the width of the Gaussian is represented by the entropy H(x). \nMore importantly, the simplifying assumption allows us to track the state of the robot deterministically. Although the state transitions are stochastic (as in equation (4)), the observations are not. At any point in time, the sensors identify the true state of the system, with some certainty given by H(p(x|z)). This allows us to compress the state transitions into a single rule: \n\np(s|u) = (p(x|u), H(p(x|u, z)))    (6) \n\nThe final position of the robot depends only on the motion command u and can be identified by sensing z. However, the uncertainty of the pose, H(p(x|u, z)), is a function not only of the motion command but also of the sensing. The simplifying assumption of Gaussian models is in general untenable for localization; however, we shall see that this assumption is sufficient for the purpose of motion planning. \nOne final modification must be made to the state transition rule. In a perfect world, it would be possible to predict exactly what observation would be made. However, it is exactly the stochastic and noisy nature of real sensors that generates planning difficulty, yet the update rule (6) assumes that it is possible to predict measurement z at pose x. 
\nDeterministic prediction is not possible; however, it is possible to compute probabilities for sensor measurements, and thus generate an expected value for the entropy based on the probability distribution of observations Z, which leads to the final state transition rule: \n\np(s|u) = (p(x|u), E_z[H(p(x|u, z))])    (7) \n\nwhere E_z[H(p(x|u, z))] represents the expected value of the entropy of the pose distribution over the space of possible sensor measurements. \nWith the transition rule in equation (7), we can now compute the transition probabilities for any particular state using a model of the robot's motion, a model of the robot's sensor and a map of the environment. The probability p(x|u) is given by a model of the robot's motion, and can be easily precomputed for each action u. The expectation term E_z[H] can also be precomputed for each possible state s. The precomputation of these transition probabilities is very time-intensive, because it requires simulating sensing at each state in the environment, and then computing the posterior distribution. However, as the precomputation is a one-time operation for the environment and robot, planning itself can be an online operation and is (in the limit) unaffected by the speed of computing the transition probabilities. \n\n3.1 Computing Trajectories \n\nWith the state update rule given in equation (7), we can now compute the optimal trajectory to a particular goal. We would in fact like to compute not just the optimal trajectory from the current robot position, but the optimal action from any position in the world. If the robot should deviate from the expected trajectory for any reason (such as error in the motion, or due to low-level control constraints), interests of efficiency suggest precomputing actions for continuing to the goal, rather than continually replanning as these contingencies arise. 
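The expectation in equation (7) can be illustrated with a small sketch. The discrete grid and the two sensor models below are invented for illustration (they are not the paper's laser model); the expected posterior entropy E_z[H(p(x|z))] is simply a probability-weighted average over the possible measurements z.

```python
import numpy as np

def expected_entropy(prior, sensor_model):
    # sensor_model[z, x] = p(z|x); one row per discrete measurement z.
    # Equation (7) style expectation: E_z[H] = sum_z p(z) * H(posterior given z).
    exp_h = 0.0
    for like in sensor_model:                 # iterate over measurements z
        p_z = float(like @ prior)             # p(z) = sum_x p(z|x) p(x)
        if p_z == 0.0:
            continue
        post = like * prior / p_z             # Bayes update for this z
        p = post[post > 0]
        exp_h += p_z * float(-(p * np.log(p)).sum())
    return exp_h

# Toy example: 4-cell pose grid, two possible measurements.
prior = np.full(4, 0.25)
informative = np.array([[0.9, 0.1, 0.1, 0.1],   # z=0 strongly indicates cell 0
                        [0.1, 0.9, 0.9, 0.9]])  # z=1 indicates "not cell 0"
uninformative = np.full((2, 4), 0.5)            # z carries no information about x
print(expected_entropy(prior, informative))     # below log(4)
print(expected_entropy(prior, uninformative))   # exactly log(4): no gain
```

Precomputing this quantity at every state is exactly the expensive, one-time simulation step described above; regions where the expected entropy stays at the prior's entropy are the "information-free" areas a coastal trajectory avoids.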
\nNote that the motion planning problem as we have now phrased it can be viewed as the problem of computing the optimal policy for a given problem. The Markovian, stochastic nature of the transitions, coupled with the need to compute the optimal policy for all states, suggests a value iteration approach. \nValue iteration attempts to find the policy that maximizes the long-term reward [1, 4]. The problem becomes one of finding the value function J(s), which assigns a value to each state. The optimal action at each state can then be easily computed by determining the expected value of each action at each state, from the neighboring values. We use a modified form of Bellman's equations to give the value J(s_i) of each state and the policy π(s_i) as \n\nJ(s_i) = max_u [ R(s_i) + C(s_i, u) + Σ_{j=1..N} p(s_j|s_i, u) · J(s_j) ]    (8) \nπ(s_i) = argmax_u [ R(s_i) + C(s_i, u) + Σ_{j=1..N} p(s_j|s_i, u) · J(s_j) ]    (9) \n\nBy iterating equation (8), the value function iteratively settles to a converged value over all states. Iteration stops when no state value changes by more than some threshold value. \nIn the above equations, R(s_i) is the immediate reward at state s_i, p(s_j|s_i, u) is the transition probability from state s_i to state s_j, and C(s_i, u) is the cost of taking action u at state s_i. Note that the form of the equations is undiscounted in the traditional sense; however, the additive cost term plays a similar role in that the system is penalized for policies that take longer trajectories. The cost in general is simply the distance of one step in the given direction u, although the cost of travel close to obstacles is higher, in order to create a safety margin around obstacles. The cost of an action that would cause a collision is infinite, preventing such actions from being used. \nThe immediate reward is localized only at the goal pose. However, the goal pose has a range of possible values for the uncertainty, creating a set of goal states, G. 
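A minimal sketch of the value iteration in equations (8) and (9) is shown below on a hypothetical 3-state corridor rather than the pose-uncertainty grid; the states, actions, rewards, and unit step costs are invented purely for illustration.

```python
import numpy as np

def value_iteration(R, C, P, eps=1e-6):
    # R[i]: immediate reward at state i; C[i, a]: cost of action a in state i;
    # P[a][i, j]: transition probability p(s_j | s_i, a).
    # Iterates equation (8) to convergence, then extracts the greedy
    # policy of equation (9) via the argmax over actions.
    n, n_actions = C.shape
    J = np.zeros(n)
    while True:
        Q = np.stack([R + C[:, a] + P[a] @ J for a in range(n_actions)], axis=1)
        J_new = Q.max(axis=1)
        if np.abs(J_new - J).max() < eps:
            return J_new, Q.argmax(axis=1)
        J = J_new

# Toy corridor: states 0, 1 and terminal goal state 2; actions 0=left, 1=right.
n = 3
P = [np.zeros((n, n)) for _ in range(2)]
P[0][0, 0] = 1.0; P[0][1, 0] = 1.0        # "left" (bumps the wall at state 0)
P[1][0, 1] = 1.0; P[1][1, 2] = 1.0        # "right" moves toward the goal
R = np.array([0.0, 0.0, 10.0])            # reward only at the goal state
C = np.full((n, 2), -1.0); C[2, :] = 0.0  # unit step cost penalizes long paths
J, policy = value_iteration(R, C, P)
print(J)       # [ 8.  9. 10.]
print(policy)  # [1 1 0] - move right toward the goal (goal's entry is arbitrary)
```

As in the paper's undiscounted formulation, the negative per-step cost is what makes longer trajectories less valuable; the goal state here is made terminal (all-zero transition rows) so the iteration converges.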
In order to reward policies that arrive at a goal state with a lower uncertainty, the reward is scaled linearly with goal state uncertainty: \n\nR(s_i) = -H(s_i) if s_i ∈ G, 0 otherwise    (10) \n\nBy implementing the value iteration given in equations (8) and (9) in a dynamic program, we can compute the value function in O(n·k_crit) time, where n is the number of states in the environment (number of positions × number of entropy levels) and k_crit is the number of iterations to convergence. With the value function computed, we can generate the optimal action for any state in O(a) time, where a is the number of actions out of each state. \n\n4 Experimental Results \n\nFigure 2 shows the mobile robot, Minerva, used for this research. Minerva is an RWI B-18, and senses using a 360° field-of-view laser range finder at 1° increments. \n\nFigure 2: Minerva, the B-18 mobile robot used for this research, and an example environment map, the Smithsonian National Museum of American History. The black areas are the walls and obstacles. Note the large sparse areas in the center of the environment. \n\nAlso shown in figure 2 is an example environment, the Smithsonian National Museum of American History. Minerva was used to generate this map, and operated as a tour-guide in the museum for two weeks in the summer of 1998. This museum has many of the features that make localization difficult - large open spaces, and many dynamic obstacles (people) that can mislead the sensors. \n\n[Figure 3 panels, each marked with a starting position: (a) Conventional, (b) Coastal, (c) Sensor Map] \n\nFigure 3: Two examples in the museum environment. The left trajectory is given by a conventional, shortest-path planner. The middle trajectory is given by the coastal navigation planner. 
The black areas correspond to obstacles, the dark grey areas correspond to regions where sensor information is available, and the light grey areas to regions where no sensor information is available. \n\nFigure 3 shows the effect of different planners in the sample environment. Panel (a) shows the trajectory of a conventional, shortest-distance planner. Note that the robot moves directly towards the goal. Panel (b) shows the trajectory given by the coastal planner. In both examples, the robot moves towards an obstacle, and relocalizes once it is in sensor range of the obstacle, before moving towards the goal. These periodic relocalizations are essential for the robot to arrive at the goal with minimum positional uncertainty, and maximum reliability. Panel (c) shows the sensor map of the environment. The black areas show obstacles and walls, and the light grey areas are where no information is available to the sensors, because all environmental features are outside the range of the sensors. The dark grey areas indicate areas where the information gain from the sensors is not zero; the darker grey the area, the better the information gain from the sensors. \n\n[Figure 4 plot: "Positional Uncertainty at Goal" - entropy plotted against laser sensor range, for Conventional Navigation vs. Coastal Navigation] \n\nFigure 4: The performance of the coastal navigation algorithm compared to the conventional motion planner. The graph depicts the entropy of the position probability distribution against the range of the laser sensor. Note that coastal navigation dramatically improves the certainty of the goal position with shorter range laser sensing. 
\n\nFigure 4 is a comparison of the average positional certainty (computed as entropy of the \npositional probability) of the robot at its goal position, compared to the range of the laser \nrange sensor. As the range of the laser range gets shorter, the robot can see fewer and \nfewer environmental features - this is essentially a way of reducing the ability of the robot \nto localize itself. The upper line is the performance of a conventional shortest-distance \npath planner, and the lower line is the coastal planner. The coastal planner has a lower \nuncertainty for all ranges of the laser sensor, and is substantially lower at shorter ranges, \nconfirming that the coastal navigation has the most effect when the localization is worst. \n\n5 Conclusion \n\nIn this paper, we have described a particular problem of motion planning - how to guarantee \nthat a mobile robot can reach its goal with maximum reliability. Conventional motion \nplanners do not typically plan according to the ability of the localization unit in different \nareas of the environment, and thus make no claims about the robustness of the generated \ntrajectory. In contrast, POMDPs provide the correct solution to the problem of robust \ntrajectories, however, computing the solution to a POMDP is intractable for the size of the \nstate space for typical mobile robot environments. \nWe propose a motion planner with an augmented state space that represents positional \nuncertainty explicitly as an extra dimension. The motion planner then plans through pose(cid:173)\nuncertainty space, to arrive at the goal pose with the lowest possible uncertainty. This can \nbe seen to be an approximation to a POMDP where the multi-dimensional belief space is \nrepresented as a subset of statistics, in this case the entropy of the belief space. \nWe have shown some experimental comparisons with a conventional motion planner. 
Not only did coastal navigation generate trajectories that provided substantial improvement in positional certainty at the goal compared to the conventional planner, but the improvement became more pronounced as the localization was degraded. \nThe model presented here, however, is not complete. The entire methodology hinges upon the assumption that the robot's probability distribution can be adequately represented by the entropy of the distribution. This assumption is valid if the distribution is restricted to a uni-modal Gaussian; however, most Markov localization methods that are based on this assumption fail, because multi-modal, non-Gaussian positional distributions are quite common for moving robots. Nonetheless, it may be that multiple uncertainty statistics along multiple dimensions (e.g., x and y) may do a better job of capturing the uncertainty sufficiently. It is an open question for future work how many statistics are needed to capture the uncertainty of a mobile robot, and under what environmental conditions. \n\nAcknowledgments \n\nThe authors gratefully acknowledge the advice and collaboration of Tom Mitchell throughout the development of this work. Wolfram Burgard and Dieter Fox played an instrumental role in the development of earlier versions of this work, and their involvement and discussion of this new model is much appreciated. This work was partially funded by the Fonds pour la Formation de Chercheurs et l'Aide à la Recherche (FCAR). \n\nReferences \n\n[1] R. Bellman. Dynamic Programming. Princeton University Press, NJ, 1957. \n[2] W. Burgard, D. Fox, D. Hennig, and T. Schmidt. Estimating the absolute position of a mobile robot using position probability grids. In AAAI, 1996. \n[3] D. Fox, W. Burgard, and S. Thrun. Active Markov localization for mobile robots. Robotics and Autonomous Systems, 25(3-4), 1998. \n[4] R. A. Howard. 
Dynamic Programming and Markov Processes. MIT Press, 1960. \n[5] L. Kaelbling, A. R. Cassandra, and J. A. Kurien. Acting under uncertainty: Discrete Bayesian models for mobile-robot navigation. In IROS, 1996. \n[6] S. Koenig and R. Simmons. The effect of representation and knowledge on goal-directed exploration with reinforcement learning algorithms. Machine Learning Journal, 22:227-250, 1996. \n[7] J.-C. Latombe. Robot Motion Planning. Kluwer Academic Publishers, 1991. \n[8] S. Mahadevan and N. Khaleeli. Robust mobile robot navigation using partially-observable semi-Markov decision processes. 1999. \n[9] H. P. Moravec and A. Elfes. High resolution maps from wide angle sonar. In ICRA, 1985. \n[10] R. Sim and G. Dudek. Mobile robot localization from learned landmarks. In IROS, 1998. \n[11] H. Takeda, C. Facchinetti, and J.-C. Latombe. Planning the motions of a mobile robot in a sensory uncertainty field. IEEE Trans. on Pattern Analysis and Machine Intelligence, 16(10), 1994. \n[12] S. Thrun, D. Fox, and W. Burgard. A probabilistic approach to concurrent mapping and localization for mobile robots. Machine Learning, 31, 1998. \n", "award": [], "sourceid": 1763, "authors": [{"given_name": "Nicholas", "family_name": "Roy", "institution": null}, {"given_name": "Sebastian", "family_name": "Thrun", "institution": null}]}