{"title": "Bayesian Map Learning in Dynamic Environments", "book": "Advances in Neural Information Processing Systems", "page_first": 1015, "page_last": 1021, "abstract": null, "full_text": "Bayesian Map Learning in Dynamic Environments\n\nKevin P. Murphy\nComputer Science Division\nUniversity of California\nBerkeley, CA 94720-1776\nmurphyk@cs.berkeley.edu\n\nAbstract\n\nWe consider the problem of learning a grid-based map using a robot with noisy sensors and actuators. We compare two approaches: online EM, where the map is treated as a fixed parameter, and Bayesian inference, where the map is a (matrix-valued) random variable. We show that even on a very simple example, online EM can get stuck in local minima, which causes the robot to get \"lost\" and the resulting map to be useless. By contrast, the Bayesian approach, by maintaining multiple hypotheses, is much more robust. We then introduce a method for approximating the Bayesian solution, called Rao-Blackwellised particle filtering. We show that this approximation, when coupled with an active learning strategy, is fast but accurate.\n\n1 Introduction\n\nThe problem of getting mobile robots to autonomously learn maps of their environment has been widely studied (see e.g. [9] for a collection of recent papers). The basic difficulty is that the robot must know exactly where it is (a problem called localization), so that it can update the right part of the map. However, to know where it is, the robot must already have a map: relying on dead-reckoning alone (i.e., integrating the motor commands) is unreliable because of noise in the actuators (slippage and drift).\n\nOne obvious solution is to use EM, where we alternate between estimating the location given the map (the E step), and estimating the map given the location (the M step). Indeed, this approach has been successfully used by several groups [8, 11, 12]. 
However, in all of these works, the trajectory of the robot was specified by hand, and the map was learned offline. For fully autonomous operation, and to cope with dynamic environments, the map must be learned online.\n\nWe consider two approaches to online learning: online EM, and Bayesian inference, where we treat the map as a random variable. In Section 3, we show that the Bayesian approach can lead to much better results than online EM; unfortunately, it is computationally intractable, so in Section 4, we discuss an approximation based on Rao-Blackwellised particle filtering.\n\nFigure 1: (a) The POMDP represented as a graphical model. Lt is the location, Mt(i) is the label of the i'th grid cell, At is the action, and Zt is the observation. Dotted circles denote variables that EM treats as parameters. (b) A one-dimensional grid with binary labels (white = 0, black = 1). (c) A two-dimensional grid, with four labels (closed doors, open doors, walls, and free space).\n\n2 The model\n\nWe now precisely define the model that we will use in this paper; it is similar to, but much simpler than, the occupancy grid model in [12]. The map is defined to be a grid, where each cell has a label which represents what the robot would see at that point. More formally, the map at time t is a vector of discrete random variables, Mt(i) ∈ {1, ..., No}, where 1 ≤ i ≤ NL. Of course, the map is not observed directly, and nor is the robot's location, Lt ∈ {1, ..., NL}. What is observed is Zt ∈ {1, ..., No}, the label of the cell at the robot's current location, and At ∈ {1, ..., NA}, the action chosen by the robot just before time t. The conditional independence assumptions we are making are illustrated in Figure 1(a). 
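To make the generative model concrete, it can be sketched as a short simulation. This is an illustrative sketch, not code from the paper: it assumes the one-dimensional binary grid used in the experiments below, and the names `step`, `pa`, `pc`, and `po` (the probabilities of a successful action, of a cell-label change, and of a correct observation) are ours.

```python
import random

def step(loc, label_map, action, pa=0.8, pc=0.0, po=0.9, NL=8):
    """One step of the generative model: move, perturb the map, observe."""
    # Location transition: the commanded move succeeds with probability pa;
    # otherwise the wheels slip and the robot stays put. The robot cannot
    # pass through the boundary cells.
    if action == +1 and loc < NL:      # move right
        loc = loc + 1 if random.random() < pa else loc
    elif action == -1 and loc > 1:    # move left
        loc = loc - 1 if random.random() < pa else loc
    # Map transition: each cell label flips independently with probability pc.
    label_map = [1 - m if random.random() < pc else m for m in label_map]
    # Observation: the label of the current cell, misread with probability 1 - po.
    z = label_map[loc - 1]
    if random.random() >= po:
        z = 1 - z
    return loc, label_map, z
```

With `pc = 0` the map is static, recovering the setting of Section 3; setting `pc > 0` gives the dynamic environment of Section 4.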
We start by considering the very simple one-dimensional grid shown in Figure 1(b), where there are just two actions, move right (→) and move left (←), and just two labels, off (0) and on (1). This is sufficiently small that we can perform exact Bayesian inference. Later, we will generalize to two dimensions.\n\nThe prior for the location is a delta function with all its mass on the first (left-most) cell, independent of A1. The transition model for the location is as follows:\n\nPr(Lt = j | Lt-1 = i, At = →) = pa if j = i + 1, i < NL; 1 - pa if j = i, i < NL; 1 if j = i = NL; 0 otherwise,\n\nwhere pa is the probability of a successful action, i.e., 1 - pa is the probability that the robot's wheels slip. There is an analogous equation for the case when At = ←. Note that it is not possible to pass through the \"rightmost\" cell; the robot can use this information to help localize itself.\n\nThe prior for the map is a product of the priors for each cell, which are uniform. (We could model correlation between neighboring cells using a Markov random field, although this is computationally expensive.) The transition model for the map is a product of the transition models for each cell, which are defined as follows: the probability that a 0 becomes a 1 or vice versa is pc (probability of change), and hence the probability that the cell label remains the same is 1 - pc.\n\nFinally, the observation model is\n\nPr(Zt = k | Mt = (m1, ..., mNL), Lt = i) = po if mi = k; 1 - po otherwise,\n\nwhere po is the probability of a successful observation, i.e., 1 - po is the probability of a classification error. Another way of writing this, that will be useful later, is to introduce the dummy deterministic variable Z*t, which has the following distribution: Pr(Z*t = k | Mt = (m1, ...
, mNL), Lt = i) = δ(k, mi), where δ(a, b) = 1 if a = b and 0 otherwise. Thus Z*t acts just like a multiplexer, selecting out a component of Mt as determined by the \"gate\" Lt. The output of the multiplexer is then passed through a noisy channel, which flips bits with probability 1 - po, to produce Zt.\n\n3 Bayesian learning compared to EM\n\nFor simplicity, we assume that the parameters po, pa and pc are all known. (In this section, we use po = 0.9, pa = 0.8 and pc = 0, so the world is somewhat \"slippery\", but static in appearance.) The state estimation problem is to compute the belief state Pr(Lt, Mt | y1:t), where yt = (Zt, At) is the evidence at time t; this is equivalent to performing online inference in the graphical model shown in Figure 1(a). Unfortunately, even though we have assumed that the components of Mt are a priori independent, they become correlated by virtue of sharing a common child, Zt. That is, since the true location of the robot is unknown, all of the cells are possible causes of the observation, and they \"compete\" to \"explain\" the data. Hence all of the hidden variables become coupled, and the belief state has size O(NL · 2^NL).\n\nIf the world is static (i.e., pc = 0), we can treat M as a fixed, but unknown, parameter; this can then be combined with the noisy sensor model to define an HMM with the following observation matrix:\n\nB(i, k) ≜ Pr(Zt = k | Lt = i; M) = Σj Pr(Zt = k | Z*t = j) δ(M(i), j)\n\nWe can then learn B using EM, as in [8, 11, 12]. (We assume for now that the HMM transition matrix is independent of the map, and encodes the known topology of the grid, i.e., the robot can move to any neighboring cell, no matter what its label is. We will lift this restriction in the 2D example.)\n\nWe can formulate an online version of EM as follows. 
We use fixed-lag smoothing with a sliding window of length W, and compute the expected sufficient statistics (ESS) for the observation matrix within this window as follows: Ot(i, k) = Σ_{τ = t-W : Zτ = k}^{t} L_{τ|t}(i), where L_{τ|t}(i) = Pr(Lτ = i | y1:t). We can compute L using the forwards-backwards algorithm, using L_{t-W-1|t-1} as the prior. (The initial condition is L = π, where π is the (known) prior for L0.) Thus the cost per time step is O(2W NL^2). In the M step, we normalize each row of Ot + d × O_{t-1}, where 0 < d < 1 is a decay constant, to get the new estimate of B. We need to downweight the previous ESS since they were computed using out-of-date parameters; in addition, exponential forgetting allows us to handle dynamic environments. [1] discuss some variations on this algorithm.\n\nFigure 2: (a) The full joint posterior P(Mt | y1:t). 0 and 255, on the axis into the page, represent the maps where every cell is off and every cell is on, respectively; the mode at t = 16 is for map 171, which corresponds to the correct pattern 01010101. (b-d) Estimated map. Light cells are more likely to contain 0s, so the correct pattern should have light bars in the odd rows. (b) The marginals of the exact joint. (c) Online EM. (d) Offline EM.\n\nAs the window length increases, past locations are allowed to look at more and more future data, and hence their estimates become more accurate; however, the space and time requirements increase. Nevertheless, there are occasions when even the maximum window size (i.e., looking all the way back to τ = 0) will perform poorly, because of the greedy hill-climbing nature of EM. For a simple example of this, consider the environment shown in Figure 1(b). 
Suppose the robot starts in cell 1, keeps going right until it comes to the end of the \"corridor\", and then heads back \"home\". Suppose further that there is a single slippage error at t = 4, so the actual path and observation sequence of the robot is as follows:\n\nt:  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16\nLt: 1  2  3  4  4  5  6  7  8  7  6  5  4  3  2  1\nZt: 0  1  0  1  1  0  1  0  1  0  1  0  1  0  1  0\nAt: -  →  →  →  →  →  →  →  →  ←  ←  ←  ←  ←  ←  ←\n\nTo study the effect of this sequence, we computed Pr(Mt, Lt | y1:t) by applying the junction tree algorithm to the graphical model in Figure 1(a). We then marginalized out Lt to compute the posterior P(Mt): see Figure 2(a). At t = 1, there are 2^7 modes, corresponding to all possible bit patterns on the unobserved cells. At each time step, the robot thinks it is moving one step to the right. Hence at t = 8, the robot thinks it is in cell 8, and observes 0. When it tries to move right, it knows it will remain in cell 8 (since the robot knows where the boundaries are). Hence at t = 9, it is almost 70% confident that it is in cell 8. At t = 9, it observes a 1, which contradicts its previous observation of 0. There are two possible explanations: this is a sensor error, or there was a motor error. Which of these is more likely depends on the relative values of the sensor noise, po, and the system noise, pa. In our experiments, we found that the motor error hypothesis is much more likely; hence the mode of the posterior jumps from the wrong map (in which M(5) = 1) to the right map (in which M(5) = 0). 
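The exact Bayesian filter described above can be sketched by brute-force enumeration of the joint (location, map) state, which makes the O(NL · 2^NL) cost explicit. This is an illustrative sketch under our own naming (static map, binary labels, action +1 = right, -1 = left), not the junction tree implementation used in the experiments.

```python
from itertools import product

def exact_filter_update(belief, action, z, pa=0.8, po=0.9, NL=8):
    """One exact Bayes-filter step on the joint (location, map) belief.

    `belief` maps (loc, map_tuple) -> probability; the map is static (pc = 0).
    Enumerating all NL * 2**NL joint states is what makes exact inference
    intractable for larger grids.
    """
    new_belief = {}
    for (i, m), p in belief.items():
        if p == 0.0:
            continue
        # Location transition: succeed with prob. pa, slip with prob. 1 - pa;
        # the boundary cells absorb the move.
        if action == +1:
            moves = {NL: 1.0} if i == NL else {i + 1: pa, i: 1 - pa}
        else:
            moves = {1: 1.0} if i == 1 else {i - 1: pa, i: 1 - pa}
        for j, pt in moves.items():
            # Observation likelihood: correct label with probability po.
            like = po if m[j - 1] == z else 1 - po
            new_belief[(j, m)] = new_belief.get((j, m), 0.0) + p * pt * like
    total = sum(new_belief.values())
    return {s: p / total for s, p in new_belief.items()}

# Prior: location known to be cell 1, uniform over all 2^NL binary maps.
NL = 8
prior = {(1, m): 1.0 / 2**NL for m in product((0, 1), repeat=NL)}
```

Marginalizing the returned belief over locations gives the map posterior P(Mt | y1:t) plotted in Figure 2(a).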
Furthermore, as the robot returns to \"familiar territory\", it is able to better localize itself (see Figure 3(a)), and continues to learn the map even for far-away cells, because they are all correlated (in Figure 2(b), the entry for cell 8 becomes sharper even as the robot returns to cell 1).\n\nWe now compare the Bayesian solution with EM. Online EM with no smoothing was not able to learn the correct map. Adding smoothing with the maximum window size of Wt = t did not improve matters: it is still unable to escape the local minimum in which M(5) = 1, as shown in Figure 2(c). (We tried various values of the decay rate d, from 0.1 to 0.9, and found that it made little difference.) With the wrong map, the robot \"gets lost\" on the return journey: see Figure 3(c). Offline EM, on the other hand, does very well, as shown in Figure 2(d); although the initial estimate of location (see Figure 3(b)) is rather diffuse, as it updates the map it can use the benefit of hindsight to figure out where it must have been.\n\nFigure 3: Estimated location. Light cells are more likely to contain the robot. (a) Optimal Bayes solution, which marginalizes out the map. (b) Dead-reckoning solution, which ignores the map. Notice how \"blurry\" it is. (c) Online EM solution using fixed-lag smoothing with a maximal window length.\n\n4 Rao-Blackwellised particle filtering\n\nAlthough the Bayesian solution exhibits some desirable properties, its running time is exponential in the size of the environment. In this section, we discuss a sequential Monte Carlo algorithm called particle filtering (also known as sequential importance sampling, the bootstrap filter, the condensation algorithm, survival of the fittest, etc.; see [10, 4] for recent reviews). Particle filtering (PF) has already been successfully applied to the problem of (global) robot localization [5]. 
However, in that case, the state space was only of dimension 3: the unknowns were the position of the robot, (x, y) ∈ R^2, and its orientation, θ ∈ [0, 2π]. In our case, the state space is discrete and of dimension O(1 + NL), since we need to keep track of the map as well as the robot's location (we ignore orientation in this paper).\n\nParticle filtering can be very inefficient in high-dimensional spaces. The key observation which makes it tractable in this context is that, if L1:t were known, then the posterior on Mt would be factored; hence Mt can be marginalized out analytically, and we only need to sample Lt. This idea is known in the statistics literature as Rao-Blackwellisation [10, 4]. In more detail, we will approximate the posterior at time t using a set of weighted particles, where each particle specifies a trajectory L1:t and the corresponding conditionally factored representation of P(Mt) = Πi P(Mt(i)); we will denote the j'th particle at time t as b^(j)_t. Note that we do not need to actually store the complete trajectories L1:t: we only need the most recent value of L. The approach we take is essentially the same as the one used in the conditional linear Gaussian models of [4, 3], except we replace the Kalman filter update with one which exploits the conditionally factored representation of P(Mt). In particular, the algorithm is as follows. For each particle j = 1, ..., Ns, we do the following:\n\n1. Sample L^(j)_{t+1} from a proposal distribution, which we discuss below.\n\n2. Update each component of the map separately using L^(j)_{t+1} and z_{t+1}:\n\nPr(M_{t+1} | L^(j)_{t+1} = i, b^(j)_t, z_{t+1}) ∝ Pr(z_{t+1} | M_{t+1}(i)) Πk Pr(M_{t+1}(k) | M^(j)_t(k))\n\n3. Update the weights: w^(j)_{t+1} = u^(j)_{t+1} w^(j)_t, where u^(j)_{t+1} is defined below.\n\nFigure 4: (a-b) Results using 50 particles. (c-d) Results using BK.
\n\n(j) \n\n. \n\nWe then res ample Ns particles from the normalised weights, using Liu's residual \nresampling algorithm [10], and set WWl = 1/ Ns for all j. We consider two proposal \ndistributions. The first is a simple one which just uses the transition model to \npredict the new location: Pr(Lt+1lb~j), at+1) . In this case, the incremental weight \nis U~~l <X P(zt+1IL~~l,b~j)). The optimal proposal distribution (the one which \nminimizes the variance of the importance weights) takes the most recent evidence \ninto account, and can be shown to have the form Pr(Lt+1lb~j), at+l, Zt+l) with \nincremental weight Ut+1 <X P(Zt+1lb~j)) . Computing this requires marginalizing out \nMt+l and Lt+l' which can be done in O(NL) time (details omitted). \n\nIn Figure 4, we show the results of applying the above algorithm to the same problem \nas in Section 3; it can be seen that it approximates the exact solution- very closely, \nusing only 50 particles. The results shown are for a particular random number seed; \nother seeds produce qualitatively very similar results, indicating that 50 particles \nis in fact sufficient in this case. Obviously, as we increase the number of particles, \nthe error and variance decrease, but the running time increases (linearly). \n\nThe question of how many particles to use is a difficult one: it depends both on \nthe noise parameters and the structure of the environment (if every cell has a \nunique label, localization is easy). Since we are sampling trajectories, the number \nof hypotheses, and hence the number of particles needed, grows exponentially with \ntime. In the above example, the robot was able to localize itself quite accurately \nIn \nwhen it reached the end of the corridor, where most hypotheses \"died off\". \ngeneral, the number of particles will depend on the length of the longest cycle in \nthe environment, so we will need to use active learning to ensure tractability. 
\n\nIn the dynamic two-dimensional grid world of Figure 1(c), we chose actions so as to maximize expected discounted reward (using policy iteration), where the reward for visiting cell i is\n\nR(i) = H(Lt)(1 - H(Mt(i))) + (1 - H(Lt)) H(Mt(i)),\n\nwhere H(·) is the normalized entropy. Hence, if the robot is \"lost\", so H(Lt) ≈ 1, the robot will try to visit a cell which it is certain about (see [6] for a better approach); otherwise, it will try to explore uncertain cells. After learning the map, the robot spends its time visiting each of the doors, to keep its knowledge of their state (open or closed) up-to-date.\n\nWe now briefly consider some alternative approximate inference algorithms. Examining the graphical structure of our model (see Figure 1(a)), we see that it is identical to a Factorial HMM [7] (ignoring the inputs). Unfortunately, we cannot use their variational approximation, because they assume a conditional Gaussian observation model, whereas ours is almost deterministic. Another popular approximate inference algorithm for dynamic Bayes nets (DBNs) is the \"BK algorithm\" [2, 1]. This entails projecting the joint posterior at time t onto a product-of-marginals representation\n\nP(Lt, Mt(1), ..., Mt(NL) | y1:t) = P(Lt | y1:t) Πi P(Mt(i) | y1:t)\n\nand using this as a factored prior for Bayesian updating at time t + 1. Given a factored prior, we can compute a factored posterior in O(NL) time by conditioning on each L_{t+1}, and then averaging. We found that the BK method does very poorly on this problem (see Figure 4), because it ignores correlation between the cells. Of course, it is possible to use pairwise or higher order marginals for tightly coupled sets of variables. Unfortunately, the running time is exponential in the size of the largest marginal, and in our case, all the Mt(i) variables are coupled. 
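An entropy-based exploration reward of the kind described above can be sketched as follows. This is an illustrative reconstruction of the behavior described (well-known cells are rewarded when the robot is lost, uncertain cells when it is well localised), under assumed binary cell labels; the function names and the exact combination of terms are ours, not necessarily the paper's.

```python
import math

def normalized_entropy(p):
    """Normalised entropy of a Bernoulli(p) cell label, in [0, 1]."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def reward(i, loc_dist, map_probs):
    """Reward for visiting cell i: prefer cells the robot is certain about
    when it is lost, and uncertain cells when it is well localised."""
    # Normalised entropy of the location distribution (log base = #cells).
    n = len(loc_dist)
    h_loc = -sum(p * math.log(p, n) for p in loc_dist if p > 0)
    # Normalised entropy of the map belief for cell i.
    h_map = normalized_entropy(map_probs[i])
    # Lost (h_loc near 1): reward certainty; localised (h_loc near 0):
    # reward uncertainty, i.e. exploration.
    return h_loc * (1 - h_map) + (1 - h_loc) * h_map
```

Feeding such a reward to policy iteration yields the active learning behavior described: relocalise first, then explore, then revisit the doors whose state can change.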
\n\nAcknowledgments\n\nI would like to thank Nando de Freitas for helping me get particle filtering to work, Sebastian Thrun for an interesting discussion at the conference, and Stuart Russell for encouraging me to compare to EM. This work was supported by grant number ONR N00014-97-1-0941.\n\nReferences\n\n[1] X. Boyen and D. Koller. Approximate learning of dynamic models. In NIPS, 1998.\n\n[2] X. Boyen and D. Koller. Tractable inference for complex stochastic processes. In UAI, 1998.\n\n[3] R. Chen and S. Liu. Mixture Kalman filters. Submitted, 1999.\n\n[4] A. Doucet, S. Godsill, and C. Andrieu. On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 1999.\n\n[5] D. Fox, W. Burgard, F. Dellaert, and S. Thrun. Monte Carlo localization: Efficient position estimation for mobile robots. In AAAI, 1999.\n\n[6] D. Fox, W. Burgard, and S. Thrun. Active Markov localization for mobile robots. Robotics and Autonomous Systems, 1998.\n\n[7] Z. Ghahramani and M. Jordan. Factorial Hidden Markov Models. Machine Learning, 29:245-273, 1997.\n\n[8] S. Koenig and R. Simmons. Unsupervised learning of probabilistic models for robot navigation. In ICRA, 1996.\n\n[9] D. Kortenkamp, R. Bonasso, and R. Murphy, editors. Artificial Intelligence and Mobile Robots: case studies of successful robot systems. MIT Press, 1998.\n\n[10] J. Liu and R. Chen. Sequential Monte Carlo methods for dynamic systems. JASA, 93:1032-1044, 1998.\n\n[11] H. Shatkay and L. P. Kaelbling. Learning topological maps with weak local odometric information. In IJCAI, 1997.\n\n[12] S. Thrun, W. Burgard, and D. Fox. A probabilistic approach to concurrent mapping and localization for mobile robots. Machine Learning, 31:29-53, 1998.\n", "award": [], "sourceid": 1716, "authors": [{"given_name": "Kevin", "family_name": "Murphy", "institution": null}]}