{"title": "Robot Docking Using Mixtures of Gaussians", "book": "Advances in Neural Information Processing Systems", "page_first": 945, "page_last": 951, "abstract": null, "full_text": "Robot Docking using Mixtures of Gaussians \n\nMatthew Williamson* \n\nRoderick Murray-Smith t \n\nVolker Hansent \n\nAbstract \n\nThis paper applies the Mixture of Gaussians probabilistic model, com(cid:173)\nbined with Expectation Maximization optimization to the task of sum(cid:173)\nmarizing three dimensional range data for a mobile robot. This provides \na flexible way of dealing with uncertainties in sensor information, and al(cid:173)\nlows the introduction of prior knowledge into low-level perception mod(cid:173)\nules. Problems with the basic approach were solved in several ways: the \nmixture of Gaussians was reparameterized to reflect the types of objects \nexpected in the scene, and priors on model parameters were included \nin the optimization process. Both approaches force the optimization to \nfind 'interesting' objects, given the sensor and object characteristics. A \nhigher level classifier was used to interpret the results provided by the \nmodel, and to reject spurious solutions. \n\n1 Introduction \n\nThis paper concerns an application of the Mixture of Gaussians (MoG) probabilistic model \n(Titterington et aI., 1985) for a robot docking application. We use the Expectation(cid:173)\nMaximization (EM) approach (Dempster et aI., 1977) to fit Gaussian sub-models to a sparse \n3d representation of the robot's environment, finding walls, boxes, etc .. 
We have modified the MoG formulation in three ways to incorporate prior knowledge about the task and the sensor characteristics: the parameters of the Gaussians are recast to constrain how they fit the data, priors on these parameters are calculated and incorporated into the EM algorithm, and a higher-level processing stage is included which interprets the fit of the Gaussians on the data, detects misclassifications, and provides prior information to guide the model-fitting. \n\nThe robot is equipped with a LIDAR 3d laser range-finder (PIAP, 1995) which it uses to identify possible docking objects. The range-finder calculates the time of flight for a light pulse reflected off objects in the scene. The particular LIDAR used is not very powerful, making objects with poor reflectance (e.g., dark, shiny, or surfaces not perpendicular to the laser beam) invisible. The scan pattern is also very sparse, especially in the vertical direction, as shown in the scan of a wall in Figure 1. However, if an object is detected, the range returned is accurate (±1-2 cm). When the range data is plotted in Cartesian space it forms a number of sparse clusters, leading naturally to the use of MoG clustering algorithms to make sense of the scene. While the Gaussian assumption is not an ideal model of the data, the generality of MoG, and its ease of implementation and analysis, motivated its use over a more specialized approach. The sparse nature of the data inspired the modifications to the MoG formulation described in this paper. \n\n*Corresponding author: MIT AI Lab, Cambridge, MA, USA. matt@ai.mit.edu \n†Dept. of Mathematical Modelling, Technical University of Denmark. rod@imm.dtu.dk \n‡DaimlerChrysler, Alt-Moabit 96a, Berlin, Germany. hansen@dbag.bln.daimlerbenz.com 
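As a concrete illustration of the clustering machinery discussed above, a minimal EM fit of a full-covariance Mixture of Gaussians to 3-d points can be sketched as follows. This is a generic sketch in plain NumPy, not the paper's implementation; the function name, iteration count, and initialization constants are our own, though the choice of initializing means at random data points follows the paper.

```python
import numpy as np

def em_mog(X, M, iters=50, seed=0):
    """Fit an M-component Gaussian mixture to 3-d points X (N x 3) with EM."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # Initialize means at random data points (as in the paper) and give the
    # components a large covariance so they can pick up distant points.
    mus = X[rng.choice(N, M, replace=False)].copy()
    Sigmas = np.array([np.eye(D) * 4.0 for _ in range(M)])
    pis = np.full(M, 1.0 / M)
    for _ in range(iters):
        # E-step: responsibilities h[i, n] of component i for point n.
        H = np.empty((M, N))
        for i in range(M):
            d = X - mus[i]
            inv = np.linalg.inv(Sigmas[i])
            norm = np.sqrt(((2 * np.pi) ** D) * np.linalg.det(Sigmas[i]))
            H[i] = pis[i] * np.exp(-0.5 * np.einsum('nj,jk,nk->n', d, inv, d)) / norm
        H /= H.sum(axis=0, keepdims=True)
        # M-step: responsibility-weighted re-estimates of the parameters.
        Nk = H.sum(axis=1)
        pis = Nk / N
        for i in range(M):
            mus[i] = H[i] @ X / Nk[i]
            d = X - mus[i]
            Sigmas[i] = (H[i][:, None] * d).T @ d / Nk[i] + 1e-6 * np.eye(D)
    return mus, Sigmas, pis
```

With well-separated clusters this recovers one mean per cluster; the paper's contribution, described next, is to replace the free covariance matrices with a constrained, plane-like parameterization.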
\n\nModel-based object recognition from dense range images has been widely reported (see (Arman and Aggarwal, 1993) for a review), but is not relevant in this case given the sparseness of the data. Denser range images could be collected by combining multiple scans, but the poor visibility of the sensor hampers the application of these techniques. The advantage of the MoG technique is that the segmentation is \"soft\", and perception proceeds iteratively during learning. This is especially useful for mobile robots, where evidence accumulates over time and the allocation of attention is time- and state-dependent. The EM algorithm is useful since it is guaranteed to converge to a local maximum. \n\nThe following sections of the paper describe the re-parameterization of the Gaussians to model plane-like clusters, the formulation of the priors, and the higher-level processing which interprets the clustered data in order to both move the robot and provide prior information to the model-fitting algorithm. \n\nFigure 1: Plot showing data from a LIDAR scan of a wall, plotted in Cartesian space. The robot is located at the origin, with the y axis pointing forward, x to the right, and z up. The sparse scan pattern is visible, as well as the visibility constraint: the wall extends beyond where the scan ends, but is invisible to the LIDAR due to the orientation of the wall. \n\n2 Mixture of Gaussians model \n\nThe range-finder returns a set of data, each of which is a position in Cartesian space x_i = (x_i, y_i, z_i). The complete set of data D = {x_1 ... 
x_N} is modeled as being generated by a mixture density \n\nP(x_n) = ∑_{i=1}^{M} P(x_n | i, μ_i, Σ_i, π_i) P(i), \n\nwhere we use a Gaussian as the sub-model, with mean μ_i, covariance Σ_i and weight π_i, which makes the probability of a particular data point: \n\nP(x_n | μ, Σ, π) = ∑_{i=1}^{M} π_i / ((2π)^{3/2} |Σ_i|^{1/2}) exp( -(1/2)(x_n - μ_i)^T Σ_i^{-1} (x_n - μ_i) ). \n\nGiven a set of data D, the most likely set of parameters is found using the EM algorithm. This algorithm has a number of advantages, such as guaranteed convergence to a local maximum of the likelihood, and efficient computational performance. \n\nIn 3D Cartesian space, the Gaussian sub-models form ellipsoids, where the size and orientation are determined by the covariance matrix Σ_i. In the general case, the EM algorithm can be used to learn all the parameters of Σ_i. The sparseness of the LIDAR data makes this parameterization inappropriate, as various odd collections of points could be clustered together. By changing the parameterization of Σ_i to better model plane-like structures, the system can be improved. The reparameterization is most readily expressed in terms of the eigenvalues Λ_i and eigenvectors V_i of the covariance matrix, Σ_i = V_i Λ_i V_i^{-1}. \n\nThe covariance matrix of a normal approximation to a plane-like vertical structure will have a large eigenvalue in the z direction, and in the x-y plane one large and one small eigenvalue. Since Σ_i is symmetric, the eigenvectors are orthogonal, V_i^{-1} = V_i^T, and Σ_i can be written: \n\nΣ_i = V_i Λ_i V_i^T, with V_i = [cos θ_i, -sin θ_i, 0; sin θ_i, cos θ_i, 0; 0, 0, 1] and Λ_i = diag(a_i, γ a_i, b_i), \n\nwhere θ_i is the angle of orientation of the ith sub-model in the x-y plane, a_i scales the cluster in the x and y directions, and b_i scales in the z direction. The constant γ controls the aspect ratio of the ellipsoid in the x-y plane. 
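To make the reparameterization concrete, the covariance implied by (θ_i, a_i, b_i) can be assembled as below. This is our sketch, not code from the paper; in particular, the assignment of the small eigenvalue γ·a to the second in-plane axis is an assumption consistent with the surrounding text, since the paper does not print the matrix explicitly.

```python
import numpy as np

def covariance_from_params(theta, a, b, gamma=0.01):
    """Build the plane-like covariance Sigma = V diag(a, gamma*a, b) V^T."""
    # V rotates the x-y plane by theta; the z axis is left untouched.
    V = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])
    # One long in-plane axis (a), one thin axis across the plane (gamma * a),
    # and an independent vertical scale (b).
    Lam = np.diag([a, gamma * a, b])
    return V @ Lam @ V.T
```

With γ = 0.01 (the value found by experimentation in the paper), the resulting ellipsoid is long along one in-plane direction, thin across it, and scaled by b in z, matching the plane-like clusters the sensor produces.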
The optimal values of these parameters (a_i, b_i, θ_i) are found using EM, first calculating the probability h_in that data point x_n is modeled by Gaussian i, for every data point x_n and every Gaussian i: \n\nh_in = π_i |Σ_i|^{-1/2} exp( -(1/2)(x_n - μ_i)^T Σ_i^{-1} (x_n - μ_i) ) / ∑_{j=1}^{M} π_j |Σ_j|^{-1/2} exp( -(1/2)(x_n - μ_j)^T Σ_j^{-1} (x_n - μ_j) ). \n\nThis \"responsibility\" is then used as a weighting for the updates to the other parameters: \n\nμ_i = ∑_n h_in x_n / ∑_n h_in, \n\nθ_i = (1/2) tan^{-1}( 2 ∑_n h_in (x_n1 - μ_i1)(x_n2 - μ_i2) / ∑_n h_in [(x_n1 - μ_i1)² - (x_n2 - μ_i2)²] ), \n\na_i = ∑_n h_in ζ_in / (2 ∑_n h_in), with ζ_in = [ (γ - 1)((x_n1 - μ_i1) sin θ_i + (x_n2 - μ_i2) cos θ_i)² + (x_n1 - μ_i1)² + (x_n2 - μ_i2)² ] / γ, \n\nb_i = ∑_n h_in (x_n3 - μ_i3)² / ∑_n h_in, \n\nwhere x_n1 is the first element of x_n etc., and ζ_in corresponds to the projection of the data into the plane of the cluster. It is important to update the means μ_i first, and use the new values to update the other parameters.² Figure 2 shows a typical model response on real LIDAR data. \n\n2.1 Practicalities of application, and results \n\nStarting values for the model parameters are important, as EM is only guaranteed to find a local optimum. The Gaussian mixture components are initialized with a large covariance, allowing them to pick up data and move to the correct positions. We found that initializing the means μ_i to random data points, rather than randomly in the input space, tended to work better, especially given the sensor characteristics: if the LIDAR returned a range measurement, it was likely to be part of an interesting object. \n\n¹ By experimentation, a value of γ of 0.01 was found to be reasonable for this application. \n² Intuition for the θ_i update can be obtained by considering that (x_n1 - μ_i1) is the x component of the distance between x_n and μ_i, which is |x_n - μ_i| cos θ, and similarly (x_n2 - μ_i2) is |x_n - μ_i| sin θ, so tan 2θ = sin 2θ / cos 2θ = 2 sin θ cos θ / (cos² θ - sin² θ) = 2(x_n1 - μ_i1)(x_n2 - μ_i2) / ((x_n1 - μ_i1)² - (x_n2 - μ_i2)²). \n\nFigure 2: Example of clustering of the 3d data points. The left hand graph shows the view from above (the x-y plane), and the right graph shows the view from the side (the y-z plane), with the robot positioned at the origin. The scene shows a box at an oblique angle, with a wall behind. The extent of the plane-like Gaussian sub-models is illustrated using the ellipses, which are drawn at a probability of 0.5. \n\nDespite the accuracy of measurement, there are still outlying data points, and it is impossible to fully segment the data into separate objects. One simple solution we found was to define a \"junk\" Gaussian. This is a sub-model placed in the center of the data, with a large covariance Σ. This Gaussian then becomes responsible for the outliers in the data (i.e., sparsely distributed data over the whole scene, none of which are associated with a specific object), allowing the object-modeling Gaussians to work undistracted. \n\nThe use of EM with the a, b, θ parameterization found and represented plane-like data clusters better than models where all the elements of the covariance matrix were free to adapt. It also tended to converge faster, probably due to the reduced number of parameters in the covariance matrix (3 as opposed to 6). Although the algorithm is constrained to find planes, the parameterization was flexible enough to model other objects such as thin vertical lines (say from a table leg). The only problem with the algorithm was that it occasionally found poor local minimum solutions, such as illustrated in Figure 3. This is a common problem with least-squares-based clustering methods (Duda and Hart, 1973). 
\n\nFigure 3: Two examples of 'undesirable' local minimum solutions found by EM. Both graphs show the top view of a scene of a box in front of a wall. The algorithm has incorrectly clustered the box with the left hand side of the wall. \n\n3 Incorporating prior information \n\nAs well as reformulating the Gaussian models to suit our application, we also incorporated prior knowledge on the parameters of the sub-models. Sensor characteristics are often well-defined, and it makes sense to use these as early as possible in perception, rather than dealing with their side-effects at higher levels of reasoning. Here, e.g., the visibility constraint, by which only planes which are almost perpendicular to the lidar rays are visible, could be included by writing P(x_n) = ∑_{i=1}^{M} P(x_n | i, β_i) P(i) P(visible | β_i); the updates could be recalculated, and the feature immediately brought into the modeling process. In addition, prior knowledge about the locations and sizes of objects, maybe from other sensors, can be used to influence the modeling procedure. This allows the sensor to make better use of the sparse data. 
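A sketch of how such a visibility factor could enter the E-step: below, the term P(visible | β_i) is collapsed to a per-component number p_visible[i] that rescales each component's weight before the responsibilities are normalized. The function and its arguments are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def responsibilities_with_visibility(X, mus, Sigmas, pis, p_visible):
    """E-step responsibilities with a per-component visibility prior.

    p_visible[i] stands in for P(visible | beta_i): it multiplies component
    i's weight, so sub-models the sensor could not see get no responsibility.
    """
    M, N = len(pis), X.shape[0]
    H = np.empty((M, N))
    for i in range(M):
        d = X - mus[i]
        inv = np.linalg.inv(Sigmas[i])
        norm = (2 * np.pi) ** 1.5 * np.sqrt(np.linalg.det(Sigmas[i]))
        dens = np.exp(-0.5 * np.einsum('nj,jk,nk->n', d, inv, d)) / norm
        H[i] = pis[i] * p_visible[i] * dens
    return H / H.sum(axis=0, keepdims=True)
```

Setting p_visible[i] = 0 removes component i from the competition entirely; intermediate values simply bias the soft assignment toward components the sensor is likely to have seen.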
\nFor a model with parameters β and data D, Bayes' rule gives: \n\nP(β | D) = (P(β) / P(D)) ∏_n P(x_n | β). \n\nNormally the logarithm of this is taken, to give the log-likelihood, which in the case of mixtures of Gaussians is \n\nL(D | β) = log p({μ_i, π_i, a_i, b_i, θ_i}) - log p(D) + ∑_n log ∑_i p(x_n | i, μ_i, π_i, a_i, b_i, θ_i). \n\nTo include the parameter priors in the EM algorithm, distributions for the different parameters are chosen, then the log-likelihood is differentiated as usual to find the updates to the parameters (McMichael, 1995). The calculations are simplified if the priors on all the parameters are assumed to be independent, p({μ_i, π_i, a_i, b_i, θ_i}) = ∏_i p(μ_i) p(π_i) p(a_i) p(b_i) p(θ_i). \n\nThe exact form of the prior distributions varies for different parameters, both to capture different behavior and for ease of implementation. For the element means (μ_i), a flat distribution over the data is used, specifying that the means should be among the data points. For the element weights, a multinomial Dirichlet prior can be used, p(π | α) = (Γ(M(α+1)) / Γ(α+1)^M) ∏_{i=1}^{M} π_i^α. When the hyperparameter α > 0, the algorithm favours weights around 1/M, and when -1 < α < 0, weights close to 0 or 1.³ The expected value of a_i (written as â_i) can be encoded using a truncated inverse exponential prior (McMichael, 1995), setting p(a_i | â_i) = K exp(-â_i / (2 a_i)), where K is a normalizing factor.⁴ The prior for b_i has the same form. Priors for θ_i were not used, but could be useful to capture the visibility constraint. Given these distributions, the updates to the parameters become \n\na_i = (∑_n h_in ζ_in + â_i) / (2 ∑_n h_in), b_i = (∑_n h_in (x_n3 - μ_i3)² + b̂_i) / (∑_n h_in). \n\nThe update for μ_i is the same as before, the prior having no effect. The updates for a_i and b_i force them to be near â_i and b̂_i, and the update for π_i is affected by the hyperparameter α. 
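A numerical sketch of these MAP updates, covering the vertical scale b_i and the mixture weights π_i (function names are ours; the Dirichlet-style weight update follows the standard MAP form with M·α in the denominator). The prior term b̂_i simply adds to the responsibility-weighted sum of squares, so a large b̂_i pulls the estimate toward taller clusters:

```python
import numpy as np

def map_update_b(h, z, mu_z, b_hat):
    """MAP update b_i = (sum_n h_in (z_n - mu_i3)^2 + b_hat) / sum_n h_in,
    under the truncated inverse-exponential prior exp(-b_hat / (2 b))."""
    return (np.sum(h * (z - mu_z) ** 2) + b_hat) / np.sum(h)

def map_update_pi(H, alpha):
    """MAP weight update pi_i = (sum_n h_in + alpha) / (N + M * alpha);
    alpha > 0 pulls the weights toward the uniform value 1/M."""
    Nk = H.sum(axis=1)          # effective count per component
    return (Nk + alpha) / (Nk.sum() + alpha * len(Nk))
```

With b_hat = 0 and alpha = 0 both functions reduce to the ordinary maximum-likelihood EM updates, which makes the effect of each prior easy to isolate.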
\nThe weight update is π_i = (∑_n h_in + α) / (∑_j ∑_n h_jn + Mα). \n\nThe priors on a_i and b_i had noticeable effects on the models obtained. Figure 4 shows the results from two fits, starting from identical initial conditions. By adjusting the size of the prior, the algorithm can be guided into finding different sized clusters. Large values of the prior are shown here to demonstrate its effect. \n\n³ In this paper we make little use of the α priors, but introducing separate α_i's for each object could be a useful next step for scenes with varying object sizes. \n⁴ To deal with the case when a_i = 0, the prior is truncated, setting p(a_i | â_i) = 0 when a_i < p_crit. \n\nFigure 4: Example of the action of the priors on a_i and b_i. The photograph shows a visual image of the scene: a box in front of a wall, and the priors were chosen to prefer a distribution matching the wall. The two left hand graphs show the top and side view of the scene clustered without priors, while the two right hand graphs use priors on a_i and b_i. The priors give a preference for large values of a_i and b_i, so biasing the optimization to find a mixture component matching the whole wall as opposed to just the top of it. \n\n4 Classification and diagnosis \n\nFigure 5: Schematic of the system. Sensor data is passed to the model fitting (EM algorithm), which produces features for the higher-level processing; this in turn issues move commands for the robot and feeds prior information back to the model fitting. \n\nThis section describes how higher-level processing can be used to not only interpret the clusters fitted by the EM algorithm, but also affect the model-fitting using prior information. 
\nThe processes of model-fitting and analysis are thus coupled, and not sequential. \n\nThe results of the model fitting are primarily processed to steer the robot. Once the cluster has been recognized as a box/wall/etc., the location and orientation are used to calculate a move command. To perform the object-recognition, we used a simple classifier on a feature vector extracted from the clustered data. The labels used were specific to docking and commonly clustered objects - boxes, walls, thin vertical lines - but also included labels for clustering errors (like those shown in Figure 3). The features used were the values of the parameters a_i, b_i, giving the size of the clusters, but also measures of the visibility of the clusters, and the skewness of the within-cluster data. The classification used simple models of the probability distributions of the features f_i given the objects o_j (i.e. P(f_i | o_j)), estimated using a set of training data. In addition to moving the robot, the classifier can modify the behavior of the model-fitting algorithm. If a poor clustering solution is found, EM can be re-run with slightly different initial conditions. If the probable locations or sizes of objects are known from previous scans, or indeed from other sensors, then these can constrain the clustering through priors, or provide initial means. \n\n5 Summary \n\nThis paper shows that the Mixture of Gaussians architecture, combined with EM optimization and the use of parameter priors, can be used to segment and analyze real data from the 3D range-finder of a mobile robot. The approach was successfully used to guide a mobile robot towards a docking object, using only its range-finder for perception. \n\nFor the learning community this provides more than an example of the application of a probabilistic model to a real task. 
We have shown how the usual Mixture of Gaussians model can be parameterized to include expectations about the environment in a way which can be readily extended. We have included prior knowledge at three different levels: 1. The use of problem-specific parameterization of the covariance matrix to find expected patterns (e.g. planes at particular angles). 2. The use of problem-specific parameter priors to automatically rule out unlikely objects at the lowest level of perception. 3. The results of the clustering process were post-processed by higher-level classification algorithms which interpreted the parameters of the mixture components, diagnosed typical misclassifications, provided new priors for future perception, and gave the robot control system new targets. \n\nIt is expected that the basic approach can be fruitfully applied to other sensors, to problems which track dynamically changing scenes, or to problems which require relationships between objects in the scene to be accounted for and interpreted. A problem common to all modeling approaches is that it is not trivial to determine the number and types of clusters needed to represent a given scene. Recent work with Markov-Chain Monte-Carlo approaches has been successfully applied to mixtures of Gaussians (Richardson and Green, 1997), allowing a Bayesian solution to this problem, which could provide control systems with even richer probabilistic information (a series of models conditioned on the number of clusters). \n\nAcknowledgements \n\nAll authors were employed by Daimler-Benz AG during stages of the work. R. Murray-Smith gratefully acknowledges the support of Marie Curie TMR grant FMBICT961369. \n\nReferences \n\nArman, F. and Aggarwal, J. K. (1993). Model-based object recognition in dense-range images - a review. ACM Computing Surveys, 25 (1), 5-43. \n\nDempster, A. P., Laird, N. M., and Rubin, D. B. (1977). 
Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society Series B, 39, 1-38. \n\nDuda, R. O. and Hart, P. E. (1973). Pattern Classification and Scene Analysis. New York, Wiley. \n\nMcMichael, D. W. (1995). Bayesian growing and pruning strategies for MAP-optimal estimation of Gaussian mixture models. In 4th IEE International Conf. on Artificial Neural Networks, pp. 364-368. \n\nPIAP (1995). PIAP impact report on TRC lidar performance. Technical Report 1, Industrial Research Institute for Automation and Measurements, 02-486 Warszawa, Al. Jerozolimskie 202, Poland. \n\nRichardson, S. and Green, P. J. (1997). On Bayesian analysis of mixtures with an unknown number of components. Journal of the Royal Statistical Society B, 59 (4), 731-792. \n\nTitterington, D., Smith, A., and Makov, U. (1985). Statistical Analysis of Finite Mixture Distributions. Chichester, John Wiley & Sons. \n", "award": [], "sourceid": 1538, "authors": [{"given_name": "Matthew", "family_name": "Williamson", "institution": null}, {"given_name": "Roderick", "family_name": "Murray-Smith", "institution": null}, {"given_name": "Volker", "family_name": "Hansen", "institution": null}]}