{"title": "Keeping Flexible Active Contours on Track using Metropolis Updates", "book": "Advances in Neural Information Processing Systems", "page_first": 859, "page_last": 865, "abstract": null, "full_text": "Keeping flexible active contours on track using Metropolis updates

Trausti T. Kristjansson
University of Waterloo
ttkristj@uwaterloo.ca

Brendan J. Frey
University of Waterloo
frey@uwaterloo.ca

Abstract

Condensation, a form of likelihood-weighted particle filtering, has been successfully used to infer the shapes of highly constrained \"active\" contours in video sequences. However, when the contours are highly flexible (e.g. for tracking the fingers of a hand), a computationally burdensome number of particles is needed to successfully approximate the contour distribution. We show how the Metropolis algorithm can be used to update a particle set representing a distribution over contours at each frame in a video sequence. We compare this method to condensation using a video sequence that requires highly flexible contours, and show that the new algorithm performs dramatically better than the condensation algorithm. We discuss the incorporation of this method into the \"active contour\" framework, where a shape subspace is used to constrain shape variation.

1 Introduction

Tracking objects with flexible shapes in video sequences is currently an important topic in the vision community. Methods include curve fitting [9], layered models [1, 2, 3], Bayesian reconstruction of 3-D models from video [6], and active contour models [10, 14, 15].

Fitting curves to the outlines of objects has been attempted using various methods, including \"Snakes\" [8, 9], where an energy function is minimized so as to find the best fit. As with other optimization methods, this approach suffers from local optima. 
This problem is amplified when using real data, where edge noise can prevent the contour from fitting the desired object outline.

In contrast, Blake et al. [10] introduced a probabilistic framework for curve fitting and tracking. Instead of proposing one single best fit for the contour, a probability distribution over contours is found. The distribution is represented as a particle set, where each particle represents one contour shape. Inference in these \"active contour\" models is accomplished using particle filtering.

In the \"active contour\" method, a probabilistic dynamic system is used to model the distribution over the outline of the object (the contour) Y_t and the observations Z_t at time t. Tracking is performed by inference in this model.

The outline of an object is tracked through successive frames in a video by using a particle distribution. Each particle x_n represents a single contour Y¹ that approximates the outline of the object. For any given frame, a set of particles represents the probability distribution over positions and shapes of an object.

Figure 1: (a) Condensation with Gaussian dynamics (result for best σ = 2 shown) applied to a video sequence. The 200 contours corresponding to 200 particles fail to track the complex outline of the hand. The pictures show every 24th frame of a 211-frame sequence. (b) Metropolis updates with only 12 particles keep the contours on track. At each step, 4 iterations of Metropolis updates are applied with σ = 3.

In order to find the likelihood of an observation Z_t, given a particle x_n, lines perpendicular to the contour are examined and edges are detected. 
A variety of distributions can be used to model the likelihood of the edge positions along each line. We assume that the position of the edge belonging to the object is drawn from a Gaussian with mean position at the intersection of the contour and the measurement line, Y(s_m), and that the positions of the other edges are drawn from a Poisson process. The observation likelihood for a single measurement line z_m can be simplified to [10]

p(z_m|x_n) ∝ 1 + (1 / (√(2π) σ_m Q)) Σ_j exp[ −|z_{m,j} − B(s_m) x_n|² / (2σ_m²) ]   (1)

where z_{m,j} denotes the coordinates of an edge on measurement line m, B(s_m) x_n = Y_n(s_m) is the intersection of the contour and the measurement line (see later), and Q = qλ, where q is the probability of not observing the edge and λ is the rate of the Poisson process. σ_m defines the standard deviation in pixels.

¹Notation: we use Y to refer to a curve, parameterized by x, and Y(s) for a particular point on the curve. x refers to a particle consisting of subspace parameters or, in our case, control points. n indexes a particle in a particle set, i indexes a component of a particle (i.e. a single control point), m indexes measurement lines, and t is used as a frame index.

A multitude of measurement lines is used along the contour, and (assuming independence) the contour likelihood is

p(Z|x_n) = ∏_{m ∈ M} p(z_m|x_n)   (2)

where M is the set of measurement lines.

As mentioned, in the condensation algorithm, a particle set is used to represent the distribution of contours. Starting from an initial distribution, a new distribution for a successive frame is produced by propagating each particle using the system dynamics p(x_t|x_{t−1}). The observation likelihood p(Z_t|x_t) is then calculated for each particle, and the particle set is resampled with replacement, using the likelihoods as weights. 
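As a concrete illustration, the measurement-line likelihood of Eq. (1), the contour likelihood of Eq. (2), and the likelihood weighting that drives resampling can be sketched in a few lines of Python. This is our own sketch, not the authors' code; names such as `edges`, `nu` and `lam` are illustrative.

```python
import math

def line_likelihood(edges, nu, sigma, q, lam):
    """Unnormalized likelihood of one measurement line, Eq. (1).

    edges -- detected edge coordinates z_{m,j} along the line
    nu    -- intersection of contour and line, B(s_m) x_n
    sigma -- standard deviation (in pixels) of the true edge position
    q     -- probability of not observing the edge
    lam   -- rate of the Poisson clutter process (Q = q * lam)
    """
    bumps = sum(math.exp(-abs(z - nu) ** 2 / (2.0 * sigma ** 2)) for z in edges)
    return 1.0 + bumps / (math.sqrt(2.0 * math.pi) * sigma * q * lam)

def contour_likelihood(edge_lists, intersections, sigma, q, lam):
    """Product over independent measurement lines, Eq. (2)."""
    p = 1.0
    for edges, nu in zip(edge_lists, intersections):
        p *= line_likelihood(edges, nu, sigma, q, lam)
    return p
```

A line whose detected edge coincides with the predicted intersection scores higher than one whose edges are far away; these scores are the weights used when resampling the particle set.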
The resulting set of particles approximates the posterior distribution at time t and is then propagated to the next frame.

Figure 1(a) shows the results of using condensation with 200 particles. As can be seen, the result is poor. Intuitively, the reason condensation fails is that it is highly unlikely to draw a particle that has raised control points over the four fingers while keeping the remainder fixed. Figure 1(b) shows the result of using Metropolis updates and 12 particles (an equivalent amount of computation).

2 Keeping contours on track using Metropolis updates

To reduce the dimensionality of the inference, a subspace is often used. For example, a fixed shape is only allowed horizontal and vertical translation. Using a subspace reduces the size of the required particle set, allowing for successful tracking using standard condensation. If the object can deform, a subspace that captures the allowed deformations may be used [15]. This increases the flexibility of the contour, but at the cost of enlarged dimensionality. In order to learn such a subspace, a large number of training samples is used, supplied by hand-fitting contour shapes to a large number of frames. However, even moderately detailed contours (say, the outline of a hand) will have many control points that interact in complex ways, making subspace modeling difficult or impractical.

2.1 Metropolis sampling

Metropolis sampling is a popular Markov chain Monte Carlo method for problems of large dimensionality [16, 17]. A new particle is drawn from a proposal density Q(x'; x_t), where in our case x_t is a particle (i.e. a set of control points) at time t, and x' is a tentative new particle produced by perturbing a subset of the control points:

Q_i(x'|x_t) = (1 / √(2πσ²)) exp[ −(x' − x_t)² / (2σ²) ]   (3)

We then calculate the acceptance ratio

α = [ p(x'|x_{t−1}) p(Z_t|x') / p(x_t|x_{t−1}) p(Z_t|x_t) ] · [ Q(x_t; x') / Q(x'; x_t) ]   (4)

where p(x_t|x_{t−1}) p(Z_t|x_t) is proportional to the posterior probability of observing the contour in that position. If α ≥ 1 the proposed particle is accepted. If α < 1, it is accepted with probability α. Since Q is symmetric, the second factor Q(x_t; x') / Q(x'; x_t) = 1.

Metropolis sampling can be used in the framework of particle propagation in two ways. It can be used to fit splines around contours of a training set that is then used to construct a shape subspace, e.g. by PCA, or it can be used to refine the shapes of the subspace to the actual data during tracking.

2.2 B-splines

B-splines, or basis function splines, are parametric curves defined as follows:

Y(s) = B(s) C   (5)

where Y(s) is a two-dimensional vector consisting of the 2-D coordinates of a point on the curve, B(s) is a matrix of polynomial basis functions, and C is a vector of control points. In other words, a point along the curve Y(s) is a weighted sum of the values of the basis functions B(s) for a particular value of s, where the weights are given by the values of C. The basis functions of b-splines have the characteristic that they are non-zero over a limited range of s. Thus a particular control point will only affect a portion of the curve. For regular b-splines of order 4 (the basis functions are 3rd-degree polynomials), a single control point will only affect Y(s) over a range of s of length 4. Conversely, for a particular s_m (m : s_m ∈ Support(x_i), where i indexes the component of x that has been altered), Y(s_m) is affected by at most 4 control points (fewer towards the ends).

As mentioned before, a detailed contour can have a large number of control points, and thus high dimensionality, and so it is common to use a subspace. 
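To make Eq. (5) and the local-support property concrete, here is a minimal sketch of evaluating one point on a uniform cubic (order-4) b-spline. The paper does not give this code; `bspline_point` is our own illustrative name, and a uniform knot spacing is assumed.

```python
def bspline_point(C, s):
    """Evaluate a point Y(s) on a uniform cubic b-spline, Eq. (5).

    C is a list of 2-D control points (x, y); s must satisfy
    0 <= s < len(C) - 3. Only the 4 control points whose basis
    functions are non-zero at s contribute (local support).
    """
    i = int(s)   # index of the knot span containing s
    u = s - i    # local parameter in [0, 1)
    # uniform cubic b-spline basis functions (3rd-degree polynomials)
    b = ((1 - u) ** 3 / 6.0,
         (3 * u ** 3 - 6 * u ** 2 + 4) / 6.0,
         (-3 * u ** 3 + 3 * u ** 2 + 3 * u + 1) / 6.0,
         u ** 3 / 6.0)
    x = sum(w * C[i + k][0] for k, w in enumerate(b))
    y = sum(w * C[i + k][1] for k, w in enumerate(b))
    return (x, y)
```

Because only four basis functions are non-zero at any s, perturbing a single control point changes the curve only on a short interval, which is what makes single-control-point Metropolis proposals cheap to evaluate.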
In this case C can be written as C = W x + C_0, where W defines a linear subspace, C_0 is the template of control points, and x represents perturbations from the template in the subspace.

In this work we examine unconstrained models, where no prior knowledge about the deformations or dynamics of the object is presumed. In this case W is the identity matrix, C_0 = 0, and x contains the actual coordinates of the control points. This allows the contour to deform in any way.

2.3 Metropolis updates in condensation

The new algorithm consists of a Metropolis step followed by a resampling step:

1. Iterate over control points:
   - For one control point at a time, draw a proposal particle by drawing a new control point x'_i from a 2-D Gaussian centered at the current control point x_{t,i}, Eq. (3), keeping all others unchanged.
   - Calculate the observation likelihood for the new control point, Eq. (2).
   - Calculate α (Eq. 4) and accept or reject the new particle.
2. Resample.
3. Get the next image in the video.

If the particle distribution at t−1 reflects p(x_{t−1}|Z_1, ..., Z_{t−1}), the Metropolis updates will converge to p(x_t|Z_1, ..., Z_t) [16].

As mentioned above, the effect of altering the position of a control point is to change the shape of the contour locally, since the basis functions have limited support. Thus, when evaluating p(x'_t|x_{t−1}) p(Z_t|x'_t) for a proposed particle, we only need to re-examine measurement lines and evaluate p(z_{m,t}|x'_{n,t}) for lines in the affected interval, and similarly for p(x'_{n,t}|x_{n,t−1}). This allows for an efficient implementation of the algorithm.

The computation C_M required to update a single particle using Metropolis, compared to condensation, is C_M = o · i_t · C_C, where o is the order of the b-spline, i_t is the number of iterations, and C_C is the number of computations required to update a particle using condensation. 
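A minimal sketch of the per-frame update described in Sec. 2.3 (Metropolis sweeps over single control points, then resampling) is given below. This is our own sketch, not the authors' implementation: `log_posterior` stands for log p(x_t|x_{t−1}) + log p(Z_t|x_t) and is recomputed in full here, whereas the efficient version re-evaluates only the affected measurement lines; the helper names are illustrative.

```python
import copy
import math
import random

def metropolis_update(particles, log_posterior, sigma, n_iter):
    """Metropolis sweeps over control points for one video frame.

    particles     -- list of particles, each a list of 2-D control points
    log_posterior -- log p(x_t|x_{t-1}) + log p(Z_t|x_t) for a particle
    sigma         -- std. dev. of the 2-D Gaussian proposal, Eq. (3)
    n_iter        -- number of Metropolis sweeps per frame
    """
    for _ in range(n_iter):
        for x in particles:
            lp = log_posterior(x)
            for i in range(len(x)):
                old = x[i]
                # proposal: perturb control point i only, Eq. (3)
                x[i] = (old[0] + random.gauss(0.0, sigma),
                        old[1] + random.gauss(0.0, sigma))
                lp_new = log_posterior(x)
                # Q is symmetric, so alpha reduces to the posterior ratio, Eq. (4)
                if math.log(random.random()) < lp_new - lp:
                    lp = lp_new          # accept the proposal
                else:
                    x[i] = old           # reject: restore the control point
    return particles

def resample(particles, weights):
    """Resample with replacement, using the likelihoods as weights."""
    chosen = random.choices(particles, weights, k=len(particles))
    return [copy.deepcopy(p) for p in chosen]
```

Because each proposal moves one control point at a time, a proposal that improves the fit around one finger is accepted regardless of the other fingers, which is exactly the behavior that lets a small particle set track a high-dimensional contour.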
Thus, in the case of fourth-order splines such as the ones we use, the increase in computation for a single particle is only a factor of four for a single iteration, and eight for two iterations. However, as we have seen, far fewer particles are required.

Figure 2: The behavior of the algorithm with Metropolis updates is shown at frame 100 (t = 100) as a function of the number of iterations and σ. The columns show, from left to right, 1, 2, 4 and 8 iterations, and the rows, from top to bottom, show σ = {1, 2, 3, 4}. The rejection ratio (i.e. the ratio of rejected proposal particles to the total number of proposed particles) is shown as a bar on the right side of each image.

3 Results

We tested our algorithm on the video sequence shown in Figure 1. The contour had 56 2-D control points, i.e. a state space of 112 dimensions. Such high dimensionality is required for the detailed contours needed to properly outline the fingers of the hand.

The results presented are for relatively noise-free data, i.e. free from background clutter. This allows us to contrast the performance of Metropolis updates and standard condensation for the scenarios of interest, i.e. the learning of subspace models and contour refinement.

Figure 1(b) shows the results for the Metropolis updates with 12 particles, 4 iterations and σ = 3. The figure shows every 24th frame from frame 1 to frame 211. The outline of the splayed fingers is tracked very successfully.

Figure 1(a) shows every 24th frame for the condensation algorithm of equivalent complexity, using 200 particles and σ = 2. This value of σ gave the best results for 200 particles. As can be seen, the little finger is tracked moderately well. However, the other parts of the hand are tracked very poorly. For lower values of σ the contour distribution did not track the hand, but stayed in roughly the position of the initial contour distribution. 
For higher values of σ, the contour looped around in the general area of the fingers.

Figure 2 shows the contour distribution at frame 100 with 12 particles, for different numbers of iterations and values of σ. When σ = 1 and 2 the contour distribution does not keep up with the deformation. For σ = 4 the contour is correctly tracked except in the case of a single iteration. The rejection ratio is shown as a bar on the right side of each image. Notice that the general trend is that the rejection ratio increases as σ increases, and decreases as the number of iterations is increased (due to a smaller σ at each step).

Intuitively, it is not surprising that our new algorithm outperforms standard condensation. In the case of condensation, Gaussian noise is added to each control point at each time step. One particle may be correctly positioned for the little finger and poorly positioned for the forefinger, whereas another particle may be well positioned around the forefinger and poorly positioned around the little finger. In order to track the deformation of the hand, some particles are required that track both the little finger and the forefinger (and all other parts too). In contrast, the Metropolis updates are likely to reject particles that are locally worse than the current particle, but accept local improvements.

It should be noted that for lower-dimensional problems, the increase in tracking performance is not as dramatic. E.g., in the case of tracking a rotating head using a 12-control-point b-spline, the two algorithms performed comparably.

4 Future work and conclusion

We are currently examining the effects of background clutter on the performance of the algorithm. 
We are also investigating other sequences and groupings of control points for generating proposal particles, and ways of using subspace models in combination with Metropolis updates.

In this paper we showed how Metropolis updates can be used to keep highly flexible active contours on track, and we presented an efficient implementation strategy. For the high-dimensional problems that are common for detailed shapes, the new algorithm produces dramatically better results than standard condensation.

Acknowledgments

We thank Andrew Blake and Dale Schuurmans for helpful discussions.

References

[1] J. Y. A. Wang and E. H. Adelson, \"Representing moving images with layers,\" IEEE Transactions on Image Processing, Special Issue: Image Sequence Compression, vol. 3, no. 5, 1994, pp. 625-638.

[2] Y. Weiss, \"Smoothness in layers: Motion segmentation using nonparametric mixture estimation,\" Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1997.

[3] A. Jepson and M. J. Black, \"Mixture models for optical flow computation,\" Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

[4] W. T. Freeman and P. A. Viola, \"Bayesian model of surface perception,\" Advances in Neural Information Processing Systems 10, MIT Press, 1998.

[5] W. Freeman and E. Pasztor, \"Learning low-level vision,\" Proceedings of the International Conference on Computer Vision, 1999, pp. 1182-1189.

[6] N. R. Howe, M. E. Leventon, and W. T. Freeman, \"Bayesian reconstruction of 3D human motion from single-camera video,\" in Advances in Neural Information Processing Systems 12, edited by S. A. Solla, T. K. Leen, and K.-R. Müller, 2000. TR99-37.

[7] G. E. Hinton, Z. Ghahramani, and Y. W. Teh, \"Learning to parse images,\" in S. A. Solla, T. K. Leen, and K.-R. Müller (eds), Advances in Neural Information Processing Systems 12, MIT Press, 2000.

[8] D. Terzopoulos, R. 
Szeliski, \"Tracking with Kalman snakes,\" in A. Blake and A. Yuille (eds), Active Vision, pp. 3-20, MIT Press, Cambridge, MA, 1992.

[9] N. Papanikolopoulos, P. Khosla, and T. Kanade, \"Vision and control techniques for robotic visual tracking,\" in Proc. IEEE Int. Conf. Robotics and Automation, vol. 1, 1991, pp. 851-856.

[10] A. Blake and M. Isard, Active Contours, Springer-Verlag, 1998. ISBN 3540762175.

[11] J. MacCormick and A. Blake, \"A probabilistic exclusion principle for tracking multiple objects,\" Proc. 7th IEEE Int. Conf. Computer Vision, 1999.

[12] M. Isard and A. Blake, \"ICONDENSATION: Unifying low-level and high-level tracking in a stochastic framework,\" Proc. 5th European Conf. Computer Vision, vol. 1, 1998, pp. 893-908.

[13] J. Sullivan, A. Blake, M. Isard, and J. MacCormick, \"Object localization by Bayesian correlation,\" Proc. Int. Conf. Computer Vision, 1999.

[14] T. F. Cootes, G. H. Edwards, and C. J. Taylor, \"Active appearance models,\" Proceedings of the European Conference on Computer Vision, vol. 2, 1998, pp. 484-498.

[15] I. Matthews, J. A. Bangham, R. Harvey, and S. Cox, Proc. Auditory-Visual Speech Processing (AVSP), 1998, pp. 73-78.

[16] R. M. Neal, \"Probabilistic inference using Markov chain Monte Carlo methods,\" Technical Report CRG-TR-93-1, University of Toronto, 1993.

[17] D. J. C. MacKay, \"Introduction to Monte Carlo methods,\" in M. I. Jordan (ed), Learning in Graphical Models, MIT Press, Cambridge, MA, 1999.
", "award": [], "sourceid": 1835, "authors": [{"given_name": "Trausti", "family_name": "Kristjansson", "institution": null}, {"given_name": "Brendan", "family_name": "Frey", "institution": null}]}