{"title": "Non-rigid point set registration: Coherent Point Drift", "book": "Advances in Neural Information Processing Systems", "page_first": 1009, "page_last": 1016, "abstract": null, "full_text": "Non-rigid point set registration: Coherent Point Drift\nAndriy Myronenko, Xubo Song, Miguel A. Carreira-Perpi\u00f1\u00e1n\nDepartment of Computer Science and Electrical Engineering, OGI School of Science and Engineering, Oregon Health and Science University, Beaverton, OR, USA, 97006\n{myron, xubosong, miguel}@csee.ogi.edu\n\nAbstract\nWe introduce Coherent Point Drift (CPD), a novel probabilistic method for non-rigid registration of point sets. The registration is treated as a Maximum Likelihood (ML) estimation problem with a motion coherence constraint over the velocity field such that one point set moves coherently to align with the second set. We formulate the motion coherence constraint and derive a solution of regularized ML estimation through the variational approach, which leads to an elegant kernel form. We also derive the EM algorithm for the penalized ML optimization with deterministic annealing. The CPD method simultaneously finds both the non-rigid transformation and the correspondence between two point sets without making any prior assumption of the transformation model except that of motion coherence. This method can estimate complex non-linear non-rigid transformations, and is shown to be accurate on 2D and 3D examples and robust in the presence of outliers and missing points.\n\n1 Introduction\nRegistration of point sets is an important issue for many computer vision applications such as robot navigation, image guided surgery, motion tracking, and face recognition. In fact, it is the key component in tasks such as object alignment, stereo matching, point set correspondence, image segmentation and shape/pattern matching. 
The registration problem is to find meaningful correspondence between two point sets and to recover the underlying transformation that maps one point set to the second. The \"points\" in the point set are features, most often the locations of interest points extracted from an image. Other common geometrical features include line segments, implicit and parametric curves and surfaces. Any geometrical feature can be represented as a point set; in this sense, point locations are the most general of all features.\n\nRegistration techniques can be rigid or non-rigid depending on the underlying transformation model. The key characteristic of a rigid transformation is that all distances are preserved. The simplest non-rigid transformation is affine, which also allows anisotropic scaling and skews. Effective algorithms exist for rigid and affine registration. However, the need for more general non-rigid registration occurs in many tasks, where complex non-linear transformation models are required. Non-linear non-rigid registration remains a challenge in computer vision.\n\nMany algorithms exist for point set registration. A direct way of associating points of two arbitrary patterns is proposed in [1]. The algorithm exploits properties of singular value decomposition and works well with translation, shearing and scaling deformations. However, for a non-rigid transformation, the method performs poorly. Another popular method for point set registration is the Iterative Closest Point (ICP) algorithm [2], which iteratively assigns correspondence and finds the least squares transformation (usually rigid) relating these point sets. The algorithm then redetermines the closest point set and continues until it reaches a local minimum. Many variants of ICP have been proposed that affect all phases of the algorithm from the selection and matching of points to the minimization strategy [3]. 
Nonetheless, ICP requires that the initial pose of the two point sets be adequately close, which is not always possible, especially when the transformation is non-rigid [3].\n\nSeveral non-rigid registration methods have been introduced [4, 5]. The Robust Point Matching (RPM) method [4] allows global-to-local search and soft assignment of correspondences between two point sets. In [5] it is further shown that the RPM algorithm is similar to Expectation Maximization (EM) algorithms for mixture models, where one point set represents data points and the other represents centroids of mixture models. In both papers, the non-rigid transform is parameterized by a Thin Plate Spline (TPS) [6], leading to the TPS-RPM algorithm [4]. According to regularization theory, the TPS parametrization is a solution of the interpolation problem in 2D that penalizes the second order derivatives of the transformation. In 3D the solution is not differentiable at point locations. In four or higher dimensions the generalization collapses completely [7]. The M-step in the EM algorithm in [5] is approximated for simplification. As a result, the approach is not truly probabilistic and does not lead, in general, to the true Maximum Likelihood solution.\n\nA correlation-based approach to point set registration is proposed in [8]. Two data sets are represented as probability densities, estimated using kernel density estimation. The registration is considered as the alignment between the two distributions that minimizes a similarity function defined by the L2 norm. This approach is further extended in [9], where both densities are represented as Gaussian Mixture Models (GMM). Once again a thin-plate spline is used to parameterize the smooth non-linear underlying transformation.\n\nIn this paper we introduce a probabilistic method for point set registration that we call the Coherent Point Drift (CPD) method. 
Similar to [5], given two point sets, we fit a GMM to the first point set, whose Gaussian centroids are initialized from the points in the second set. However, unlike [4, 5, 9], which assume a thin-plate spline transformation, we do not make any explicit assumption about the transformation model. Instead, we consider the process of adapting the Gaussian centroids from their initial positions to their final positions as a temporal motion process, and impose a motion coherence constraint over the velocity field. Velocity coherence is a particular way of imposing smoothness on the underlying transformation. The concept of motion coherence was proposed in the Motion Coherence Theory [10]. The intuition is that points close to one another tend to move coherently. This motion coherence constraint penalizes derivatives of all orders of the underlying velocity field (thin-plate spline only penalizes the second order derivative). Examples of velocity fields with different levels of motion coherence for different point correspondence are illustrated in Fig. 1.\n\nFigure 1: (a) Two given point sets. (b) A coherent velocity field. (c, d) Velocity fields that are less coherent for the given correspondences.\n\nWe derive a solution for the velocity field through a variational approach by maximizing the likelihood of the GMM penalized by motion coherence. We show that the final transformation has an elegant kernel form. We also derive an EM algorithm for the penalized ML optimization with deterministic annealing. Once we have the final positions of the GMM centroids, the correspondence between the two point sets can be easily inferred through the posterior probability of the Gaussian mixture components given the first point set. Our method is a true probabilistic approach and is shown to be accurate and robust in the presence of outliers and missing points, and is effective for estimation of complex non-linear non-rigid transformations. 
The rest of the paper is organized as follows. In Section 2 we formulate the problem and derive the CPD algorithm. In Section 3 we present the results of the CPD algorithm and compare its performance with that of RPM [4] and ICP [2]. In Section 4 we summarize the properties of CPD and discuss the results.\n\n2 Method\nAssume two point sets are given, where the template point set $\\mathbf{Y} = (y_1, \\ldots, y_M)^T$ (expressed as an $M \\times D$ matrix) should be aligned with the reference point set $\\mathbf{X} = (x_1, \\ldots, x_N)^T$ (expressed as an $N \\times D$ matrix) and $D$ is the dimension of the points. We consider the points in $\\mathbf{Y}$ as the centroids of a Gaussian Mixture Model, and fit it to the data points $\\mathbf{X}$ by maximizing the likelihood function. We denote $\\mathbf{Y}_0$ as the initial centroid positions and define a continuous velocity function $v$ for the template point set such that the current position of the centroids is defined as $\\mathbf{Y} = v(\\mathbf{Y}_0) + \\mathbf{Y}_0$.\n\nConsider a Gaussian-mixture density $p(x) = \\sum_{m=1}^{M} \\frac{1}{M} p(x|m)$ with $x|m \\sim \\mathcal{N}(y_m, \\sigma^2 I_D)$, where $\\mathbf{Y}$ represents $D$-dimensional centroids of equally-weighted Gaussians with equal isotropic covariance matrices, and the set $\\mathbf{X}$ represents data points. In order to enforce a smooth motion constraint, we define the prior $p(\\mathbf{Y}|\\lambda) \\propto \\exp(-\\frac{\\lambda}{2} \\phi(\\mathbf{Y}))$, where $\\lambda$ is a weighting constant and $\\phi(\\mathbf{Y})$ is a function that regularizes the motion to be smooth. Using Bayes theorem, we want to find the parameters $\\mathbf{Y}$ by maximizing the posterior probability, or equivalently by minimizing the following energy function:\n\n$$E(\\mathbf{Y}) = -\\sum_{n=1}^{N} \\log \\sum_{m=1}^{M} e^{-\\frac{1}{2} \\left\\| \\frac{x_n - y_m}{\\sigma} \\right\\|^2} + \\frac{\\lambda}{2} \\phi(\\mathbf{Y}) \\qquad (1)$$\n\nWe make the i.i.d. data assumption and ignore terms independent of $\\mathbf{Y}$. Equation 1 has a similar form to that of the Generalized Elastic Net (GEN) [11], which has shown good performance in non-rigid image registration [12]; note that there we directly penalized $\\mathbf{Y}$, while here we penalize the transformation $v$. The $\\phi$ function represents our prior knowledge about the motion, which should be smooth. 
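As a rough illustration of the data term in Eq. 1 (the regularizer is left out), the following sketch evaluates the negative log-likelihood of an equally weighted isotropic mixture in plain Python. The function name `gmm_energy`, the toy points and the value of sigma are assumptions made for this example, and the log-sum-exp shift is a standard numerical safeguard rather than something stated in the paper.

```python
import math

def gmm_energy(X, Y, sigma):
    # Data term of Eq. 1: -sum_n log sum_m exp(-0.5 * ||(x_n - y_m) / sigma||^2)
    total = 0.0
    for x in X:
        # exponent contributed by each equally weighted Gaussian centroid
        exps = [-0.5 * sum((a - b) ** 2 for a, b in zip(x, y)) / sigma ** 2 for y in Y]
        top = max(exps)  # log-sum-exp shift to avoid underflow
        total -= top + math.log(sum(math.exp(e - top) for e in exps))
    return total

# Hypothetical toy data: centroids placed on the data versus centroids shifted away
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
aligned = gmm_energy(X, X, sigma=0.5)
shifted = gmm_energy(X, [(x + 2.0, y) for x, y in X], sigma=0.5)
```

Under these assumptions the energy is lower when the centroids sit on the data than when they are shifted away from it, which is the behavior the minimization relies on.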
Specifically, we want the velocity field $v$ generated by template point set displacement to be smooth. According to [13], smoothness is a measure of the \"oscillatory\" behavior of a function. Within the class of differentiable functions, one function is said to be smoother than another if it oscillates less; in other words, if it has less energy at high frequency. The high frequency content of a function can be measured by first high-pass filtering the function, and then measuring the resulting power. This can be represented as $\\phi(v) = \\int_{\\mathbb{R}^d} |\\tilde{v}(s)|^2 / \\tilde{G}(s) \\, ds$, where $\\tilde{v}$ indicates the Fourier transform of the velocity and $\\tilde{G}$ is some positive function that approaches zero as $\\|s\\| \\to \\infty$. Here $\\tilde{G}$ represents a symmetric low-pass filter, so that its Fourier transform $G$ is real and symmetric. Following this formulation, we rewrite the energy function as:\n\n$$E(\\tilde{v}) = -\\sum_{n=1}^{N} \\log \\sum_{m=1}^{M} e^{-\\frac{1}{2} \\left\\| \\frac{x_n - y_m}{\\sigma} \\right\\|^2} + \\frac{\\lambda}{2} \\int_{\\mathbb{R}^d} \\frac{|\\tilde{v}(s)|^2}{\\tilde{G}(s)} \\, ds \\qquad (2)$$\n\nIt can be shown using a variational approach (see Appendix A for a sketch of the proof) that the function which minimizes the energy function in Eq. 2 has the form of the radial basis function:\n\n$$v(z) = \\sum_{m=1}^{M} w_m G(z - y_{0m}) \\qquad (3)$$\n\nWe choose a Gaussian kernel form for $G$ (note it is not related to the Gaussian form of the distribution chosen for the mixture model). There are several motivations for such a Gaussian choice: First, it satisfies the required properties (symmetric, positive definite, and $\\tilde{G}$ approaches zero as $\\|s\\| \\to \\infty$). Second, a Gaussian low pass filter has the property of having the Gaussian form in both frequency and time domain without oscillations. By choosing an appropriately sized Gaussian filter we have the flexibility to control the range of filtered frequencies and thus the amount of spatial smoothness. Third, the choice of the Gaussian makes our regularization term equivalent to the one in Motion Coherence Theory (MCT) [10]. 
The regularization term $\\int_{\\mathbb{R}^d} |\\tilde{v}(s)|^2 / \\tilde{G}(s) \\, ds$, with a Gaussian function for $G$, is equivalent to the sum of weighted squares of all order derivatives of the velocity field, $\\int_{\\mathbb{R}^d} \\sum_{m=1}^{\\infty} \\frac{\\beta^{2m}}{m! \\, 2^m} (D^m v)^2$ [10, 13], where $D$ is a derivative operator so that $D^{2m} v = \\nabla^{2m} v$ and $D^{2m+1} v = \\nabla(\\nabla^{2m} v)$. The equivalence of the regularization term with that of the Motion Coherence Theory implies that we are imposing motion coherence among the points and thus we call our method the Coherent Point Drift (CPD) method. Detailed discussion of MCT can be found in [10]. Substituting the solution obtained in Eq. 3 back into Eq. 2, we obtain\n\n$$E(\\mathbf{W}) = -\\sum_{n=1}^{N} \\log \\sum_{m=1}^{M} e^{-\\frac{1}{2} \\left\\| \\frac{x_n - y_{0m} - \\sum_{k=1}^{M} w_k G(y_{0k} - y_{0m})}{\\sigma} \\right\\|^2} + \\frac{\\lambda}{2} \\mathrm{tr}(\\mathbf{W}^T \\mathbf{G} \\mathbf{W}) \\qquad (4)$$\n\nwhere $\\mathbf{G}_{M \\times M}$ is a square symmetric Gram matrix with elements $g_{ij} = e^{-\\frac{1}{2} \\left\\| \\frac{y_{0i} - y_{0j}}{\\beta} \\right\\|^2}$ and $\\mathbf{W}_{M \\times D} = (w_1, \\ldots, w_M)^T$ is a matrix of the Gaussian kernel weights in Eq. 3.\n\nCPD algorithm:\n- Initialize parameters $\\lambda$, $\\beta$, $\\sigma$\n- Construct $\\mathbf{G}$ matrix, initialize $\\mathbf{Y} = \\mathbf{Y}_0$\n- Deterministic annealing:\n    - EM optimization, until convergence:\n        - E-step: Compute $\\mathbf{P}$\n        - M-step: Solve for $\\mathbf{W}$ from Eq. 7\n        - Update $\\mathbf{Y} = \\mathbf{Y}_0 + \\mathbf{G}\\mathbf{W}$\n    - Anneal $\\sigma = \\alpha \\sigma$\n- Compute the velocity field: $v(z) = \\mathbf{G}(z, \\cdot)\\mathbf{W}$\nFigure 2: Pseudo-code of CPD algorithm.\n\nOptimization. Following the EM algorithm derivation for clustering using a Gaussian Mixture Model [14], we can find the upper bound of the function in Eq. 4 as (E-step):\n\n$$Q(\\mathbf{W}) = \\sum_{n=1}^{N} \\sum_{m=1}^{M} P^{old}(m|x_n) \\frac{\\| x_n - y_{0m} - \\mathbf{G}(m, \\cdot)\\mathbf{W} \\|^2}{2\\sigma^2} + \\frac{\\lambda}{2} \\mathrm{tr}(\\mathbf{W}^T \\mathbf{G} \\mathbf{W}) \\qquad (5)$$\n\nwhere $P^{old}$ denotes the posterior probabilities calculated using previous parameter values, and $\\mathbf{G}(m, \\cdot)$ denotes the $m$th row of $\\mathbf{G}$. Minimizing the upper bound $Q$ will lead to a decrease in the value of the energy function $E$ in Eq. 4, unless it is already at a local minimum. Taking the derivative of Eq. 
5 with respect to $\\mathbf{W}$, and rewriting the equation in matrix form, we obtain (M-step)\n\n$$\\frac{\\partial Q}{\\partial \\mathbf{W}} = \\frac{1}{\\sigma^2} \\mathbf{G} \\left( \\mathrm{diag}(\\mathbf{P}\\mathbf{1}) (\\mathbf{Y}_0 + \\mathbf{G}\\mathbf{W}) - \\mathbf{P}\\mathbf{X} \\right) + \\lambda \\mathbf{G}\\mathbf{W} = 0 \\qquad (6)$$\n\nwhere $\\mathbf{P}$ is a matrix of posterior probabilities with $p_{mn} = e^{-\\frac{1}{2} \\left\\| \\frac{y_m^{old} - x_n}{\\sigma} \\right\\|^2} / \\sum_{m=1}^{M} e^{-\\frac{1}{2} \\left\\| \\frac{y_m^{old} - x_n}{\\sigma} \\right\\|^2}$. The $\\mathrm{diag}(\\cdot)$ notation indicates a diagonal matrix and $\\mathbf{1}$ is a column vector of all ones. Multiplying Eq. 6 by $\\sigma^2 \\mathbf{G}^{-1}$ (which exists for a Gaussian kernel) we obtain a linear system of equations:\n\n$$(\\mathrm{diag}(\\mathbf{P}\\mathbf{1}) \\mathbf{G} + \\lambda \\sigma^2 \\mathbf{I}) \\mathbf{W} = \\mathbf{P}\\mathbf{X} - \\mathrm{diag}(\\mathbf{P}\\mathbf{1}) \\mathbf{Y}_0 \\qquad (7)$$\n\nSolving the system for $\\mathbf{W}$ is the M-step of the EM algorithm. The E-step requires computation of the posterior probability matrix $\\mathbf{P}$. The EM algorithm is guaranteed to converge to a local optimum from almost any starting point. Eq. 7 can also be obtained directly by finding the derivative of Eq. 4 with respect to $\\mathbf{W}$ and equating it to zero. This results in a system of nonlinear equations that can be iteratively solved using a fixed point update, which is exactly the EM algorithm shown above. The computational complexity of each EM iteration is dominated by the linear system of Eq. 7, which takes $O(M^3)$. If using a truncated Gaussian kernel and/or linear conjugate gradients, this can be reduced to $O(M^2)$.\n\nRobustness to Noise. The use of a probabilistic assignment of correspondences between point sets is innately more robust than the binary assignment used in ICP. However, the GMM requires that each data point be explained by the model. In order to account for outliers, we add an additional uniform pdf component to the mixture model. This new component changes the posterior probability matrix $\\mathbf{P}$ in Eq. 7, which is now defined as $p_{mn} = e^{-\\frac{1}{2} \\left\\| \\frac{y_m^{old} - x_n}{\\sigma} \\right\\|^2} / \\left( \\frac{(2\\pi\\sigma^2)^{D/2}}{a} + \\sum_{m=1}^{M} e^{-\\frac{1}{2} \\left\\| \\frac{y_m^{old} - x_n}{\\sigma} \\right\\|^2} \\right)$, where $a$ defines the support for the uniform pdf. The use of the uniform distribution greatly improves robustness to noise.\n\nFree Parameters. There are three free parameters in the method: $\\lambda$, $\\beta$ and $\\sigma$. 
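The E-step posterior from the optimization and robustness discussion above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' Matlab code: the function name `posterior`, the toy points, and the convention that `a=None` switches the uniform component off are all assumptions made for this example.

```python
import math

def posterior(X, Y, sigma, a=None):
    # Returns P with M rows (centroids) and N columns (data points).
    # a is the support of the uniform pdf; a=None omits the outlier term.
    D = len(X[0])
    outlier = 0.0 if a is None else (2.0 * math.pi * sigma ** 2) ** (D / 2.0) / a
    P = [[0.0] * len(X) for _ in Y]
    for n, x in enumerate(X):
        e = [math.exp(-0.5 * sum((p - q) ** 2 for p, q in zip(x, y)) / sigma ** 2)
             for y in Y]
        denom = outlier + sum(e)  # the uniform component only enters the denominator
        for m, v in enumerate(e):
            P[m][n] = v / denom
    return P

# Hypothetical toy data: one point near the centroids and one far away
Y = [(0.0, 0.0), (1.0, 0.0)]
X = [(0.1, 0.0), (5.0, 5.0)]
P_plain = posterior(X, Y, sigma=0.5)
P_robust = posterior(X, Y, sigma=0.5, a=10.0)
col_sums_plain = [sum(row[n] for row in P_plain) for n in range(len(X))]
col_sums_robust = [sum(row[n] for row in P_robust) for n in range(len(X))]
```

Without the uniform term every column of P sums to one, so even the distant point is fully assigned to some centroid. With it, the distant point's column sums to almost zero and it barely influences the M-step.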
Parameter $\\lambda$ represents the trade-off between data fitting and smoothness regularization. Parameter $\\beta$ reflects the strength of interaction between points. Small values of $\\beta$ produce locally smooth transformations, while large values of $\\beta$ correspond to a nearly pure translation. The value of $\\sigma$ serves as a capture range for each Gaussian mixture component. Smaller $\\sigma$ indicates a smaller and more localized capture range for each Gaussian component in the mixture model. We use deterministic annealing for $\\sigma$, starting with a large value and gradually reducing it according to $\\sigma = \\alpha \\sigma$, where $\\alpha$ is the annealing rate (normally in the range [0.92, 0.98]), so that the annealing process is slow enough for the algorithm to be robust. The gradual reduction of $\\sigma$ leads to a coarse-to-fine matching strategy. We summarize the CPD algorithm in Fig. 2.\n\n3 Experimental Results\nWe show the performance of CPD on artificial data with non-rigid deformations. The algorithm is implemented in Matlab, and tested on a Pentium4 CPU 3GHz with 4GB RAM. The code is available at www.csee.ogi.edu/~myron/matlab/cpd. The initial values of $\\lambda$ and $\\beta$ are set to 1.0 in all experiments. The starting value of $\\sigma$ is 3.0 and it is gradually annealed with $\\alpha = 0.97$. The stopping condition for the iterative process is either when the current change in parameters drops below a threshold of $10^{-6}$ or the number of iterations reaches the maximum of 150.\n\nFigure 3: Registration results for the CPD, RPM and ICP algorithms from top to bottom. The first column shows template () and reference (+) point sets. The second column shows the registered position of the template set superimposed over the reference set. The third column represents the recovered underlying deformation. The last column shows the link between initial and final template point positions (only every second point's displacement is shown).\n\nOn average the algorithm converges in a few seconds and requires around 80 iterations. 
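Putting the pieces together, the loop of Fig. 2 with the parameter values quoted above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' Matlab implementation: the names `sqdist`, `solve` and `cpd` are hypothetical, sigma is annealed after every EM iteration instead of after an inner EM loop has converged, the uniform outlier component is omitted, the toy points are not normalized, and the small Gauss-Jordan solver stands in for a linear algebra library.

```python
import math

def sqdist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def solve(A, B):
    # Gauss-Jordan elimination with partial pivoting for A W = B
    n = len(A)
    aug = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [[v / aug[i][i] for v in aug[i][n:]] for i in range(n)]

def cpd(X, Y0, lam=1.0, beta=1.0, sigma=3.0, alpha=0.97, max_iter=150, tol=1e-6):
    M, D = len(Y0), len(Y0[0])
    # Gram matrix g_ij = exp(-0.5 * ||(y0i - y0j) / beta||^2)
    G = [[math.exp(-0.5 * sqdist(a, b) / beta ** 2) for b in Y0] for a in Y0]
    W = [[0.0] * D for _ in range(M)]
    Y = [list(y) for y in Y0]
    for _ in range(max_iter):
        # E-step: one posterior column per data point, shifted for stability
        cols = []
        for x in X:
            ex = [-0.5 * sqdist(x, y) / sigma ** 2 for y in Y]
            top = max(ex)
            e = [math.exp(v - top) for v in ex]
            s = sum(e)
            cols.append([v / s for v in e])
        p1 = [sum(col[m] for col in cols) for m in range(M)]  # diag(P1)
        PX = [[sum(col[m] * x[d] for col, x in zip(cols, X)) for d in range(D)]
              for m in range(M)]
        # M-step, Eq. 7: (diag(P1) G + lam * sigma^2 I) W = PX - diag(P1) Y0
        A = [[p1[i] * G[i][j] + (lam * sigma ** 2 if i == j else 0.0)
              for j in range(M)] for i in range(M)]
        B = [[PX[m][d] - p1[m] * Y0[m][d] for d in range(D)] for m in range(M)]
        W_new = solve(A, B)
        change = max(abs(a - b) for ra, rb in zip(W_new, W) for a, b in zip(ra, rb))
        W = W_new
        # Update Y = Y0 + G W, then anneal sigma
        Y = [[Y0[m][d] + sum(G[m][k] * W[k][d] for k in range(M)) for d in range(D)]
             for m in range(M)]
        sigma *= alpha
        if change < tol:
            break
    return Y, W

# Hypothetical toy data: the template is a slightly shifted copy of the reference
X = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5), (3.0, 1.5)]
Y0 = [(x + 0.2, y - 0.3) for x, y in X]
Y, W = cpd(X, Y0)
check = solve([[2.0, 1.0], [1.0, 3.0]], [[3.0], [5.0]])
```

The solver costs O(M^3) per iteration, in line with the complexity noted for Eq. 7; a practical implementation would call a library routine instead.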
All point sets are preprocessed to have zero mean and unit variance (which normalizes translation and scaling). We compare our method on non-rigid point registration with RPM and ICP. The RPM and ICP implementations and the 2D point sets used for comparison are taken from the TPS-RPM Matlab package [4].\n\nFor the first experiment (Fig. 3) we use two clean point sets. Both CPD and RPM algorithms produce accurate results for non-rigid registration. The ICP algorithm is unable to escape a local minimum. We show the velocity field through the deformation of a regular grid. The deformation field for RPM corresponds to parameterized TPS transformation, while that for CPD represents a motion coherent non-linear deformation.\n\nFor the second experiment (Fig. 4) we make the registration problem more challenging. The fish head in the reference point set is removed, and random noise is added. In the template point set the tail is removed. The CPD algorithm shows robustness even in the area of missing points and corrupted data. RPM incorrectly wraps points to the middle of the figure. We have also tried different values of smoothness parameters for RPM without much success, and we only show the best result. ICP also shows poor performance and is stuck in a local minimum.\n\nFor the 3D experiment (Fig. 5) we show the performance of CPD on 3D faces. The face surface is defined by the set of control points. We artificially deform the control point positions non-rigidly and use it as a template point set. The original control point positions are used as a reference point set. CPD is effective and accurate for this 3D non-rigid registration problem.\n\n[Figure 4 panels: CPD algorithm, RPM algorithm, ICP algorithm]\n\nFigure 4: The reference point set is corrupted to make the registration task more challenging. Noise is added and the fish head is removed in the reference point set. The tail is also removed in the template point set. The first column shows template () and reference (+) point sets. 
The second column shows the registered position of the template set superimposed over the reference set. The third column represents the recovered underlying deformation. The last column shows the link between the initial and final template point positions.\n\nFigure 5: The results of CPD non-rigid registration on 3D point sets. (a, d) The reference face and its control point set. (b, e) The template face and its control point set. (c, f) Result obtained by registering the template point set onto the reference point set using CPD.\n\n4 Discussion and Conclusion\nWe introduce Coherent Point Drift, a new probabilistic method for non-rigid registration of two point sets. The registration is considered as a Maximum Likelihood estimation problem, where one point set represents centroids of a GMM and the other represents the data. We regularize the velocity field over the points domain to enforce coherent motion and define the mathematical formulation of this constraint. We derive the solution for the penalized ML estimation through the variational approach, and show that the final transformation has an elegant kernel form. We also derive the EM optimization algorithm with deterministic annealing. The estimated velocity field represents the underlying non-rigid transformation. Once we have the final positions of the GMM centroids, the correspondence between the two point sets can be easily inferred through the posterior probability of the GMM components given the data. The computational complexity of CPD is $O(M^3)$, where $M$ is the number of points in the template point set. It is worth mentioning that the components in the point vector are not limited to spatial coordinates. 
They can also represent the geometrical characteristics of an object (e.g., curvature, moments), or the features extracted from the intensity image (e.g., color, gradient). We compare the performance of the CPD algorithm on 2D and 3D data against the ICP and RPM algorithms, and show how CPD outperforms both methods in the presence of noise and outliers. It should be noted that CPD does not work well for large in-plane rotation. Typically such a transformation can first be compensated by other well-known global registration techniques before the CPD algorithm is carried out. The CPD method is most effective when estimating smooth non-rigid transformations.\n\nAppendix A\n$$E = -\\sum_{n=1}^{N} \\log \\sum_{m=1}^{M} e^{-\\frac{1}{2} \\left\\| \\frac{x_n - y_m}{\\sigma} \\right\\|^2} + \\frac{\\lambda}{2} \\int_{\\mathbb{R}^d} \\frac{|\\tilde{v}(s)|^2}{\\tilde{G}(s)} \\, ds \\qquad (8)$$\n\nConsider the function in Eq. 8, where $y_m = y_{0m} + v(y_{0m})$, and $y_{0m}$ is the initial position of point $y_m$. $v$ is a continuous velocity function and $v(y_{0m}) = \\int_{\\mathbb{R}^d} \\tilde{v}(s) e^{2\\pi i \\langle y_{0m}, s \\rangle} \\, ds$ in terms of its Fourier transform $\\tilde{v}$. The following derivation follows [13]. Substituting $v$ into Eq. 8 we obtain:\n\n$$E(\\tilde{v}) = -\\sum_{n=1}^{N} \\log \\sum_{m=1}^{M} e^{-\\frac{1}{2} \\left\\| \\frac{x_n - y_{0m} - \\int_{\\mathbb{R}^d} \\tilde{v}(s) e^{2\\pi i \\langle y_{0m}, s \\rangle} ds}{\\sigma} \\right\\|^2} + \\frac{\\lambda}{2} \\int_{\\mathbb{R}^d} \\frac{|\\tilde{v}(s)|^2}{\\tilde{G}(s)} \\, ds \\qquad (9)$$\n\nIn order to find the minimum of this functional we take its functional derivatives with respect to $\\tilde{v}$, so that $\\frac{\\delta E(\\tilde{v})}{\\delta \\tilde{v}(t)} = 0$, $\\forall t \\in \\mathbb{R}^d$:\n\n$$\\frac{\\delta E(\\tilde{v})}{\\delta \\tilde{v}(t)} = -\\sum_{n=1}^{N} \\frac{\\sum_{m=1}^{M} e^{-\\frac{1}{2} \\left\\| \\frac{x_n - y_m}{\\sigma} \\right\\|^2} \\frac{1}{\\sigma^2} (x_n - y_m) \\int_{\\mathbb{R}^d} \\frac{\\delta \\tilde{v}(s)}{\\delta \\tilde{v}(t)} e^{2\\pi i \\langle y_{0m}, s \\rangle} ds}{\\sum_{m=1}^{M} e^{-\\frac{1}{2} \\left\\| \\frac{x_n - y_m}{\\sigma} \\right\\|^2}} + \\frac{\\lambda}{2} \\int_{\\mathbb{R}^d} \\frac{\\delta}{\\delta \\tilde{v}(t)} \\frac{|\\tilde{v}(s)|^2}{\\tilde{G}(s)} \\, ds$$\n\n$$= -\\sum_{n=1}^{N} \\frac{\\sum_{m=1}^{M} e^{-\\frac{1}{2} \\left\\| \\frac{x_n - y_m}{\\sigma} \\right\\|^2} \\frac{1}{\\sigma^2} (x_n - y_m) e^{2\\pi i \\langle y_{0m}, t \\rangle}}{\\sum_{m=1}^{M} e^{-\\frac{1}{2} \\left\\| \\frac{x_n - y_m}{\\sigma} \\right\\|^2}} + \\lambda \\frac{\\tilde{v}(-t)}{\\tilde{G}(t)} = 0$$\n\nWe now define the coefficients $a_{mn} = \\frac{e^{-\\frac{1}{2} \\left\\| \\frac{x_n - y_m}{\\sigma} \\right\\|^2} \\frac{1}{\\sigma^2} (x_n - y_m)}{\\sum_{m=1}^{M} e^{-\\frac{1}{2} \\left\\| \\frac{x_n - y_m}{\\sigma} \\right\\|^2}}$, and rewrite the functional derivative as:\n\n$$-\\sum_{n=1}^{N} \\sum_{m=1}^{M} a_{mn} e^{2\\pi i \\langle y_{0m}, t \\rangle} + \\lambda \\frac{\\tilde{v}(-t)}{\\tilde{G}(t)} = -\\sum_{m=1}^{M} \\left( \\sum_{n=1}^{N} a_{mn} \\right) e^{2\\pi i \\langle y_{0m}, t \\rangle} + \\lambda \\frac{\\tilde{v}(-t)}{\\tilde{G}(t)} = 0 \\qquad (10)$$\n\nDenoting the new coefficients $w_m = \\frac{1}{\\lambda} \\sum_{n=1}^{N} a_{mn}$, and changing $t$ to $-t$, we multiply by $\\tilde{G}(t)$ on both sides of this equation, which results in:\n\n$$\\tilde{v}(t) = \\tilde{G}(-t) \\sum_{m=1}^{M} w_m e^{-2\\pi i \\langle y_{0m}, t \\rangle} \\qquad (11)$$\n\nAssuming that $\\tilde{G}$ is symmetric (so that its Fourier transform is real), and taking the inverse Fourier transform of the last equation, we obtain:\n\n$$v(z) = G(z) * \\sum_{m=1}^{M} w_m \\delta(z - y_{0m}) = \\sum_{m=1}^{M} w_m G(z - y_{0m}) \\qquad (12)$$\n\nSince $w_m$ depend on $v$ through $a_{mn}$ and $y_m$, the $w_m$ that solve Eq. 12 must satisfy a self-consistency equation equivalent to Eq. 7. A specific form of regularizer $\\tilde{G}$ results in a specific basis function $G$.\n\nAcknowledgment\nThis work is partially supported by NIH grant NEI R01 EY013093, NSF grant IIS0313350 (awarded to X. Song) and NSF CAREER award IIS0546857 (awarded to Miguel A. Carreira-Perpi\u00f1\u00e1n).\n\nReferences\n[1] G.L. Scott and H.C. Longuet-Higgins. An algorithm for associating the features of two images. Royal Society London Proc., B-244:21-26, 1991.\n[2] P.J. Besl and N. D. McKay. A method for registration of 3-d shapes. IEEE Trans. Pattern Anal. Mach. 
Intell., 14(2):239-256, 1992.\n[3] S. Rusinkiewicz and M. Levoy. Efficient variants of the ICP algorithm. Third International Conference on 3D Digital Imaging and Modeling, page 145, 2001.\n[4] H. Chui and A. Rangarajan. A new algorithm for non-rigid point matching. CVPR, 2:44-51, 2000.\n[5] H. Chui and A. Rangarajan. A feature registration framework using mixture models. IEEE Workshop on Mathematical Methods in Biomedical Image Analysis (MMBIA), pages 190-197, 2000.\n[6] F. L. Bookstein. Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Trans. Pattern Anal. Mach. Intell., 11(6):567-585, 1989.\n[7] R. Sibson and G. Stone. Comp. of thin-plate splines. SIAM J. Sci. Stat. Comput., 12(6):1304-1313, 1991.\n[8] Y. Tsin and T. Kanade. A correlation-based approach to robust point set registration. ECCV, 3:558-569, 2004.\n[9] B. Jian and B.C. Vemuri. A robust algorithm for point set registration using mixture of gaussians. ICCV, pages 1246-1251, 2005.\n[10] A.L. Yuille and N.M. Grzywacz. The motion coherence theory. Int. J. Computer Vision, 3:344-353, 1988.\n[11] M. A. Carreira-Perpi\u00f1\u00e1n, P. Dayan, and G. J. Goodhill. Differential priors for elastic nets. In Proc. of the 6th Int. Conf. Intelligent Data Engineering and Automated Learning (IDEAL'05), pages 335-342, 2005.\n[12] A. Myronenko, X. Song, and M. A. Carreira-Perpi\u00f1\u00e1n. Non-parametric image registration using generalized elastic nets. Int. Workshop on Math. Foundations of Comp. Anatomy: Geom. and Stat. Methods in Non-Linear Image Registration, MICCAI, pages 156-163, 2006.\n[13] F. Girosi, M. Jones, and T. Poggio. Regularization theory and neural networks architectures. Neural Computation, 7(2):219-269, 1995.\n[14] C. M. Bishop. Neural Networks for Pattern Recognition. 
Oxford University Press, 1995.\n\n\f\n", "award": [], "sourceid": 2962, "authors": [{"given_name": "Andriy", "family_name": "Myronenko", "institution": null}, {"given_name": "Xubo", "family_name": "Song", "institution": null}, {"given_name": "Miguel", "family_name": "Carreira-Perpi\u00f1\u00e1n", "institution": null}]}