{"title": "New Algorithms for 2D and 3D Point Matching: Pose Estimation and Correspondence", "book": "Advances in Neural Information Processing Systems", "page_first": 957, "page_last": 964, "abstract": null, "full_text": "New Algorithms for \n\n2D and 3D Point Matching: \n\nPose Estimation and Correspondence \n\nSteven Gold¹, Chien Ping Lu¹, Anand Rangarajan¹, \n\nSuguna Pappu¹ and Eric Mjolsness² \n\nDepartment of Computer Science \n\nYale University \n\nNew Haven, CT 06520-8285 \n\n¹ E-mail address of authors: lastname-firstname@cs.yale.edu \n² Department of Computer Science and Engineering, University of California at San Diego (UCSD), La Jolla, CA 92093-0114. E-mail: emj@cs.ucsd.edu \n\nAbstract \n\nA fundamental open problem in computer vision, determining pose and correspondence between two sets of points in space, is solved with a novel, robust and easily implementable algorithm. The technique works on noisy point sets that may be of unequal sizes and may differ by non-rigid transformations. A 2D variation calculates the pose between point sets related by an affine transformation: translation, rotation, scale and shear. A 3D-to-3D variation calculates translation and rotation. An objective describing the problem is derived from mean field theory. The objective is minimized with clocked (EM-like) dynamics. Experiments with both handwritten and synthetic data provide empirical evidence for the method. \n\n1 Introduction \n\nMatching the representations of two images has long been the focus of much research in computer vision, forming an essential component of many machine-based object recognition systems. Critical to most matching techniques is the determination of correspondence between spatially localized features within each image. 
This has traditionally been considered a hard problem, especially when the issues of noise, missing or spurious data, and non-rigid transformations are tackled [Grimson, 1990]. Many approaches have been tried, with tree-pruning techniques and generalized Hough transforms being the most common. We introduce a new, robust and easily implementable algorithm to find such poses and correspondences. The algorithm can determine non-rigid transformations between noisy 2D or 3D spatially located unlabeled feature sets despite missing or spurious features. It is derived by minimizing an objective function describing the problem with a combination of optimization techniques, incorporating mean field theory, slack variables, iterative projective scaling, and clocked (EM-like) dynamics. \n\n2 2D with Affine Transformations \n\n2.1 Formulating the Objective \n\nOur first algorithm calculates the pose between noisy 2D point sets of unequal size related by an affine transformation: translation, rotation, scale and shear. Given two sets of points $\\{X_j\\}$ and $\\{Y_k\\}$, one can minimize the following objective to find the affine transformation and permutation which best maps $Y$ onto $X$: \n\n$$E_{2D}(m, t, A) = \\sum_{j=1}^{J} \\sum_{k=1}^{K} m_{jk} \\| X_j - t - A Y_k \\|^2 + g(A) - \\alpha \\sum_{j=1}^{J} \\sum_{k=1}^{K} m_{jk}$$ \n\nwith constraints $\\forall j \\; \\sum_{k=1}^{K} m_{jk} \\le 1$, $\\forall k \\; \\sum_{j=1}^{J} m_{jk} \\le 1$, $\\forall jk \\; m_{jk} \\ge 0$, and \n\n$$g(A) = \\gamma a^2 + \\kappa b^2 + \\lambda c^2 .$$ \n\nA is decomposed into a scale $s(a)$, a rotation $R(\\theta)$, a vertical shear $Sh_1(b)$ and an oblique shear $Sh_2(c)$, where \n\n$$s(a) = \\begin{pmatrix} e^{a} & 0 \\\\ 0 & e^{a} \\end{pmatrix}, \\quad Sh_1(b) = \\begin{pmatrix} e^{b} & 0 \\\\ 0 & e^{-b} \\end{pmatrix}, \\quad Sh_2(c) = \\begin{pmatrix} \\cosh(c) & \\sinh(c) \\\\ \\sinh(c) & \\cosh(c) \\end{pmatrix}.$$ \n\n$R(\\theta)$ is the standard 2x2 rotation matrix. $g(A)$ serves to regularize the affine transformation by bounding the scale and shear components. $m$ is a fuzzy correspondence matrix which matches points in one image with corresponding points in the other image. 
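To make the objective concrete, the sketch below evaluates the 2D objective without the constraint terms for a given m, t and A, together with the factor matrices s(a), Sh1(b), Sh2(c), R(theta) and the regularizer g(A). It is an illustrative sketch assuming NumPy, with points stored as rows of X (J x 2) and Y (K x 2); the function names and the default regularization weights are hypothetical. The text does not say in which order the factors are multiplied, so the order used in compose_affine is an assumption. The determinant check at the end holds for any order, because R(theta), Sh1(b) and Sh2(c) all have unit determinant.

```python
import numpy as np

def scale(a):
    # s(a): isotropic scale by e^a
    return np.exp(a) * np.eye(2)

def shear1(b):
    # Sh1(b): vertical shear diag(e^b, e^-b), determinant 1
    return np.diag([np.exp(b), np.exp(-b)])

def shear2(c):
    # Sh2(c): oblique shear from hyperbolic functions, determinant cosh^2 - sinh^2 = 1
    return np.array([[np.cosh(c), np.sinh(c)],
                     [np.sinh(c), np.cosh(c)]])

def rotation(theta):
    # Standard 2x2 rotation matrix R(theta)
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])

def compose_affine(a, theta, b, c):
    # ASSUMPTION: this multiplication order is illustrative only
    return scale(a) @ rotation(theta) @ shear1(b) @ shear2(c)

def regularizer(a, b, c, gamma=1.0, kappa=1.0, lam=1.0):
    # g(A) = gamma a^2 + kappa b^2 + lambda c^2 (weights are placeholders)
    return gamma * a**2 + kappa * b**2 + lam * c**2

def e2d(m, X, Y, t, A, alpha, g=0.0):
    # Squared distances ||X_j - t - A Y_k||^2 for all pairs, shape (J, K)
    mapped = Y @ A.T + t                      # row k is t + A Y_k
    d2 = ((X[:, None, :] - mapped[None, :, :]) ** 2).sum(axis=-1)
    return float((m * d2).sum() + g - alpha * m.sum())

A = compose_affine(0.3, 0.5, 0.1, -0.2)
# det(A) = e^(2a) whatever order the factors are multiplied in
print(np.isclose(np.linalg.det(A), np.exp(0.6)))  # prints True
```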
The constraints on $m$ ensure that each point in each image corresponds to at most one point in the other image. Partial matches are allowed, however, provided that the partial matches of any one point add up to no more than one. The inequality constraints on $m$ therefore permit a null match or multiple partial matches. \n\nThe $\\alpha$ term biases the objective towards matches. The decomposition of A above is not required, since A could be left as a 2x2 matrix and solved for directly in the algorithm that follows. The decomposition simply allows more precise regularization, i.e., specification of the likely kinds of transformations. Also, $Sh_2(c)$ could be replaced by another rotation matrix, using the singular value decomposition of A. \n\nWe transform the inequality constraints into equality constraints by introducing slack variables, a standard technique from linear programming: \n\n$$\\forall j \\; \\sum_{k=1}^{K} m_{jk} \\le 1 \\;\\longrightarrow\\; \\forall j \\; \\sum_{k=1}^{K+1} m_{jk} = 1$$ \n\nand likewise for the column constraints. An extra row and column are added to the matrix $m$ to hold the slack variables. Following the treatment in [Peterson and Soderberg, 1989; Yuille and Kosowsky, 1994] we employ Lagrange multipliers and an $x \\log x$ barrier function to enforce the constraints with the following objective: \n\n$$E_{2D}(m, t, A) = \\sum_{j=1}^{J} \\sum_{k=1}^{K} m_{jk} \\| X_j - t - A Y_k \\|^2 + g(A) - \\alpha \\sum_{j=1}^{J} \\sum_{k=1}^{K} m_{jk} + \\frac{1}{\\beta} \\sum_{j=1}^{J+1} \\sum_{k=1}^{K+1} m_{jk} (\\log m_{jk} - 1) + \\sum_{j=1}^{J} \\mu_j \\Big( \\sum_{k=1}^{K+1} m_{jk} - 1 \\Big) + \\sum_{k=1}^{K} \\nu_k \\Big( \\sum_{j=1}^{J+1} m_{jk} - 1 \\Big) \\qquad (1)$$ \n\nIn this objective we are looking for a saddle point. (1) is minimized with respect to $m$, $t$, and $A$, which are the correspondence matrix, translation, and affine transform, and is maximized with respect to $\\mu$ and $\\nu$, the Lagrange multipliers that enforce the row and column constraints for $m$. 
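The slack-variable construction can be illustrated with a small sketch (NumPy assumed; add_slack is a hypothetical helper name). Given a J x K match matrix whose row and column sums are at most one, it appends a slack column holding each row's unused mass and a slack row holding each column's unused mass. Every real row then sums to one over K+1 columns and every real column sums to one over J+1 rows. The corner entry is unconstrained and is simply left at zero here.

```python
import numpy as np

def add_slack(m):
    # m: (J, K) array, nonnegative, with row and column sums <= 1
    J, K = m.shape
    out = np.zeros((J + 1, K + 1))
    out[:J, :K] = m
    out[:J, K] = 1.0 - m.sum(axis=1)   # slack column: unused mass of each row
    out[J, :K] = 1.0 - m.sum(axis=0)   # slack row: unused mass of each column
    return out                         # corner out[J, K] stays 0 (unconstrained)

# Point 0 matches point 1 fully, point 1 is a partial match, point 2 is unmatched.
m = np.array([[0.0, 1.0],
              [0.3, 0.0],
              [0.0, 0.0]])
aug = add_slack(m)
print(aug[:3].sum(axis=1))     # row sums over K+1 columns, each 1 up to rounding
print(aug[:, :2].sum(axis=0))  # column sums over J+1 rows, each 1 up to rounding
```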
$m$ is fuzzy, with the degree of fuzziness dependent upon $\\beta$. \n\n2.2 The Algorithm \n\nThe algorithm to minimize the above objective proceeds in two phases. In phase one, while $\\{t, A\\}$ are held fixed, $m$ is initialized with a coordinate descent step, described below, and then iteratively normalized across its rows and columns until the procedure converges (iterative projective scaling). This phase is analogous to a softmax update, except that instead of enforcing a one-way, winner-take-all (maximum) constraint, a two-way assignment constraint is enforced. We therefore describe this phase as a softassign. In phase two, $\\{t, A\\}$ are updated using coordinate descent. Then $\\beta$ is increased and the loop repeats. Let $E_{2D}$ denote the above objective (1) without the terms that enforce the constraints (i.e., the $x \\log x$ barrier function and the Lagrange parameters). \n\nIn phase one (softassign), $m$ is updated via coordinate descent: \n\n$$m_{jk} = \\exp\\Big( -\\beta \\frac{\\partial E_{2D}}{\\partial m_{jk}} \\Big)$$ \n\nThen $m$ is iteratively normalized across $j$ and $k$ until $\\sum_{j=1}^{J} \\sum_{k=1}^{K} | \\Delta m_{jk} | < \\epsilon$: \n\n$$m_{jk} \\leftarrow \\frac{m_{jk}}{\\sum_{k'=1}^{K+1} m_{jk'}} \\;\\; (j = 1, \\ldots, J), \\qquad m_{jk} \\leftarrow \\frac{m_{jk}}{\\sum_{j'=1}^{J+1} m_{j'k}} \\;\\; (k = 1, \\ldots, K)$$ \n\nIn phase two, $\\{t, A\\}$ are updated using coordinate descent. If a term of $\\{A\\}$ cannot be computed analytically (because of its regularization), Newton's method is used to compute the root of the function. So if $a$ is a term of $\\{t, A\\}$, then in phase two we update $a$ such that $\\partial E_{2D} / \\partial a = 0$. Finally $\\beta$ is increased and the loop repeats. \n\nBy setting the partial derivatives of $E_{2D}$ to zero and initializing the Lagrange parameters to zero, the algorithm for phase one may be derived. Beginning with a small $\\beta$ allows minimization over a fuzzy correspondence matrix $m$, for which a global minimum is easier to find. Raising $\\beta$ drives the entries of $m$ closer to 0 or 1 as the algorithm approaches a saddle point. 
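The two phases can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The slack entries are initialized to 1 (an assumption; the text does not give their initial value). Phase two treats A as an unconstrained 2x2 matrix with g(A) = 0 and solves for t and A jointly by weighted least squares, in place of the coordinate descent and Newton steps described above, with a tiny ridge term added purely as a numerical safeguard. The names softassign, update_pose and match_2d and the annealing schedule (beta0, beta_max, rate) are hypothetical.

```python
import numpy as np

def softassign(d2, beta, alpha, tol=1e-6, max_iter=500):
    # Phase one. d2[j, k] = ||X_j - t - A Y_k||^2, so dE/dm_jk = d2[j, k] - alpha.
    J, K = d2.shape
    m = np.ones((J + 1, K + 1))                  # slack row/column start at 1 (assumption)
    m[:J, :K] = np.exp(-beta * (d2 - alpha))     # coordinate descent step
    for _ in range(max_iter):
        prev = m[:J, :K].copy()
        m[:J, :] /= m[:J, :].sum(axis=1, keepdims=True)  # real rows over K+1 columns
        m[:, :K] /= m[:, :K].sum(axis=0, keepdims=True)  # real columns over J+1 rows
        if np.abs(m[:J, :K] - prev).sum() < tol:
            break
    return m

def update_pose(m, X, Y, ridge=1e-9):
    # Phase two (swapped in): joint weighted least squares for A and t with
    # m fixed and g(A) = 0; ridge is a tiny numerical safeguard (assumption).
    J, K = X.shape[0], Y.shape[0]
    w = m[:J, :K]
    Z = np.hstack([Y, np.ones((K, 1))])          # homogeneous Y_k, shape (K, 3)
    Szz = Z.T @ (w.sum(axis=0)[:, None] * Z)     # sum_jk m_jk z_k z_k^T
    Sxz = X.T @ w @ Z                            # sum_jk m_jk X_j z_k^T
    P = np.linalg.solve(Szz + ridge * np.eye(3), Sxz.T).T  # P = [A | t]
    return P[:, 2], P[:, :2]                     # t, A

def match_2d(X, Y, alpha=0.1, beta0=1.0, beta_max=200.0, rate=1.1):
    # Deterministic annealing: alternate the two phases while raising beta.
    t, A = np.zeros(2), np.eye(2)
    beta = beta0
    while beta < beta_max:
        mapped = Y @ A.T + t
        d2 = ((X[:, None, :] - mapped[None, :, :]) ** 2).sum(axis=-1)
        m = softassign(d2, beta, alpha)          # phase one
        t, A = update_pose(m, X, Y)              # phase two
        beta *= rate
    return m, t, A
```

Solving for A and t jointly is convenient only because the sketch drops the regularizer; with g(A) present, the per-parameter updates described in the text are needed.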
\n\n3 3D with Rotation and Translation \n\nThe second algorithm solves the 3D-3D pose estimation problem with unknown correspondence. Given two sets of 3D points $\\{X_j\\}$ and $\\{Y_k\\}$, find the rotation $R$, translation $T$, and correspondence $m$ that minimize \n\n$$E_{3D}(m, T, R) = \\sum_{j=1}^{J} \\sum_{k=1}^{K} m_{jk} \\| R X_j + T - Y_k \\|^2 - \\alpha \\sum_{j=1}^{J} \\sum_{k=1}^{K} m_{jk}$$ \n\nwith the same constraints on the fuzzy correspondence matrix $m$ as in 2D affine matching. Note that there is no regularization term for the parameters $T$ and $R$. \n\nThis algorithm also works in two phases. In the first, $m$ is updated by a softassign as described for 2D affine matching. In the second phase, $m$ is fixed, and the problem becomes a 3D-to-3D pose estimation problem formulated as a weighted least squares problem. The rotation and translation are represented by a dual number quaternion $(r, s)$, which corresponds to a screw coordinate transform [Walker et al., 1991]. The rotation can be written as $R(r) = W(r)^T Q(r)$ and the translation as $W(r)^T s$. Using these representations, the objective function becomes \n\n$$E_{3D} = \\sum_{j=1}^{J} \\sum_{k=1}^{K} m_{jk} \\| W(r)^T Q(r) x_j + W(r)^T s - y_k \\|^2$$ \n\nwhere $x_j = (X_j, 0)^T$ and $y_k = (Y_k, 0)^T$ are the quaternion representations of $X_j$ and $Y_k$, respectively. Using the properties that $Q(a) b = W(b) a$ and $Q(a)^T Q(a) = W(a)^T W(a) = (a^T a) I$, together with the unit-quaternion condition $r^T r = 1$, the objective function can be rewritten as \n\n$$E_{3D}(r, s) = 2 \\big( r^T C_1 r + s^T C_2 s + s^T C_3 r \\big) + \\text{const} \\qquad (2)$$ \n\nwhere the constant does not depend on $r$ or $s$, and \n\n$$C_1 = -\\sum_{j=1}^{J} \\sum_{k=1}^{K} m_{jk} Q(y_k)^T W(x_j), \\quad C_2 = \\frac{1}{2} \\sum_{j=1}^{J} \\sum_{k=1}^{K} m_{jk} I, \\quad C_3 = \\sum_{j=1}^{J} \\sum_{k=1}^{K} m_{jk} \\big( W(x_j) - Q(y_k) \\big).$$ \n\nWith this new representation, all the information, including the current fuzzy estimate of the correspondence $m$, is absorbed into the three 4-by-4 matrices $C_1$, $C_2$, $C_3$ in (2), which can be minimized in closed form [Walker et al., 1991]. \n\n4 Experimental Results \n\nIn this section we provide experimental results for both the 2D and 3D matching problems. 
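For reference, the closed-form minimization of (2) from Section 3 can be sketched as follows. This is our own derivation and code, written to be consistent with the definitions of Section 3 rather than taken from [Walker et al., 1991] or from the authors. It assumes NumPy, stores quaternions as (vector part, scalar part), and uses the hypothetical helper names skew, qmat, wmat and pose_3d. Because W(r)^T W(r) = I for a unit r, the objective equals the sum over j, k of m_jk ||(W(x_j) - Q(y_k)) r + s||^2. Minimizing over s gives s = -C3 r / w, with w the total matching weight. Substituting back leaves r^T (C1 - C3^T C3 / (2w)) r plus a constant, which is minimized by the unit eigenvector with the smallest eigenvalue.

```python
import numpy as np

def skew(v):
    # Cross-product matrix: skew(v) @ u equals np.cross(v, u)
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def qmat(a):
    # Q(a) b = a * b (quaternion product), a = (vector, scalar)
    M = np.zeros((4, 4))
    M[:3, :3] = a[3] * np.eye(3) + skew(a[:3])
    M[:3, 3] = a[:3]
    M[3, :3] = -a[:3]
    M[3, 3] = a[3]
    return M

def wmat(a):
    # W(a) b = b * a: same as Q(a) with the cross-product sign flipped
    M = qmat(a)
    M[:3, :3] = a[3] * np.eye(3) - skew(a[:3])
    return M

def pose_3d(m, X, Y):
    # m: (J, K) weights with slack removed; X: (J, 3); Y: (K, 3).
    # Minimizes sum_jk m_jk ||R X_j + T - Y_k||^2 over rotations R and translations T.
    x = np.hstack([X, np.zeros((X.shape[0], 1))])   # x_j = (X_j, 0)
    y = np.hstack([Y, np.zeros((Y.shape[0], 1))])   # y_k = (Y_k, 0)
    C1 = np.zeros((4, 4))
    C3 = np.zeros((4, 4))
    for j in range(x.shape[0]):
        for k in range(y.shape[0]):
            C1 -= m[j, k] * qmat(y[k]).T @ wmat(x[j])
            C3 += m[j, k] * (wmat(x[j]) - qmat(y[k]))
    w = m.sum()                                     # C2 = (w / 2) I
    M = C1 - C3.T @ C3 / (2.0 * w)
    M = 0.5 * (M + M.T)                             # only the symmetric part matters
    vals, vecs = np.linalg.eigh(M)                  # eigenvalues in ascending order
    r = vecs[:, 0]                                  # unit r minimizing r^T M r
    s = -C3 @ r / w                                 # optimal s for this r
    R = (wmat(r).T @ qmat(r))[:3, :3]
    T = (wmat(r).T @ s)[:3]
    return R, T

# Noise-free check with a known rotation about z and a known translation
rng = np.random.default_rng(1)
X = rng.random((6, 3))
ang = 0.7
R_true = np.array([[np.cos(ang), -np.sin(ang), 0.0],
                   [np.sin(ang), np.cos(ang), 0.0],
                   [0.0, 0.0, 1.0]])
T_true = np.array([2.5, -1.0, 4.0])
Y = X @ R_true.T + T_true
R_est, T_est = pose_3d(np.eye(6), X, Y)
print(np.allclose(R_est, R_true), np.allclose(T_est, T_true))  # True True
```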
As an application of the 2D matching algorithm, we present results in the context of handwritten character recognition. \n\n4.1 Handwritten Character Data \n\nThe data were generated using an X-windows tool which enables us to draw an image with the mouse on a writing pad on the screen. The contours of the images are discretized and expressed as a set of points in the plane. In the experiments below, we generate 70 points per character on average. \n\nThe inputs to the point matching algorithm are the x-y coordinates generated by the drawing program. No other pre-processing is done. The output is a correspondence matrix and a pose. In Figures 1 and 2, we show the correspondences found between several images drawn in this fashion. To make the actual point matches easier to see, we have drawn the correspondences only for every other model point. \n\nFigure 1: Correspondence of digits \n\nIn one experiment, we drew examples of individual digits, one as a model digit and then many different variations of it. In Figure 1, it can be seen that the correspondences are good for a large variation from the model digit. For example, the correspondence is invariant to scale. Also, the correspondence is good between distorted digits, as in 3 and 6, or between different forms of a digit, as in 4, 3, and 2. \n\nFigure 2: Correspondence: \"a\" found in \"cat\", \"o\" found in \"song\" \n\n
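Drawing correspondences, as in Figures 1 and 2, requires turning the fuzzy matrix into discrete matches. One simple way is sketched below under our own assumptions: NumPy, the hypothetical helper name hard_matches, and the rule of taking for each point j the column with the largest entry of the slack-augmented m and discarding the point when that column is the slack column. None of this is specified in the text.

```python
import numpy as np

def hard_matches(m, step=1):
    # m: (J+1, K+1) slack-augmented correspondence matrix after annealing.
    # Returns (j, k) pairs; rows whose largest entry is the slack column are skipped.
    J, K = m.shape[0] - 1, m.shape[1] - 1
    pairs = []
    for j in range(0, J, step):          # step=2 keeps every other point
        k = int(np.argmax(m[j]))         # argmax over K real columns plus slack
        if k < K:
            pairs.append((j, k))
    return pairs

m = np.array([[0.9, 0.05, 0.05],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9],
              [0.0, 0.05, 0.0]])
print(hard_matches(m))          # [(0, 0), (1, 1)]
print(hard_matches(m, step=2))  # [(0, 0)]
```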
\nIn another experiment (Figure 2), individual letters are correctly identified within words. Here, no pre-processing to segment the cursive word into letters is done. The correspondence returned by the point matching algorithm can by itself be good enough for identification. Even similar letters may be differentiated: for example, the \"a\" in \"cat\" is correctly identified even though the \"c\" has a similar shape, and the \"o\" is correctly identified in \"song\", despite the similarity of the \"s\". \n\n4.2 Randomly generated point sets: 2D \n\nIn the second set of experiments, randomly generated dot patterns were used. In each trial a model is created by randomly generating, with a uniform distribution, 50 points on a grid of unit area. Independent Gaussian noise $N(0, \\sigma)$ is added to each of the points, creating a jittered image. Then a fraction $p_d$ of points are deleted, and a fraction $p_s$ of spurious points are added, randomly on the unit square. Finally, a randomly generated transformation is applied to the set to generate a new image. The objective then is to recover the transformation and correspondence between the transformed image and the original point set. \n\nThe transformations we have considered are (i) translation, rotation and scale, and (ii) the full affine transformation: translation, rotation, scale, vertical shear and oblique shear. The transformation parameters $\\{t_x, t_y, \\theta, a, b, c\\}$ are bounded in the following way: $-0.5 < t_x, t_y < 0.5$, $-27^\\circ < \\theta < 27^\\circ$, $0.5 \\le e^a \\le 2$, where $a$ is the scale parameter, and $0.7 \\le e^b, e^c \\le 1/0.7$, where $b$ and $c$ are the parameters for the two shears. Each of the parameters is chosen independently and uniformly from its permissible range. \n\nWe use the error measure \n\n$$e_a = \\frac{| a_{\\mathrm{actual}} - a_{\\mathrm{estimate}} |}{\\mathrm{width}_a}$$ \n\nwhere $e_a$ is the error measure for parameter $a$ and $\\mathrm{width}_a$ is the range of permissible values for $a$. Dividing by $\\mathrm{width}_a$ is preferable to dividing by $a_{\\mathrm{actual}}$, which incorrectly weights small $a_{\\mathrm{actual}}$ values. 
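The test-instance generation and error measure of this section can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. It assumes NumPy and draws the full affine case (setting b and c to zero gives the translation, rotation and scale case). It multiplies the affine factors in an order the text does not specify, interprets p_s as a fraction of the original n points, and measures theta in radians. BOUNDS, affine_from_params, make_instance and error_measure are hypothetical names.

```python
import numpy as np

# Parameter bounds from the text; theta in radians (27 degrees).
BOUNDS = {
    'tx': (-0.5, 0.5), 'ty': (-0.5, 0.5),
    'theta': (-np.deg2rad(27.0), np.deg2rad(27.0)),
    'a': (np.log(0.5), np.log(2.0)),          # 0.5 <= e^a <= 2
    'b': (np.log(0.7), np.log(1 / 0.7)),      # 0.7 <= e^b <= 1/0.7
    'c': (np.log(0.7), np.log(1 / 0.7)),
}

def affine_from_params(p):
    # ASSUMPTION: factor order s(a) R(theta) Sh1(b) Sh2(c) is illustrative only.
    ct, st = np.cos(p['theta']), np.sin(p['theta'])
    R = np.array([[ct, -st], [st, ct]])
    s = np.exp(p['a']) * np.eye(2)
    sh1 = np.diag([np.exp(p['b']), np.exp(-p['b'])])
    sh2 = np.array([[np.cosh(p['c']), np.sinh(p['c'])],
                    [np.sinh(p['c']), np.cosh(p['c'])]])
    return s @ R @ sh1 @ sh2

def make_instance(rng, n=50, sigma=0.02, p_d=0.1, p_s=0.1):
    model = rng.uniform(0.0, 1.0, size=(n, 2))              # unit square
    pts = model + rng.normal(0.0, sigma, size=model.shape)  # jittered copy
    n_del = int(round(p_d * n))
    keep = rng.choice(n, size=n - n_del, replace=False)     # delete a fraction p_d
    pts = pts[np.sort(keep)]
    n_add = int(round(p_s * n))                             # ASSUMPTION: fraction of n
    pts = np.vstack([pts, rng.uniform(0.0, 1.0, size=(n_add, 2))])
    params = {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}
    A = affine_from_params(params)
    image = pts @ A.T + np.array([params['tx'], params['ty']])
    return model, image, params

def error_measure(estimate, actual):
    # e_a = |a_actual - a_estimate| / width_a, averaged over all parameters
    errs = [abs(actual[k] - estimate[k]) / (hi - lo) for k, (lo, hi) in BOUNDS.items()]
    return float(np.mean(errs))
```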
The reported error (y axes of Figure 3) is the average error over all the parameters. \n\nThe time to recover the correspondence and transformation for a problem instance of 50 points is about 50 seconds on a Silicon Graphics workstation with an R4400 processor. By varying parameters such as the annealing rate or stopping criterion, this can be reduced to about 20 seconds with some degradation in accuracy. For each trial, combinations of $\\sigma \\in \\{0.01, 0.02, \\ldots, 0.08\\}$, $p_d \\in \\{0\\%, 10\\%, 30\\%, 50\\%\\}$ and $p_s \\in \\{0\\%, 10\\%\\}$ were used. \n\nResults are reported separately for the two transformation classes (translation, rotation and scale; full affine). For each combination of $(\\sigma, p_d, p_s)$, 500 test instances were generated. Each data point in Figures 3.a and 3.b represents the average error measure over these 500 experiments. Increasing the noise or the deletion and addition fractions increases the error measure monotonically. As expected, the restricted transformation (translation, rotation and scale) gives better results than the full affine transformation. \n\nFigure 3: 2D Results for Synthetic Data. Panels: (a) translation, rotation, scale; (b) affine. x axes: standard deviation of noise; y axes: error measure. Legend: x: $p_d = 0.0, p_s = 0.0$; o: $p_d = 0.1, p_s = 0.1$; +: $p_d = 0.3, p_s = 0.1$; *: $p_d = 0.5, p_s = 0.1$. \n\n4.3 Randomly generated point sets: 3D \n\nA test instance for 3D point matching involves generating a random 3D point set as a model image, and then generating a test image by applying a random transformation, adding noise and then randomly deleting points. \n\n20 points are generated uniformly within a unit cube. The parameters for the transformation are generated as follows: the three rotation angles for $R$ are selected from a uniform distribution $U[20, 70]$, and the translation parameters $T_x, T_y, T_z$ are selected from a uniform distribution $U[2.5, 7.5]$. Gaussian noise $N(0, \\sigma)$ is added to the points. The objective then is to recover the three translation and three rotation parameters and to find the correspondence between the test image and the original point set. The results are summarized in Figure 4. \n\nFigure 4: 3D Results for Synthetic Data. Panels: translation error (left) and rotation error (right); x axes: standard deviation of noise. Legend: x: $p_d = 0.0, p_s = 0.0$; o: $p_d = 0.1, p_s = 0.1$; +: $p_d = 0.2, p_s = 0.2$; *: $p_d = 0.3, p_s = 0.3$. \n\n5 Conclusion \n\nWe have developed an algorithm for solving 2D and 3D correspondence problems. The algorithm handles significant noise, missing or spurious features, and non-rigid transformations. Moreover, it works with point feature data alone; inclusion of other types of feature information could improve its accuracy and speed. This approach may also be extended to solve multi-level problems. Additionally, the affine transform might be modified to include higher order transformations. The algorithm may also be used as a distance measure in learning [Gold et al., 1994]. \n\nAcknowledgements \n\nThis work has been supported by AFOSR grant F49620-92-J-0465, ONR/DARPA grant N00014-92-J-4048, and the Yale Center for Theoretical and Applied Neuroscience (CTAN). Jing Yan developed the handwriting interface. \n\nReferences \n\nS. Gold, E. Mjolsness and A. Rangarajan. (1994) Clustering with a domain-specific distance measure. In J. D. Cowan et al. (eds.), NIPS 6. Morgan Kaufmann. \n\nE. Grimson. (1990) Object Recognition by Computer. Cambridge, MA: MIT Press. \n\nC. Peterson and B. Soderberg. (1989) A new method for mapping optimization problems onto neural networks. Int. Journ. of Neural Sys., 1(1):3-22. \n\nM. W. Walker, L. Shao and R. Volz. (1991) Estimating 3-D location parameters using dual number quaternions. CVGIP: Image Understanding, 54(3):358-367. \n\nA. L. Yuille and J. J. Kosowsky. (1994) Statistical physics algorithms that converge. Neural Computation, 6:341-356. 
", "award": [], "sourceid": 977, "authors": [{"given_name": "Steven", "family_name": "Gold", "institution": null}, {"given_name": "Chien-Ping", "family_name": "Lu", "institution": null}, {"given_name": "Anand", "family_name": "Rangarajan", "institution": null}, {"given_name": "Suguna", "family_name": "Pappu", "institution": null}, {"given_name": "Eric", "family_name": "Mjolsness", "institution": null}]}