{"title": "Resolving motion ambiguities", "book": "Advances in Neural Information Processing Systems", "page_first": 977, "page_last": 984, "abstract": null, "full_text": "Resolving motion ambiguities \n\nK. I. Diamantaras \n\nSiemens Corporate Research \n\n755 College Rd . East \nPrinceton, NJ 08540 \n\nD. Geiger* \n\nCourant Institute, NYU \n\nMercer Street \n\nNew York, NY 10012 \n\nAbstract \n\nWe address the problem of optical flow reconstruction and in par(cid:173)\nticular the problem of resolving ambiguities near edges. They oc(cid:173)\ncur due to (i) the aperture problem and (ii) the occlusion problem, \nwhere pixels on both sides of an intensity edge are assigned the same \nvelocity estimates (and confidence). However, these measurements \nare correct for just one side of the edge (the non occluded one). \nOur approach is to introduce an uncertamty field with respect to \nthe estimates and confidence measures. We note that the confi(cid:173)\ndence measures are large at intensity edges and larger at the con(cid:173)\nvex sides of the edges, i.e. inside corners, than at the concave side. \nWe resolve the ambiguities through local interactions via coupled \nMarkov random fields (MRF) . The result is the detection of motion \nfor regions of images with large global convexity. \n\n1 \n\nIntroduction \n\nIn this paper we discuss the problem of figure ground separation, via optical flow, for \nhomogeneous images (textured images just provide more information for the disam(cid:173)\nbiguation of figure-ground). We address the problem of optical flow reconstruction \nand in particular the problem of resolving ambiguities near intensity edges. We \nconcentrate on a two frames problem, where all the motion ambiguities we discuss \ncan be disambiguiated by the human visual system. 
\n\n*Work done while the author was at the Isaac Newton Institute and at Siemens Corporate Research. \n\nOptical flow is a 2D (two-dimensional) field defined so as to capture the projection of the 3D (three-dimensional) motion field onto the view plane (retina). The Horn and Schunck [8] formulation of the problem is to impose (i) the brightness constraint dE(x,y,t)/dt = 0, where E is the intensity image, and (ii) the smoothness of the velocity field. The smoothness can be thought of as coming from a rigidity or quasi-rigidity assumption (see Ullman [12]). \n\nWe utilize two improvements which are important for the optical flow computation: (i) the introduction of the confidence measure (Nagel and Enkelmann [10], Anandan [1]) and (ii) the application of smoothness while preserving discontinuities (Geman and Geman [6], Blake and Zisserman [2], Mumford and Shah [9]). It is clear that as an object moves with respect to a background, not only do optical flow discontinuities occur, but occlusions (and revelations) occur as well. In stereo, occlusions are related to discontinuities (e.g. Geiger et al. 1992 [5]), and for motion a similar relation must exist. We study ambiguities occurring at motion discontinuities and occlusions in images. \n\nThe paper is organized as follows: Section 2 describes the problem with examples and a brief discussion of possible approaches, section 3 presents our approach, with the formulation of the model and a method to solve it, and section 4 gives the results. \n\n2 Motion ambiguities \n\nFigure 1 shows two synthetic problems involving a translation and a rotation of simple objects in front of stationary backgrounds. \n\nConsider the case of the square translation (see figure 1a). Humans perceive the square as translating, although block matching (and any other matching technique) gives translation on both sides of the square edges. 
\n\nMoreover, there are other interpretations of the scene, such as the square belonging to the stationary background and the outside being a translating foreground with a square hole. The examples are synthetic, but emphasize the ambiguities. Real images may have more texture, which often helps resolve these ambiguities, but not everywhere. \n\n(a) \n\n(b) \n\nFigure 1: Two image sequences of 128 x 128. (a) Square translation of 3 pixels; (b) \"Eight\" rotation of 10 degrees. Note that the \"eight\" has concave and convex regions. \n\n3 A Markov random field model \n\nWe describe a model capable of resolving these ambiguities. It is based on coupled Markov random fields and thus on local processes. Our main contribution is to introduce the idea of uncertainty on the estimates and confidence measures. We propose a Markov field that allows the estimate at each pixel to be chosen from a large neighborhood, so that any individual pixel estimate can be neglected. We show that convex regions of the image bias the confidence measures such that the final motion solutions are expected to be the ones with larger global convexity. Note that locally one can have concave regions of a shape that give a \"wrong\" bias (see figure 1b). \n\n3.1 Block Matching \n\nBlock matching is the process of correlating a block region of one image, say of size (2w_M + 1) x (2w_M + 1), with a block region of the other image. Block matching yields a set of matching errors d_ij^mn, where (i,j) is a pixel in the image and v = [m,n] is a displacement vector in a search window of size (2w_S + 1) x (2w_S + 1) around the pixel. We define the velocity measurements g_ij and the covariance matrix C_ij as the mean and variance of the vector v = [m,n] averaged according to the distribution e^{-k d_ij^mn}: \n\ng_ij = ( sum_{m,n} v e^{-k d_ij^mn} ) / ( sum_{m,n} e^{-k d_ij^mn} ) \n\nFigure 2 shows the block matching data g_ij for the two problems discussed above and figure 3 shows the corresponding confidence measures (inverse of the covariance matrix as defined below). \n\n3.2 The aperture problem and confidence \n\nThe aperture problem [7] occurs where there is low confidence in the measurements (data) in the direction along an edge. In particular we follow the approach of [1]. \n\nThe eigenvalues lambda_1, lambda_2 of C_ij correspond to the variance of the distribution of v along the directions of the corresponding eigenvectors v_1, v_2. The confidence of the estimate should be inversely proportional to the variance of the distribution, i.e. the confidence along direction v_1 (v_2) is proportional to 1/lambda_1 (1/lambda_2). All this confidence information can be packaged inside the confidence matrix defined as follows: \n\nR_ij = epsilon (C_ij + epsilon I)^{-1}    (1) \n\nwhere epsilon is a very small constant that guarantees invertibility. Thus the eigenvalues of R_ij are values between 0 and 1 corresponding to the confidence along the directions v_1 and v_2, whereas v_1 and v_2 are still eigenvectors of R_ij. \nThe confidence measure at straight edges is high perpendicular to the edges and low (zero) along the edges. However, at corners the confidence is high in both directions; through smoothness this result can be propagated to the other parts of the image, thus resolving the aperture problem. \n\n3.3 The localization problem and a binary decision field \n\nThe localization problem arises due to the local symmetry at intensity edges, where both sides of an edge give the same correspondences. These cases occur when occluded regions are homogeneous, so that block matching, pixel matching or any matching technique cannot distinguish which side of the edge is being occluded or is occluding. 
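The soft block matching of Section 3.1 and the confidence matrix of eq. (1) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the sum-of-squared-differences matching error, the border handling and the constants k and epsilon are all assumptions.

```python
import numpy as np

def block_matching_stats(E1, E2, i, j, w_m=2, w_s=3, k=1.0, eps=1e-3):
    """Soft block matching at pixel (i, j) -- a sketch of Sections 3.1-3.2.

    Returns the velocity measurement g_ij, the covariance C_ij and the
    confidence matrix R_ij = eps * (C_ij + eps*I)^{-1} of eq. (1).
    """
    # Matching error d^{mn}: SSD between the block around (i, j) in E1
    # and the block around (i+m, j+n) in E2 (an assumed error measure).
    block1 = E1[i - w_m:i + w_m + 1, j - w_m:j + w_m + 1]
    disps, errs = [], []
    for m in range(-w_s, w_s + 1):
        for n in range(-w_s, w_s + 1):
            block2 = E2[i + m - w_m:i + m + w_m + 1,
                        j + n - w_m:j + n + w_m + 1]
            disps.append((m, n))
            errs.append(np.sum((block1 - block2) ** 2))
    disps = np.array(disps, dtype=float)
    errs = np.array(errs)
    # Distribution e^{-k d^{mn}} (shifted for stability, then normalized),
    # then mean and covariance of the displacement v = [m, n].
    w = np.exp(-k * (errs - errs.min()))
    w /= w.sum()
    g = w @ disps                                   # velocity measurement g_ij
    d = disps - g
    C = (w[:, None] * d).T @ d                      # covariance C_ij
    R = eps * np.linalg.inv(C + eps * np.eye(2))    # confidence matrix, eq. (1)
    return g, C, R
```

At a corner of a translating bright square, the match is unambiguous in both directions, so g recovers the true shift and the eigenvalues of R are close to 1; along a straight edge one eigenvalue collapses toward 0, as described above.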
Even if one considers edge-based methods, the same problem arises in the reconstruction stage, where the edge velocities have to be propagated to the rest of the image. In these cases a localization uncertainty is introduced. More precisely, pixels whose matching block contains a strong feature (e.g. a corner) will obtain a high-confidence motion estimate along the direction in which this feature moved. Pixels on both sides of this feature, at distances less than half the matching window size w_M, will receive roughly the same motion estimates associated with high confidences. However, it could have been just one of the two sides that moved in this direction. In that case this estimate should not be taken into account on the other side. We note, however, a bias from the confidence measures towards the inside of corner regions. \n\nNote that at a corner, despite both sides getting roughly the same velocity estimate and high confidence measures, the inside pixel always gets a larger confidence. This bias is due to having more pixels outside the edge of a closed contour than inside it, and it occurs at the convex regions (e.g. a corner). Thus, in general, the convex regions will have a stronger confidence measure than the regions outside them. Note that at the concavities in the \"eight\" rotation image, the confidence will be higher outside the \"eight\", and correct at convex regions. Thus a global optimization will be required to decide which confidences to \"pick up\". \n\nOur approach to resolving this ambiguity is to allow the motion estimate at pixel (i,j) to select data from a neighborhood N_ij, with the goal of maximizing the total estimates (taking into account the confidence measures). More precisely, let f_ij be the vector motion field at pixel (i,j). We introduce a binary field a_ij^mn that indicates which data g_{i+m,j+n} in a neighborhood N_ij of (i,j) should correspond to a motion estimate f_ij. 
The size of Nij is given by W M + 1 to overcome the localization \nuncertainty. For a given lattice point (i , j) the boolean parameters ai'r should be \nmutually exclusive, i.e. only one of them, a~\u00b7n\u00b7 , should be equal to 1 indicating \nthat iij should correspond to gi+m. ,j+n. , While the rest a'?;n , m =f:. m* , n =f:. n*, \nshould be zero (or 2:m.n.EN'J a7r n \u2022 = 1). The conditional probability reflects \nboth an uncertainty due to noise and an uncertainty due to spatial localization of \nthe data \n\n\fResolving Motion Ambiguities \n\n981 \n\n3.4 The piecewise smooth prior \n\nThe prior probability of the motion field fij is a piecewise smoothness condition, as \nin [6]. \n\nP(f . a. h, v) = ~1 exp{ -(I: J.L( i-hi) )llitJ - il-1.) 11 2+J-L( I-Vi] )lIiij - ii ,J _1112+~'i) (hlJ +Vi) )) } . \n\nIJ \n\nI \n\n(3) \nwhere hij = 0 (Vij = 0) if there is no motion discontinuity separating pixels (i , j) , \n(i - l,j) ((i,j), (i,j - 1) \n, otherwise hij = 1 (vii = 1). The parameter J.L has to \nbe estimated. We have considered that the cost to create motion discontinuities \nshould be lowered at intensity edges (see Poggio et al. [11]) , i.e Iii = 1(1 - 6eij), \nwhere eij is the intensity edge and 0 ~ 6 ~ 1 and I have to be estimated. \n\n3.5 The posterior distribution \n\nThe posterior distribution is given by Bayes' law \n\nP(f a h vlg R) = \n\n, \n\n'\n\n\" \n\nP(g, R) \n\n\" \n\n1 P(g Rlf a)P(f a h v) = !e-V(f,o ,h,v;g) \n\n(4) \n\n'\n\n\" \n\nZ \n\nwhere \n\nV(j, a , h, v) \n\nL { L a~TIIRi+m,j+n(jij - gi+m ,j+n)11 2 \nij mneN'J \n\n+ J.L(I- hij)llfij - fi_i,jI12 + J.L(I- Vij)llfij - !i,j_1112 + \n\nlij(hij+Vij)} \n\n(5) \n\nIdeally, we would like to mInImIZe V under all \nis the energy of the system. \npossible configurations of the fields f , h, v and a , while obeying the constraint \nEmneN'J ai]n = 1. 
\n\n3.6 Mean field techniques \n\nIntroducing the inverse temperature parameter beta (= 1/T) we can obtain the transformed probability distribution \n\nP_beta(f, a | g, R) = (1/Z_beta) e^{-beta V(f,a)}    (6) \n\nwhere \n\nZ_beta = sum_{f} exp{ -beta sum_ij [ mu_ij^h ||f_ij - f_{i-1,j}||^2 + mu_ij^v ||f_ij - f_{i,j-1}||^2 ] } x ( sum_{a} exp{ -beta sum_ij sum_{mn in N_ij} a_ij^mn ||R_{i+m,j+n} (f_ij - g_{i+m,j+n})||^2 } )    (7) \n\nwhere mu_ij^h = mu (1 - h_ij) and mu_ij^v = mu (1 - v_ij). We have to obey the constraint sum_{mn in N_ij} a_ij^mn = 1. For the sake of simplicity we have assumed that the neighborhood N_ij around site (i,j) is N_ij = {(i+m, j+n) : -1 <= m <= 1, -1 <= n <= 1}. The second factor in (7) can be explicitly computed. Employing the mean field techniques proposed in [3] and extended in [4] we can average out the variables h, v and a (including the constraint) and obtain \n\nZ_beta = sum_{f} prod_ij ( sum_{m,n=-1}^{1} exp{ -beta ||R_{i+m,j+n} (f_ij - g_{i+m,j+n})||^2 } ) (e^{-beta gamma_ij} + e^{-beta mu ||f_ij - f_{i-1,j}||^2}) (e^{-beta gamma_ij} + e^{-beta mu ||f_ij - f_{i,j-1}||^2})    (8) \n\nwhich yields the effective energy V_eff(f), since Z_beta = sum_{f} e^{-beta V_eff(f)}. Using the saddle point approximation, i.e. considering Z_beta ~ e^{-beta V_eff(f_bar)} with f_bar minimizing V_eff(f; g), the mean field equations become \n\n0 = sum_{mn} a_bar_ij^mn R_{i+m,j+n} (f_bar_ij - g_{i+m,j+n}) + mu_bar_ij^v Delta_v f_bar_ij + mu_bar_ij^h Delta_h f_bar_ij    (9) \n\nwith mu_bar_ij^v = mu (1 - v_bar_ij), mu_bar_ij^h = mu (1 - h_bar_ij), Delta_v f_bar_ij = f_bar_ij - f_bar_{i,j-1} and Delta_h f_bar_ij = f_bar_ij - f_bar_{i-1,j}, where the barred quantities are the mean field averages. \n\nThe normalization constant Z_beta, called the partition function, has the important property that \n\nlim_{beta -> infinity} -(1/beta) ln Z_beta = min_{f,a,h,v} { V(f, a, h, v) }    (10) \n\nThen using an annealing method we let beta -> infinity, and the minimum of V_beta = -(1/beta) ln Z_beta approaches asymptotically the desired minimum. \n\n4 Results \n\nWe have applied an iterative method along with an annealing schedule to solve the above mean field equations for beta -> infinity. The method was run on the two examples already described. 
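One iteration of such a mean-field scheme can be sketched as follows, for the data term alone: the selection field a is averaged into softmax weights over the 3x3 neighborhood, and f is re-estimated from the resulting weighted quadratic. This is a simplified illustration under stated assumptions (the smoothness and line terms of eq. (9) are omitted, and the function name and regularizer are ours, not the paper's).

```python
import numpy as np

def mean_field_step(f, g, R, beta):
    """One mean-field update of the data term (a sketch of eqs. (8)-(9))."""
    H, W, _ = f.shape
    f_new = f.copy()
    for i in range(H):
        for j in range(W):
            ws, As, bs = [], [], []
            for m in (-1, 0, 1):
                for n in (-1, 0, 1):
                    ii, jj = i + m, j + n
                    if not (0 <= ii < H and 0 <= jj < W):
                        continue
                    # Log-weight of a^{mn}: -beta * ||R (f - g)||^2.
                    r = R[ii, jj] @ (f[i, j] - g[ii, jj])
                    ws.append(-beta * (r @ r))
                    Q = R[ii, jj].T @ R[ii, jj]
                    As.append(Q)
                    bs.append(Q @ g[ii, jj])
            # Average out a under the constraint sum a^{mn} = 1 (softmax),
            # shifted by the max log-weight for numerical stability.
            w = np.exp(np.array(ws) - max(ws))
            w /= w.sum()
            # Stationary point of the averaged quadratic data term;
            # the tiny ridge term only guards against a singular system.
            A = sum(wk * Ak for wk, Ak in zip(w, As))
            b = sum(wk * bk for wk, bk in zip(w, bs))
            f_new[i, j] = np.linalg.solve(A + 1e-9 * np.eye(2), b)
    return f_new
```

Raising beta across iterations, as in the annealing schedule of Section 3.6, sharpens the softmax toward a hard selection of a single neighbor.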
\n\nFigure 4 depicts the results of the experiments. The system chooses a natural interpretation (in agreement with human perception), namely it interprets the object (e.g. the square in the first example or the eight-shaped region in the second example) as moving and the background as stationary. In the beginning of the annealing process the localization field a may produce \"erroneous\" results; however, the neighbor information eventually forces the pixels outside the moving object to coincide with the rest of the background, which has zero motion. For the pixels inside the object, on the contrary, the neighbor information eventually reinforces the adoption of the motion of the edges. \n\nReferences \n\n[1] P. Anandan, \"Measuring Visual Motion from Image Sequences\", PhD thesis, COINS Dept., Univ. of Massachusetts, Amherst, 1987. \n\n[2] A. Blake and A. Zisserman, \"Visual Reconstruction\", Cambridge, Mass., MIT Press, 1987. \n\n[3] D. Geiger and F. Girosi, \"Parallel and Deterministic Algorithms for MRFs: Surface Reconstruction and Integration\", IEEE PAMI 13(5), May 1991. \n\n[4] D. Geiger and A. Yuille, \"A Common Framework for Image Segmentation\", Int. J. Comput. Vision, 6(3), pp. 227-243, 1991. \n\n[5] D. Geiger, B. Ladendorf and A. Yuille, \"Binocular stereo with occlusion\", Computer Vision - ECCV 92, ed. G. Sandini, Springer-Verlag, 588, pp. 423-433, May 1992. \n\n[6] S. Geman and D. Geman, \"Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images\", IEEE PAMI 6, pp. 721-741, 1984. \n\n[7] E. C. Hildreth, \"The Measurement of Visual Motion\", MIT Press, 1983. \n\n[8] B.K.P. Horn and B.G. Schunck, \"Determining optical flow\", Artificial Intelligence, vol. 17, pp. 185-203, August 1981. \n\n[9] D. Mumford and J. Shah, \"Boundary detection by minimizing functionals, I\", Proc. IEEE Conf. on Computer Vision and Pattern Recognition, San Francisco, CA, 1985. 
\n\n[10] H.-H. Nagel and W. Enkelmann, \"An Investigation of Smoothness Constraints for the Estimation of Displacement Vector Fields from Image Sequences\", IEEE PAMI 8, 1986. \n\n[11] T. Poggio, E. B. Gamble and J. J. Little, \"Parallel Integration of Vision Modules\", Science, vol. 242, pp. 436-440, 1988. \n\n[12] S. Ullman, \"The Interpretation of Visual Motion\", Cambridge, Mass., MIT Press, 1979. \n\n(a) \n\n(b) \n\n(c) \n\nFigure 2: Block matching data g_ij. Both sides of the edges have the same data (and same confidence). White represents motion to the right (x-direction) or up (y-direction). Black is the complement. (a) The x-component of the data for the square translation. (b) The x-component of the data for the rotation and (c) the y-component of the data. \n\n(a) \n\n(b) \n\nFigure 3: The confidence R extracted from the block matching data g_ij. The display is the sum of both eigenvalues, i.e. the trace of R. Both sides of the edges have the same confidence. White represents high confidence. (a) For the square translation. (b) For the rotation. \n\n(a) \n\n(b) \n\n(c) \n\nFigure 4: The final motion estimation, after 20000 iterations, resolved the ambiguities with a natural interpretation of the scene. mu = 10, delta = 1, gamma = 100. (a) square translation (b) x-component of the motion for the rotation (c) y-component of the motion for the rotation \n", "award": [], "sourceid": 771, "authors": [{"given_name": "K. I.", "family_name": "Diamantaras", "institution": null}, {"given_name": "D.", "family_name": "Geiger", "institution": null}]}