{"title": "Nonrigid Structure from Motion in Trajectory Space", "book": "Advances in Neural Information Processing Systems", "page_first": 41, "page_last": 48, "abstract": "Existing approaches to nonrigid structure from motion assume that the instantaneous 3D shape of a deforming object is a linear combination of basis shapes, which have to be estimated anew for each video sequence. In contrast, we propose that the evolving 3D structure be described by a linear combination of basis trajectories. The principal advantage of this lateral approach is that we do not need to estimate any basis vectors during computation. Instead, we show that generic bases over trajectories, such as the Discrete Cosine Transform (DCT) bases, can be used to effectively describe most real motions. This results in a significant reduction in unknowns, and corresponding stability, in estimation. We report empirical performance, quantitatively using motion capture data and qualitatively on several video sequences exhibiting nonrigid motions including piece-wise rigid motion, articulated motion, partially nonrigid motion (such as a facial expression), and highly nonrigid motion (such as a person dancing).", "full_text": "Nonrigid Structure from Motion in Trajectory Space\n\nLUMS School of Science and Engineering\n\nIjaz Akhter\n\nLahore, Pakistan\n\nYaser Sheikh\n\nCarnegie Mellon University\n\nPittsburgh, PA, USA\n\nakhter@lums.edu.pk\n\nyaser@cs.cmu.edu\n\nLUMS School of Science and Engineering\n\nCarnegie Mellon University\n\nTakeo Kanade\n\nPittsburgh, PA, USA\ntk@cs.cmu.edu\n\nSohaib Khan\n\nLahore, Pakistan\n\nsohaib@lums.edu.pk\n\nAbstract\n\nExisting approaches to nonrigid structure from motion assume that the instanta-\nneous 3D shape of a deforming object is a linear combination of basis shapes,\nwhich have to be estimated anew for each video sequence. In contrast, we pro-\npose that the evolving 3D structure be described by a linear combination of basis\ntrajectories. 
The principal advantage of this approach is that we do not need to estimate any basis vectors during computation. We show that generic bases over trajectories, such as the Discrete Cosine Transform (DCT) basis, can be used to compactly describe most real motions. This results in a significant reduction in unknowns, and corresponding stability in estimation. We report empirical performance, quantitatively using motion capture data, and qualitatively on several video sequences exhibiting nonrigid motions including piece-wise rigid motion, partially nonrigid motion (such as a facial expression), and highly nonrigid motion (such as a person dancing).\n\n1 Introduction\nNonrigid structure from motion is the process of recovering the time varying 3D coordinates of points on a deforming object from their 2D locations in an image sequence. Factorization approaches, first proposed for recovering rigid structure by Tomasi and Kanade in [1], were extended to handle nonrigidity in the seminal paper by Bregler et al. in [2]. The key idea in [2] is that observed shapes can be represented as a linear combination of a compact set of basis shapes. Each instantaneous structure, such as the mouth of a smiling actor shown in Figure 1(a), is expressed as a point in the linear space of shapes spanned by the shape basis. A number of approaches that develop the use of a shape basis have subsequently been proposed, including [3, 4, 5]. Since the space of spatial deformations is highly object specific, the shape basis needs to be estimated anew for each video sequence. The shape basis of a mouth smiling, for instance, cannot be recycled to compactly represent a person walking.\nIn this paper, we posit that representing nonrigid structure as a combination of basis shapes is one of two ways of looking at the space-time structure induced by P points seen across F frames. 
Instead of a shape space representation, we propose looking across time, representing the time-varying structure of a nonrigid object as a linear combination of a set of basis trajectories, as illustrated in Figure 1(b). The principal advantage of taking this “lateral” approach arises from the fact that compact representation in trajectory space is better motivated physically than compact representation in shape space. To see this, consider a deformable object being acted upon by a force. The extent of its deformation is limited by the force that can be applied. Hence, a tree swaying in the wind or a person walking cannot arbitrarily and randomly deform; the trajectories of their points are a function of the speed of the wind and the flexing of muscles respectively. Deformations are, therefore, constrained by the physical limits of actuation to remain incremental, not random, across time. Since this property is, to a large degree, ubiquitous, a basis can be defined in trajectory space that is object independent.\n\nFigure 1: 3D points on a smiling mouth: a comparison of shape and trajectory space. (a) In approaches that represent the time varying structure in shape space, all 3D points observed at one time instant are projected onto a single point in the shape space. S1, S2, ···, Sk each represent a shape basis vector. (b) In our approach, we represent the time varying structure in trajectory space, where a 3D point's trajectory over time is projected to a single point in the trajectory space. θ1, θ2, ···, θk each represent a trajectory basis vector. P points observed across F frames are expressed as F projected points in shape space and P points in trajectory space.\n\nWe show that while the inherent representative power of both shape and trajectory projections of structure data is equal (a duality exists), the significant reduction in the number of unknowns that results from knowing the basis a priori allows us to handle much more nonrigidity of deformation than state-of-the-art methods, like [4] and [5]. In fact, most previous results consider deformations which have a large rigid component, such as talking-head videos or the motion of a swimming shark. To the best of our knowledge, we are the first to show reasonable reconstructions of highly nonrigid motions from a single video sequence without making object specific assumptions. For all results, we use the same trajectory basis, the Discrete Cosine Transform (DCT) basis, underlining the generic nature of the trajectory space representation. A useful byproduct of this approach is that structure is automatically compressed for compact transmission, without the need for post facto compression or the overhead transmission of an object specific basis.\n\n2 Related work\nIf deformation of a 3D scene is unconstrained, the structure observed in each image would be independent of those in other images. In this case, recovering structure from motion is ill-posed, equivalent to finding 3D structure from a single 2D image at each time instant. To make nonrigid structure recovery tractable, some consistency in the deformation of structure has to be imposed. One early measure of consistency that was applied assumes that the scene consists of multiple rigid objects which are moving independently [6, 7, 8]. 
However, the first general solution to the problem of nonrigid structure recovery was introduced by Bregler et al. in [2], approximating the structure at each time instant as a linear combination of basis shapes. They recovered the structure, the shape basis and the camera rotations simultaneously, by exploiting orthonormality constraints of the rotation matrices. Xiao et al. [4] showed that these orthonormality constraints alone lead to ambiguity in the solution, and introduced additional constraints to remove the ambiguity. In [9], Xiao et al. proposed a rank-deficient basis. Other extensions of the work by Bregler et al. include [10], which improved the numerical stability of the estimation process, and [3], which introduced a Gaussian prior on the shape coefficients. Common to all of these approaches is that results are shown on objects which have a significant number of points that move rigidly, such as faces. Some approaches, such as [11], make explicit use of this fact to initialize rotation matrices, while others favor such sequences for stability in estimation.\nIn contrast to this entire corpus of work, which approximates structure by a shape basis, we propose a new representation of time varying structure, as a collection of trajectories. We not only demonstrate that a compact trajectory space can be defined, but also that the basis of this trajectory space can be pre-defined, removing a large number of unknowns from the estimation process altogether. The duality of spatial and temporal representations has been hinted at earlier in the literature. Shashua [12] discusses the duality of the joint image space and the joint point space in the context of multiview geometry. Zelnik-Manor and Irani [13] have exploited a similar duality for an alternate approach to segmenting video sequences. Ours is the first paper to use this dual representation in the structure from motion problem, and to note that a generic basis can be defined in trajectory space which compactly represents most real trajectories.\n\nFigure 2: As described in Equation 3, each trajectory is represented as a linear combination of k predefined basis trajectories. In this paper, we use the DCT basis to compactly represent trajectories.\n\n3 Representing Nonrigid Structure\nThe structure at a time instant t can be represented by arranging the 3D locations of the P points in a matrix S(t) ∈ R3×P,\n\nS(t) = [ Xt1 ··· XtP\n         Yt1 ··· YtP\n         Zt1 ··· ZtP ].\n\nThe complete time varying structure can be represented by concatenating these instantaneous structures as S3F×P = [S(1)T S(2)T ··· S(F)T]T. In [2], each instantaneous shape matrix S(t) is approximated as a linear combination of basis shapes,\n\nS(t) = Σj=1..k cj(t) Sj,  (1)\n\nwhere Sj ∈ R3×P is a basis shape and cj(t) is the coefficient of that basis shape. If the set of observed structures can be compactly expressed in terms of k such basis shapes, S has a rank of at most 3k. This rank constraint can be restated by rearrangement of S as the following rank k matrix,\n\nS* = [ X11 ··· X1P  Y11 ··· Y1P  Z11 ··· Z1P\n       ...\n       XF1 ··· XFP  YF1 ··· YFP  ZF1 ··· ZFP ].  (2)\n\nThe row space of this matrix corresponds to the shape space. Since the row and column space of a matrix are of equal dimension, it follows that the columns of S* are also spanned by k vectors. We call the column space of this matrix the trajectory space and note that it enjoys a dual relationship with the shape space. 
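This rank duality is easy to verify numerically. The following sketch (synthetic data and variable names of our own choosing, not from the paper) builds a time-varying structure from k random basis shapes, rearranges it into S*, and checks that its column space, the trajectory space, also has dimension k:

```python
import numpy as np

rng = np.random.default_rng(0)
F, P, k = 30, 12, 4

# synthesize a time-varying structure as a combination of k random basis shapes
basis_shapes = rng.standard_normal((k, 3, P))   # k shapes, each 3 x P
coeffs = rng.standard_normal((F, k))            # c_j(t) for each frame t
S_inst = np.einsum('fj,jcp->fcp', coeffs, basis_shapes)  # F x 3 x P

# rearrange into S*: row t is [X_t1..X_tP, Y_t1..Y_tP, Z_t1..Z_tP]
S_star = S_inst.reshape(F, 3 * P)

# row rank (shape space) equals column rank (trajectory space)
rank = np.linalg.matrix_rank(S_star)
print(rank)  # -> 4: every trajectory lies in a k-dimensional subspace
```

Because row rank equals column rank, compressibility in shape space implies exactly the same compressibility in trajectory space, which is the duality this section exploits.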
Specifically, if the time varying shape of an object can be expressed by a minimum of k shape basis vectors, then there exist exactly k trajectory basis vectors that can represent the same time varying shape.\nTo represent the time varying structure in terms of a trajectory basis, we consider the structure as a set of trajectories, T(i) = [Tx(i)T Ty(i)T Tz(i)T]T (see Figure 1(b)), where Tx(i) = [X1i, ···, XFi]T, Ty(i) = [Y1i, ···, YFi]T, Tz(i) = [Z1i, ···, ZFi]T are the x, y, and z coordinates of the ith trajectory. As illustrated in Figure 2, we describe each trajectory as a linear combination of basis trajectories,\n\nTx(i) = Σj=1..k axj(i) θj,  Ty(i) = Σj=1..k ayj(i) θj,  Tz(i) = Σj=1..k azj(i) θj,  (3)\n\nwhere θj ∈ RF is a trajectory basis vector and axj(i), ayj(i) and azj(i) are the coefficients corresponding to that basis vector. The time varying structure matrix can then be factorized into an inverse projection matrix and a coefficient matrix as S3F×P = Θ3F×3k A3k×P, where A = [AxT AyT AzT]T and\n\nAx = [ ax1(1) ··· ax1(P)\n       ...\n       axk(1) ··· axk(P) ],   Θ = [ θ1T  0   0\n                                    0   θ1T  0\n                                    0    0  θ1T\n                                    ···\n                                    θFT  0   0\n                                    0   θFT  0\n                                    0    0  θFT ],  (4)\n\nthat is, Θ is block diagonal, with the 3 × 3k block diag(θtT, θtT, θtT) corresponding to frame t. Here θi represents a truncated basis for transformation from coefficient space to original space. The principal benefit of the trajectory space representation is that a basis can be pre-defined that can compactly approximate most real trajectories. A number of bases such as the Hadamard Transform basis, the Discrete Fourier Transform basis, and the Discrete Wavelet Transform basis can all compactly represent trajectories in an object independent way. In this paper, we use the Discrete Cosine Transform basis set to generate Θ (shown in Figure 2) for all reconstruction results shown. The efficacy of the DCT basis has been demonstrated for compressing motion capture data [14], and it has been effective in our experiments as well.\n\n4 Nonrigid Structure and Motion Factorization\nThe measured 2D trajectories are contained in a 2F × P measurement matrix W, containing the locations of P image points across F frames,\n\nW = [ u11 ··· u1P\n      v11 ··· v1P\n      ...\n      uF1 ··· uFP\n      vF1 ··· vFP ].\n\nThis measurement matrix can be decomposed as W = RS, where R is the 2F × 3F block diagonal matrix\n\nR = blockdiag(R1, ···, RF),\n\nand Rt is a 2 × 3 orthographic projection matrix. In the previous section we showed that S = ΘA; as a result we can further factorize W as\n\nW = RΘA = ΛA,  (5)\n\nwhere Λ = RΘ. Since Λ is a 2F × 3k matrix, the rank of matrix W will be at most 3k. This is a dual property to the rank constraint defined by [2]. We can use SVD to factorize W as\n\nW = Λ̂Â.\n\nIn general, the matrices Λ̂ and Â will not be equal to Λ and A respectively, because the above factorization is not unique. For any invertible 3k × 3k matrix Q, Λ̂Q and Q−1Â are also valid factorizations. Therefore, to recover metric structure we need to estimate the rectification matrix Q such that the following equations hold true,\n\nΛ = Λ̂Q,  A = Q−1Â.  (6)\n\n5 Metric Upgrade\nThe problem of recovering the rotation and structure is reduced to estimating the rectification matrix Q. The elements of matrix Λ are\n\nΛ = [ r1(1)θ1T  r2(1)θ1T  r3(1)θ1T\n      r4(1)θ1T  r5(1)θ1T  r6(1)θ1T\n      ...\n      r1(F)θFT  r2(F)θFT  r3(F)θFT\n      r4(F)θFT  r5(F)θFT  r6(F)θFT ],\n\nwhere r1(t), ···, r6(t) denote the elements of Rt in row major order. Instead of estimating the whole matrix Q, to rectify Λ̂ and Â it is sufficient to estimate only three columns of Q. Let us define Q||| to be the first, k+1st and 2k+1st columns of the matrix Q. From Equation 6, if we just use Q||| instead of Q, we get\n\nΛ̂Q||| = [ θ1,1 R1 ; ··· ; θF,1 RF ].  (7)\n\nFigure 3: Effect of increasing camera motion on reconstruction stability. Reconstruction stability is measured in terms of the condition number of the matrix ΛTΛ for different values of k and different values of F. Synthetic rotations were generated by revolving the camera around the z-axis, and camera motion was measured in terms of the angle the camera moved per frame.\n\nThis equation shows that the unknowns in matrix Q||| can be found by exploiting the fact that Ri is a truncated rotation matrix (as was done in [1]). Specifically, if Λ̂2i−1:2i denotes the two rows of matrix Λ̂ at positions 2i−1 and 2i, then we have\n\nΛ̂2i−1:2i Q||| Q|||T Λ̂2i−1:2iT = θi,1² I2×2,  (8)\n\nwhere I2×2 is an identity matrix, giving three independent constraints for each image i. Therefore for F frames, we have 3F constraints and 9k unknowns in Q|||. 
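To make the constraint counting concrete, the sketch below assembles Equation 8 as a linear system in the entries of the symmetric Gram matrix G = Q|||Q|||T on noise-free synthetic data. This linear relaxation is our own simplification for illustration; the paper estimates Q||| itself with a nonlinear routine.

```python
import numpy as np

rng = np.random.default_rng(1)
F, k = 40, 3

# orthonormal DCT-II trajectory basis (our stand-in for the paper's Theta)
t = np.arange(F)
Theta = np.stack([np.cos(np.pi * (2 * t + 1) * j / (2 * F)) for j in range(k)], axis=1)
Theta[:, 0] *= np.sqrt(1.0 / F)
Theta[:, 1:] *= np.sqrt(2.0 / F)

def random_rotation():
    q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    return q if np.linalg.det(q) > 0 else -q

# Lambda = R * Theta_big, built frame by frame; rows 2i:2i+2 belong to frame i
Lam = np.zeros((2 * F, 3 * k))
for i in range(F):
    Rt = random_rotation()[:2]                 # 2 x 3 orthographic projection
    block = np.zeros((3, 3 * k))
    block[0, :k] = block[1, k:2*k] = block[2, 2*k:] = Theta[i]
    Lam[2*i:2*i+2] = Rt @ block

M = rng.standard_normal((3 * k, 3 * k))        # simulate the SVD ambiguity
Lhat = Lam @ M                                 # what the factorization returns
Q_true = np.linalg.inv(M)[:, [0, k, 2 * k]]    # ground-truth Q with 9k entries

# write each quadratic form x G y^T as a linear function of the upper
# triangle of the symmetric unknown G
pairs = [(p, q) for p in range(3 * k) for q in range(p, 3 * k)]
def quad_row(x, y):
    return np.array([x[p]*y[q] + x[q]*y[p] if p != q else x[p]*y[p]
                     for p, q in pairs])

rows, rhs = [], []
for i in range(F):
    a, b = Lhat[2*i], Lhat[2*i+1]
    th2 = Theta[i, 0] ** 2                     # theta_{i,1} is known (DCT)
    rows += [quad_row(a, a), quad_row(b, b), quad_row(a, b)]
    rhs += [th2, th2, 0.0]

A = np.array(rows)                             # 3F equations, one triple per frame
g, *_ = np.linalg.lstsq(A, np.array(rhs), rcond=None)
residual = np.linalg.norm(A @ g - np.array(rhs))
```

With noise-free data the system is consistent and the least-squares residual is essentially zero; under noise (and possible ambiguities in G), optimizing the 9k entries of Q||| directly with a nonlinear routine, as the paper does, is the more robust path.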
Hence at least 3k non-degenerate images are required to estimate Q|||. Once Q||| has been computed, using a nonlinear minimization routine (e.g. Levenberg-Marquardt), we can estimate the rotation matrices, and therefore R, using Equation 7.\nOnce R is known, it can be multiplied with the (known) DCT basis matrix Θ3F×3k to recover the matrix Λ2F×3k = R2F×3F Θ3F×3k. The coefficients can then be estimated by solving the following overconstrained linear system of equations,\n\nΛ2F×3k Â3k×P = W2F×P.  (9)\n\n6 Results\nThe proposed algorithm has been validated quantitatively on motion capture data over different actions and qualitatively on video data. We have tested the approach extensively on highly nonrigid human motion like volleyball digs, handstands, karate moves and dancing. Figure 4 shows a few sample reconstructions of different actors. As mentioned earlier, we choose the DCT as the basis for the trajectory space. In subsequent experiments, we compare our approach with [5] and [9] (we use code kindly provided by the respective authors). The results, data and the code used to produce the results are all shared at http://cvlab.lums.edu.pk/nrsfm.\nIn nonrigid structure from motion, the key relationship that determines successful reconstruction is the one between the degree of deformation of the object, measured by the number of basis vectors k required to approximate it, and the degree of camera motion. To test the relationship between k, camera motion and reconstruction stability, we constructed Λ matrices using different values of k and synthetic rotations around the z-axis, at various magnitudes of motion per frame. In Figure 3, the reconstruction stability, measured by the condition number of ΛTΛ, is shown as k is varied between 2 and 6, for 200, 400, and 800 frames (at different angular velocities per frame). 
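This stability probe can be reproduced in a few lines. The sketch below uses a simple synthetic orthographic camera orbiting the z-axis (one image row stays aligned with z; the paper's exact camera parameterization may differ) and reports the condition number of ΛTΛ:

```python
import numpy as np

def dct_basis(F, k):
    # orthonormal DCT-II basis, one column per trajectory basis vector
    t = np.arange(F)
    Theta = np.stack([np.cos(np.pi * (2 * t + 1) * j / (2 * F)) for j in range(k)], axis=1)
    Theta[:, 0] *= np.sqrt(1.0 / F)
    Theta[:, 1:] *= np.sqrt(2.0 / F)
    return Theta

def stability(F, k, deg_per_frame):
    ang = np.deg2rad(deg_per_frame) * np.arange(F)
    Theta = dct_basis(F, k)
    Lam = np.zeros((2 * F, 3 * k))
    for t in range(F):
        # orthographic camera orbiting the z-axis at a fixed angular velocity
        Rt = np.array([[np.cos(ang[t]), -np.sin(ang[t]), 0.0],
                       [0.0,            0.0,             1.0]])
        block = np.zeros((3, 3 * k))
        block[0, :k] = block[1, k:2*k] = block[2, 2*k:] = Theta[t]
        Lam[2*t:2*t+2] = Rt @ block
    return np.linalg.cond(Lam.T @ Lam)

# slower camera motion and more basis vectors both hurt the conditioning
print(stability(200, 3, 0.5) > stability(200, 3, 5.0))   # -> True
print(stability(200, 2, 1.0) <= stability(200, 6, 1.0))  # -> True
```

Appending DCT columns can only widen the spectrum of ΛTΛ (the Gram matrix for a smaller k is a principal submatrix of that for a larger k), so the condition number grows with k, while faster camera motion mixes the x and y blocks better and shrinks it.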
The plots confirm intuition: the smaller the degree of object deformation and the larger the camera motion, the more stable the reconstruction tends to be.\nFor quantitative evaluation of reconstruction accuracy we used the drink, pickup, yoga, stretch, and dance actions from the CMU Mocap database, and the shark dataset of [3]. Multiple rigid body data was generated by simulation of points on rigidly moving cubes. We generated synthetic camera rotations and projected the 3D data using these rotations to get image observations. The camera rotation for the Mocap datasets was 5 degrees per frame and 2 degrees per frame for the multi-body sequence. We did not rotate the camera for the dance and shark sequences, since the object itself was rotating in these sequences. In obtaining the results discussed below, k was chosen to provide the best reconstructions, the value varying between 2 and 13 depending on the length of the sequence and the nonrigidity of motion. \n\nFigure 4: Simultaneous reconstruction accuracy for three actors. The X-coordinate trajectories for three different points on each actor are shown. The approximation error introduced by the DCT projection has a smoothing impact on the reconstruction. Red lines indicate ground truth data and blue lines indicate reconstructed data.\n\nFigure 5: The dance sequence from the CMU mocap database. The black dots are the ground truth points while the gray circles are the reconstructions by the three methods respectively.
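The compact support of real trajectories in the DCT domain is what keeps k this small. As a minimal sketch (a synthetic smooth 1D trajectory of our own, not the mocap data), projecting onto the first k DCT basis vectors and measuring the residual shows the error collapsing as k grows:

```python
import numpy as np

def dct_basis(F, k):
    # orthonormal DCT-II basis vectors theta_1..theta_k
    t = np.arange(F)
    Theta = np.stack([np.cos(np.pi * (2 * t + 1) * j / (2 * F)) for j in range(k)], axis=1)
    Theta[:, 0] *= np.sqrt(1.0 / F)
    Theta[:, 1:] *= np.sqrt(2.0 / F)
    return Theta

F = 200
s = np.linspace(0.0, 1.0, F)
# a smooth synthetic coordinate trajectory X(t) of a single 3D point
traj = 0.5 * np.sin(2 * np.pi * 1.3 * s) + 0.2 * np.cos(2 * np.pi * 3.1 * s) + 0.1 * s

def truncation_error(k):
    Theta = dct_basis(F, k)
    coeff = Theta.T @ traj            # a_j = projection onto theta_j
    return np.linalg.norm(traj - Theta @ coeff)

errors = {k: truncation_error(k) for k in (2, 5, 13)}
```

Since the basis is orthonormal, the residual is monotonically non-increasing in k, and for the smooth trajectories typical of natural motion it decays rapidly, which is consistent with values as small as k = 2 to 13 sufficing above.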
We normalize the structure, so that the average standard deviation\nof the structure matrix S becomes equal to unity (to make comparison of error across datasets more\nmeaningful).\nTable 1 shows a quantitative comparison of our method with the shape basis approach of Torresani\net al. [5] and Xiao and Kanade [9]. This table shows both the camera rotation estimation error and\nstructure reconstruction error. The estimated structure is valid up to a 3D rotation and translation\nand the estimated rotations also have a 3D rotation ambiguity. We therefore align them for error\nmeasurement. Procrustes analysis was used for aligning camera rotations and the 3D structure. The\nerror measure for camera rotations was the average Frobenius norm difference between the original\ncamera rotation and the estimated camera rotation. For structure evaluation we compute the per\nframe mean squared error between original 3D points and the estimated 3D points.\nFinally, to test the proposed approach on real data, we used a face sequence from the PIE dataset,\na sequence from the movie \u201cThe Matrix\u201d, a sequence capturing two rigidly moving cubes and a\nsequence of a toy dinosaur moving nonrigidly. For the last three sequences, the image points were\ntracked in a semi-automatic manner, using the approach proposed in [15] with manual correction.\nWe show the resulting reconstructions in Figure 6, and compare against the reconstructions obtained\nfrom Torresani et al. 
[5] and Xiao and Kanade [9].\n\nTable 1: The quantitative comparison of the proposed algorithm with the techniques described in Xiao and Kanade [9] and Torresani et al. [5]. Erot is the average Frobenius difference between original rotations and aligned estimated rotations, and EΔ is the average distance between original 3D points and aligned reconstructed points.\n\nDataset     | Trajectory Bases    | Torresani's EM-Gaussian | Xiao's Shape Bases\n            | Erot      EΔ        | Erot      EΔ            | Erot      EΔ\nDRINK       | 5.8E-03   2.50E-02  | 0.2906    0.3393        | 0.3359    3.5186\nPICKUP      | 1.55E-01  2.37E-01  | 0.4277    0.5822        | 0.4687    3.3721\nYOGA        | 1.06E-01  1.62E-01  | 0.8089    0.8097        | 1.2014    7.4935\nSTRETCH     | 5.49E-02  1.09E-01  | 0.7594    1.1111        | 0.9489    4.2415\nMULTIRIGID  | 1.96E-08  4.88E-02  | 0.1718    2.5902        | 0.0806    11.7013\nDANCE       | NA        2.96E-01  | NA        0.9839        | NA        2.9962\nSHARK       | NA        3.12E-01  | NA        0.1086        | NA        0.4772\n\n7 Conclusion\nWe describe an algorithm to reconstruct the nonrigid structure of an object from 2D trajectories of points across a video sequence. 
Unlike earlier approaches that require an object-specific shape basis to be estimated for each new video sequence, we demonstrate that a generic trajectory basis can be defined that can compactly represent the motion of a wide variety of real deformations. Results are shown using the DCT basis to recover structures of piece-wise rigid motion, facial expressions, and actors dancing, walking, and doing yoga. Our experiments show that there is a relationship between camera motion, degree of object deformation, and reconstruction stability. We observe that as the motion of the camera increases with respect to the degree of deformation, the reconstruction stability increases. Future directions of research include experimenting with different unitary transform bases to verify that the DCT basis is, in fact, the best generic basis to use, and developing a synergistic approach to use both shape and trajectory bases concurrently.\n\n8 Acknowledgements\nThis research was partially supported by a grant from the Higher Education Commission of Pakistan. The authors would like to acknowledge Fernando De La Torre for useful discussions. We further thank J. Xiao, L. Agapito, I. Matthews and L. Torresani for making their code or data available to us. The motion capture data used in this project was obtained from http://mocap.cs.cmu.edu.\n\nReferences\n\n[1] C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9:137–154, 1992.\n\n[2] C. Bregler, A. Hertzmann, and H. Biermann. Recovering non-rigid 3D shape from image streams. CVPR, 2:690–696, 2000.\n\n[3] L. Torresani, A. Hertzmann, and C. Bregler. Learning non-rigid 3D shape from 2D motion. NIPS, 2005.\n\n[4] J. Xiao, J. Chai, and T. Kanade. A closed form solution to non-rigid shape and motion recovery. IJCV, 67:233–246, 2006.\n\n[5] L. Torresani, A. Hertzmann, and C. Bregler. 
Nonrigid structure-from-motion: Estimating shape and motion with hierarchical priors. PAMI, 30(5):878–892, May 2008.\n\n[6] J.P. Costeira and T. Kanade. A multibody factorization method for independently moving objects. IJCV, 49:159–179, 1998.\n\n[7] M. Han and T. Kanade. Reconstruction of a scene with multiple linearly moving objects. IJCV, 59:285–300, 2004.\n\n[8] A. Gruber and Y. Weiss. Multibody factorization with uncertainty and missing data using the EM algorithm. CVPR, 1:707–714, 2004.\n\n[9] J. Xiao and T. Kanade. Non-rigid shape and motion recovery: Degenerate deformations. CVPR, 1:668–675, 2004.\n\nFigure 6: Results on Dinosaur, Matrix, PIE face, and Cubes sequences. k was set to 12, 3, 2, and 2 respectively.\n\n[10] M. Brand. Morphable 3D models from video. CVPR, 2:456, 2001.\n\n[11] A. Del Bue, F. Smeraldi, and L. Agapito. Non-rigid structure from motion using ranklet-based tracking and non-linear optimization. IVC, pages 297–310, 2007.\n\n[12] A. Shashua. Trilinear tensor: The fundamental construct of multiple-view geometry and its applications. AFPAC, 1997.\n\n[13] L. Zelnik-Manor and M. Irani. Temporal factorization vs. spatial factorization. ECCV, 2004.\n\n[14] O. Arikan. Compression of motion capture databases. ACM Trans. on Graphics, 2006.\n\n[15] A. Datta, Y. Sheikh, and T. Kanade. 
Linear motion estimation for systems of articulated planes. CVPR, 2008.\n", "award": [], "sourceid": 607, "authors": [{"given_name": "Ijaz", "family_name": "Akhter", "institution": null}, {"given_name": "Yaser", "family_name": "Sheikh", "institution": null}, {"given_name": "Sohaib", "family_name": "Khan", "institution": null}, {"given_name": "Takeo", "family_name": "Kanade", "institution": null}]}