{"title": "Large-Scale Matrix Factorization with Missing Data under Additional Constraints", "book": "Advances in Neural Information Processing Systems", "page_first": 1651, "page_last": 1659, "abstract": "Matrix factorization in the presence of missing data is at the core of many computer vision problems such as structure from motion (SfM), non-rigid SfM and photometric stereo. We formulate the problem of matrix factorization with missing data as a low-rank semidefinite program (LRSDP) with the advantage that: $1)$ an efficient quasi-Newton implementation of the LRSDP enables us to solve large-scale factorization problems, and $2)$ additional constraints such as ortho-normality, required in orthographic SfM, can be directly incorporated in the new formulation. Our empirical evaluations suggest that, under the conditions of matrix completion theory, the proposed algorithm finds the optimal solution, and also requires fewer observations compared to the current state-of-the-art algorithms. We further demonstrate the effectiveness of the proposed algorithm in solving the affine SfM problem, non-rigid SfM and photometric stereo problems.", "full_text": "Large-Scale Matrix Factorization with Missing Data under Additional Constraints\n\nKaushik Mitra *\u2020\n\nDepartment of Electrical and Computer Engineering and UMIACS\n\nUniversity of Maryland, College Park, MD 20742\n\nkmitra@umiacs.umd.edu\n\nSameer Sheorey \u2020\n\nToyota Technological Institute, Chicago\n\nssameer@ttic.edu\n\nRama Chellappa\n\nDepartment of Electrical and Computer Engineering and UMIACS\n\nUniversity of Maryland, College Park, MD 20742\n\nrama@umaics.umd.edu\n\nAbstract\n\nMatrix factorization in the presence of missing data is at the core of many computer vision problems such as structure from motion (SfM), non-rigid SfM and photometric stereo. 
We formulate the problem of matrix factorization with missing data as a low-rank semidefinite program (LRSDP) with the advantage that: 1) an efficient quasi-Newton implementation of the LRSDP enables us to solve large-scale factorization problems, and 2) additional constraints such as ortho-normality, required in orthographic SfM, can be directly incorporated in the new formulation. Our empirical evaluations suggest that, under the conditions of matrix completion theory, the proposed algorithm finds the optimal solution, and also requires fewer observations compared to the current state-of-the-art algorithms. We further demonstrate the effectiveness of the proposed algorithm in solving the affine SfM problem, non-rigid SfM and photometric stereo problems.\n\n1 Introduction\n\nMany computer vision problems such as SfM [26], non-rigid SfM [3] and photometric stereo [11] can be formulated as a matrix factorization problem. In all these problems, the measured data are observations of the elements of an $m \\times n$ measurement matrix M of known rank r. The objective is to factorize this measurement matrix M into factors A and B of dimensions $m \\times r$ and $n \\times r$, respectively, such that the error $\\|M - AB^T\\|$ is minimized. When all the elements of M are known, and assuming that the elements are corrupted by Gaussian noise, the solution to this problem is given by the singular value decomposition (SVD) of M. 
However, in most real applications many of the elements of M will be missing and we need to solve a modified problem given by:\n\n$$\\min_{A,B} \\|W \\odot (M - AB^T)\\|_F^2 + \\lambda_1 \\|A\\|_F^2 + \\lambda_2 \\|B\\|_F^2 \\tag{1}$$\n\nwhere $\\odot$ is the Hadamard element-wise product, W is a weight matrix with zeroes at indices corresponding to the missing elements of M, and $\\|A\\|_F^2$, $\\|B\\|_F^2$ are regularization terms which prevent data overfitting. Matrix factorization with missing data is a difficult non-convex problem with no known globally convergent algorithm. The damped Newton algorithm [4], a variant of Newton\u2019s method, is one of the most popular algorithms for solving this problem. However, this algorithm has high computational complexity and memory requirements and so cannot be used for solving large scale problems.\n\n* Partially supported by an ARO MURI on opportunistic sensing under the grant W911NF-09-1-0383.\n\u2020 Kaushik Mitra and Sameer Sheorey contributed equally to this work.\n\nWe formulate the matrix factorization with missing data problem as an LRSDP [6], which is essentially a rank constrained semidefinite programming problem (SDP) and was proposed to solve large SDPs efficiently. The advantages of formulating the matrix factorization problem as an LRSDP problem are the following: 1) It inherits the efficiency of the LRSDP algorithm. The LRSDP algorithm is based on a quasi-Newton method which has lower computational complexity and memory requirements than Newton\u2019s method, and so is well suited for solving large scale problems. 
2) Many additional constraints, such as the ortho-normality constraints for the orthographic SfM, can be easily incorporated into the LRSDP-based factorization formulation; this is possible because of the flexible framework of the LRSDP (see section 2).\nPrior Work. Algorithms for matrix factorization in the presence of missing data can be broadly divided into two main categories: initialization algorithms and iterative algorithms. Initialization algorithms [26, 13, 10, 18, 25] generally minimize an algebraic or approximate cost of (1) and are used for providing a good starting point for the iterative algorithms. Iterative algorithms are those algorithms that directly minimize the cost function (1). Alternation algorithms [23, 28, 12, 1, 2, 14], the damped Newton algorithm [4] and our approach fall under this category. Alternation algorithms are based on the fact that if one of the factors A or B is known, then there are closed form or numerical solutions for the other factor. Though the alternation-based algorithms minimize the cost in each iteration, they are essentially a coordinate descent approach and suffer from flatlining, requiring an excessive number of iterations before convergence [4]. To solve this problem, damped Newton and hybrid algorithms between damped Newton and alternation were proposed in [4]. Although these algorithms give very good results, they cannot be used for solving large-scale problems because of their high computational complexity and memory requirements. Other algorithms based on Newton\u2019s method have been proposed in [7, 21], which also cannot be used for solving large-scale problems.\nThe matrix factorization with missing data problem is closely related to the matrix completion problem [9]. The goal of matrix completion is to find a low-rank matrix which agrees with the observed entries of the matrix M. 
Recently, many efficient algorithms have been proposed for solving this problem [8, 17, 19, 16, 15, 20]. Some of them [16, 15, 20] are formulated as matrix factorization problems. However, we note that these algorithms, by themselves, cannot handle additional constraints. Matrix factorization also arises while solving the collaborative filtering problem. Collaborative filtering is the task of predicting the interests of a user by collecting taste information from many users, for example in a movie recommendation system. In [24], collaborative filtering is formulated as a matrix completion problem and solved using a semidefinite program. Later a fast version, using conjugate gradient, was proposed in [22], but it also cannot handle additional constraints.\n\n2 Background: Low-rank semidefinite programming (LRSDP)\n\nLRSDP was proposed in [6] to efficiently solve a large scale SDP [27]. In the following paragraphs, we briefly define the SDP and LRSDP problems, and discuss the efficient algorithm used for solving the LRSDP problem.\nSDP is a subfield of convex optimization concerned with the optimization of a linear objective function over the intersection of the cone of positive semidefinite matrices with an affine space. The standard-form SDP is given by:\n\n$$\\min\\; C \\bullet X \\quad \\text{subject to} \\quad A_i \\bullet X = b_i,\\; i = 1, \\dots, k, \\quad X \\succeq 0 \\tag{2}$$\n\nwhere C and $A_i$ are $n \\times n$ real symmetric matrices, b is a k-dimensional vector, and X is an $n \\times n$ matrix variable, which is required to be symmetric and positive semidefinite, as indicated by the constraint $X \\succeq 0$. The operator $\\bullet$ denotes the inner product in the space of $n \\times n$ symmetric matrices defined as $A \\bullet B = \\mathrm{trace}(A^T B) = \\sum_{i=1}^{n} \\sum_{j=1}^{n} A_{ij} B_{ij}$. The most common algorithms for solving (2) are the interior point methods [27]. 
However, these are second-order methods, which need to store and factorize a large (and often dense) matrix and hence are not suitable for solving large scale problems.\nIn LRSDP a change of variables is introduced as $X = RR^T$, where R is a real, $n \\times r$ matrix with $r \\le n$. This has the advantage that it removes the non-linear constraint $X \\succeq 0$, which is the most challenging aspect of solving (2). However, this comes with the cost that the problem may no longer be a convex problem. The LRSDP formulation is given by:\n\n$$(N_r) \\quad \\min\\; C \\bullet RR^T \\quad \\text{subject to} \\quad A_i \\bullet RR^T = b_i,\\; i = 1, \\dots, k \\tag{3}$$\n\nNote that the LRSDP formulation depends on r; when r = n, (3) is equivalent to (2). But the intention is to choose r as small as possible so as to reduce the number of variables, while the problem remains equivalent to the original problem (2).\nA non-linear optimization technique called the augmented Lagrangian method is used for solving (3). The majority of the iterations in this algorithm involve the minimization of an augmented Lagrangian function with respect to the variable R, which is done by a limited memory BFGS method. BFGS, a quasi-Newton method, is much more efficient than Newton\u2019s method both in terms of computations and memory requirement. The LRSDP algorithm further optimizes the computations and storage requirements for sparse C and $A_i$ matrices, as is the case for the problems of interest here. For further details on the algorithm, see [6, 5].\n\n3 Matrix factorization using LRSDP (MF-LRSDP)\n\nIn this section, we formulate the matrix factorization with missing data as an LRSDP problem. 
We do this in the following stages: in section 3.1, we look at the noiseless case, that is, where the measurement matrix M is not corrupted with noise, followed by the noisy measurement case in section 3.2, and finally in section 3.3, we look at how additional constraints can be incorporated in the LRSDP formulation.\n\n3.1 Noiseless Case\n\nWhen the observed elements of the $m \\times n$ dimensional measurement matrix M are not corrupted with noise, a meaningful cost to minimize would be:\n\n$$\\min_{A,B} \\|A\\|_F^2 + \\|B\\|_F^2 \\quad \\text{subject to} \\quad (AB^T)_{i,j} = M_{i,j} \\text{ for } (i,j) \\in \\Omega \\tag{4}$$\n\nwhere $\\Omega$ is the index set of the observed entries of M, and A, B are the desired factor matrices of dimensions $m \\times r$ and $n \\times r$ respectively. To formulate this as an LRSDP problem, we introduce an $(m + n) \\times r$ dimensional matrix $R = \\begin{pmatrix} A \\\\ B \\end{pmatrix}$. Then\n\n$$RR^T = \\begin{pmatrix} AA^T & AB^T \\\\ BA^T & BB^T \\end{pmatrix} \\tag{5}$$\n\nWe observe that the cost function $\\|A\\|_F^2 + \\|B\\|_F^2$ can be expressed as $\\mathrm{trace}(RR^T)$ and the constraints as $(RR^T)_{i,j+m} = M_{i,j}$. Thus, (4) is equivalent to:\n\n$$\\min_R\\; \\mathrm{trace}(RR^T) \\quad \\text{subject to} \\quad (RR^T)_{i,j+m} = M_{i,j} \\text{ for } (i,j) \\in \\Omega \\tag{6}$$\n\nThis is already in the LRSDP form, since we can express the above equation as\n\n$$\\min_R\\; C \\bullet RR^T \\quad \\text{subject to} \\quad A_l \\bullet RR^T = b_l,\\; l = 1, \\dots, |\\Omega| \\tag{7}$$\n\nwhere C is an $(m + n) \\times (m + n)$ identity matrix, and to simplify the notation we have introduced the index l with $\\Omega(l) = (i, j)$, $l = 1, \\dots, |\\Omega|$. The $A_l$ are sparse matrices with the non-zero entries at indices $(i, j + m)$ and $(j + m, i)$ equal to 1/2, and $b_l = M_{i,j}$. This completes the formulation of the matrix factorization problem as an LRSDP problem for the noiseless case. 
Next we look at the noisy case.\n\n3.2 Noisy Case\n\nWhen the observed entries of M are corrupted with noise, an appropriate cost function to minimize would be:\n\n$$\\min_{A,B} \\|W \\odot (M - AB^T)\\|_F^2 + \\lambda \\|A\\|_F^2 + \\lambda \\|B\\|_F^2 \\tag{8}$$\n\nwhere $\\odot$ is the Hadamard element-wise product and W is a weight matrix with zeros corresponding to the missing entries and ones corresponding to the observed entries in M. To formulate this as an LRSDP problem, we introduce noise variables $e_l$, $l = 1, 2, \\dots, |\\Omega|$, which are defined as $e_l = (M - AB^T)_l$. Now, (8) can be expressed as\n\n$$\\min_{A,B,e} \\|e\\|_2^2 + \\lambda \\|A\\|_F^2 + \\lambda \\|B\\|_F^2 \\quad \\text{subject to} \\quad (M - AB^T)_l = e_l \\text{ for } l = 1, 2, \\dots, |\\Omega| \\tag{9}$$\n\nNext, we aim to formulate this as an LRSDP problem. For this, we construct an augmented noise vector $E = [e^T \\; 1]^T$ and define R to be\n\n$$R = \\begin{pmatrix} \\begin{pmatrix} A \\\\ B \\end{pmatrix} & 0 \\\\ 0 & E \\end{pmatrix} \\tag{10}$$\n\nR is a \u2018block-diagonal\u2019 matrix, where the blocks are of sizes $(m + n) \\times r$ and $(|\\Omega| + 1) \\times 1$ respectively. With this definition, $RR^T$ is a block-diagonal matrix given by\n\n$$RR^T = \\begin{pmatrix} \\begin{pmatrix} AA^T & AB^T \\\\ BA^T & BB^T \\end{pmatrix} & 0 \\\\ 0 & EE^T \\end{pmatrix} \\tag{11}$$\n\nWe can now express (8) in the following LRSDP form:\n\n$$\\min_R\\; C \\bullet RR^T \\quad \\text{subject to} \\quad A_l \\bullet RR^T = b_l,\\; l = 1, \\dots, |\\Omega| + 1 \\tag{12}$$\n\nwith\n\n$$C = \\begin{pmatrix} \\lambda I_{(m+n) \\times (m+n)} & 0 \\\\ 0 & I_{(|\\Omega|+1) \\times (|\\Omega|+1)} \\end{pmatrix} \\tag{13}$$\n\nNote that the number of constraints $|\\Omega| + 1$ in (12) is one more than the number of observations $|\\Omega|$. This is because the last constraint is used to set $E_{|\\Omega|+1} = 1$, which is done by choosing $A_{|\\Omega|+1}$ to be a sparse matrix with the non-zero entry at index $(|\\Omega| + 1 + m + n, |\\Omega| + 1 + m + n)$ equal to 1 and $b_{|\\Omega|+1} = 1$. 
For the remaining values of l, the $A_l$ are sparse matrices with the non-zero entries at indices $(i, j + m)$, $(j + m, i)$, $(|\\Omega| + 1 + m + n, l + m + n)$ and $(l + m + n, |\\Omega| + 1 + m + n)$ equal to 1/2 and $b_l = M_l$. Note that (12) is a block-LRSDP problem (R has a block-diagonal structure), which is a simple extension of the original LRSDP problem [5]. This completes the LRSDP formulation for the noisy case. Next, we look at incorporating additional constraints in this framework.\n\n3.3 Enforcing Additional Constraints\n\nMany additional constraints can be easily incorporated in the LRSDP formulation. We illustrate this using the specific example of orthographic SfM [26]. SfM is the problem of reconstructing the scene structure (3-D point positions and camera parameters) from 2-D projections of the points in the cameras. Suppose that m/2 cameras are looking at n 3-D points; then, under the affine camera model, the 2-D imaged points can be arranged as an $m \\times n$ measurement matrix M with columns corresponding to the n 3-D points and rows corresponding to the m/2 cameras (2 consecutive rows per camera) [26]. Under this arrangement, M can be factorized as $M = AB^T$, where A is an $m \\times 4$ camera matrix and B is an $n \\times 4$ structure matrix with the last column of B an all-one vector. Thus, M is a rank 4 matrix with a special structure for the last column of B. Further, under the orthographic camera model, A has more structure (constraints): each pair of rows that corresponds to the same camera is ortho-normal. To state these constraints precisely, we decompose the A matrix as $A = [P \\; t]$ where P is an $m \\times 3$ sub-matrix consisting of the first three columns and t is the last column vector. We can now express the camera ortho-normality constraint through the $PP^T$ matrix, whose diagonal elements should be 1 (normality constraint) and appropriate off-diagonal elements should be 0 (orthogonality constraint). 
Since the last column of B is the all-one vector, we can write $B = [X \\; 1]$, where X is an $n \\times 3$ matrix. Thus, $AB^T = PX^T + t1^T$ and the observation error can be expressed as $e_l = (M - PX^T)_l - t_i$ for $\\Omega(l) = (i, j)$. A meaningful optimization problem to solve here would be to minimize the observation error subject to the ortho-normality constraints:\n\n$$\\begin{aligned} \\min_{e,P,X,t}\\; & \\|e\\|_2^2 \\\\ \\text{subject to}\\; & e_l = (M - PX^T)_l - t_i, \\quad l = 1, 2, \\dots, |\\Omega| \\\\ & (PP^T)_{k,k} = 1, \\quad k = 1, 2, \\dots, m \\\\ & (PP^T)_{k,l} = 0 \\quad \\text{if } k \\text{ and } l \\text{ are rows from the same camera} \\end{aligned} \\tag{14}$$\n\nTo formulate this as an LRSDP problem, we introduce the augmented translation variable $T = [t^T \\; 1]^T$, and propose the following block-diagonal matrix R:\n\n$$R = \\begin{pmatrix} \\begin{pmatrix} P \\\\ X \\end{pmatrix} & 0 & 0 \\\\ 0 & T & 0 \\\\ 0 & 0 & E \\end{pmatrix} \\tag{15}$$\n\nWith this definition of R, we can express (14) as an LRSDP problem; following steps similar to the previous sections, it should be straightforward to figure out the appropriate C and $A_l$ matrices required in this LRSDP formulation (3). This completes our illustration on the incorporation of the ortho-normality constraints for the orthographic SfM case. This example should convince the reader that many other application-specific constraints can be directly incorporated into the LRSDP formulation; this is because of the underlying SDP structure of the LRSDP.\n\n4 Matrix Completion, Uniqueness and Convergence of MF-LRSDP\n\nIn this section, we state the main result of the matrix completion theory and discuss its implications for the matrix factorization problem.\n\n4.1 Matrix Completion Theory\n\nMatrix completion theory considers the problem of recovering a low-rank matrix from a few samples of its entries:\n\n$$\\min_X\\; \\mathrm{rank}(X) \\quad \\text{subject to} \\quad X_{i,j} = M_{i,j} \\text{ for } (i,j) \\in \\Omega \\tag{16}$$\n\nMore specifically, it considers the following questions: 1) When does a partially observed matrix have a unique low-rank solution? 
2) How can this matrix be recovered? The answers to these questions were provided in Theorem 1.3 of [9], which states that if 1) the matrix M that we want to recover has row and column spaces incoherent with the standard basis and 2) we are given enough entries ($\\ge O(r d^{6/5} \\log d)$, where $d = \\max(m, n)$), then there exists a unique low-rank solution to (16). Further, the solution can be obtained by solving a convex relaxation of (16) given by:\n\n$$\\min_X\\; \\|X\\|_* \\quad \\text{subject to} \\quad X_{i,j} = M_{i,j} \\text{ for } (i,j) \\in \\Omega \\tag{17}$$\n\nwhere $\\|X\\|_*$ is the nuclear norm of X, given by the sum of its singular values.\n\n4.2 Relation with Matrix Factorization and its Implications\n\nIn matrix completion the objective is to find a minimum rank matrix which agrees with the partial observations (16), whereas in matrix factorization we assume the rank r to be known, as in the problems of SfM and photometric stereo, and we use the rank as a constraint. For example, in our LRSDP formulation, we have imposed this rank constraint by fixing the number of columns of the factors A and B to r. However, though the matrix completion and factorization problems are defined differently, they are closely related as revealed by their very similar Lagrangian formulations. This fact has been used in solving the matrix completion problem via matrix factorization with an appropriate rank [16, 15, 20]. We should also note that matrix completion theory helps us answer the question raised in [4]: when is missing data matrix factorization unique (up to a gauge)? From the discussion in the previous section, it should be clear that the conditions of the matrix completion theory are sufficient for guaranteeing the required uniqueness. 
Further, in our experimental evaluations (see next section), we have found that the LRSDP formulation, though a non-convex problem in general, converges to the global minimum solution under these conditions.\n\n5 Experimental Evaluation\n\nWe evaluate the performance of the proposed LRSDP-based factorization algorithm (MF-LRSDP) on both synthetic and real data and compare it against other algorithms such as alternation [4], damped Newton [4] and OptSpace [15], which is one of the state-of-the-art algorithms for matrix completion.\n\n5.1 Evaluation with Synthetic Data\n\nThe important parameters in the matrix factorization with missing data problem are: the size of the matrix M characterized by m and n, rank r, fraction of missing data and the variance $\\sigma^2$ of the observation noise. We evaluate the factorization algorithms by varying these parameters. We consider two cases: data without noise and data with noise. For synthetic data without noise, we generate $n \\times n$ matrices M of rank r by $M = AB^T$, where A and B are $n \\times r$ random matrices with each entry being sampled independently from a standard Gaussian distribution $N(0, 1)$. Each entry is then revealed randomly according to the missing data fraction. For synthetic data with noise, we add independent Gaussian noise $N(0, \\sigma^2)$ to the observed entries generated as above.\nExact Factorization: a first comparison. We study the reconstruction rate of different algorithms by varying the average number of revealed entries per column ($|\\Omega|/n$) for noiseless $500 \\times 500$ matrices of rank 5. We declare a matrix to be reconstructed if $\\|M - \\hat{M}\\|_F / \\|M\\|_F \\le 10^{-4}$, where $\\hat{M} = \\hat{A}\\hat{B}^T$ is the reconstructed matrix and $\\|\\cdot\\|_F$ denotes the Frobenius norm. Reconstruction rate is defined as the fraction of trials for which the matrix was successfully reconstructed. In all the synthetic data experiments, we performed 10 trials. 
Figure 1(a) shows the reconstruction rate by MF-LRSDP, alternation and OptSpace. MF-LRSDP gives the best reconstruction results as it needs fewer observations for matrix reconstruction than the other algorithms. It is followed by OptSpace and alternation, respectively. MF-LRSDP also takes the least time, followed by OptSpace and alternation. For a similar comparison to other matrix completion algorithms such as ADMiRA [16], SVT [8] and FPCA [17], the interested reader can look at [15], where OptSpace was shown to be consistently better than these algorithms. For the remaining experiments on synthetic data, we mostly compare MF-LRSDP against OptSpace. Note that we have not included the damped Newton algorithm in this comparison because it is very slow for matrices of this size.\n\nFigure 1: (a) Reconstruction rate vs. average number of revealed entries per column $|\\Omega|/n$ for $500 \\times 500$ matrices of rank 5 by MF-LRSDP, alternation and OptSpace. The proposed algorithm MF-LRSDP gives the best reconstruction results since it can reconstruct matrices with fewer observed entries. (b) Time taken for reconstruction (in seconds, log scale) by different algorithms. MF-LRSDP takes the least time.\n\nExact Factorization: vary size. We study the reconstruction rate vs. average number of revealed entries per column $|\\Omega|/n$ for different sizes n of rank 5 square matrices by MF-LRSDP and OptSpace. Figure 2(a) shows that MF-LRSDP reconstructs matrices from fewer observed entries than OptSpace.\nExact Factorization: vary rank. 
We study the reconstruction rate vs. $|\\Omega|/n$ as we vary the rank r of $500 \\times 500$ matrices. Figure 2(b) again shows that MF-LRSDP gives better results than OptSpace.\n\nFigure 2: (a) Reconstruction rate vs. average number of revealed entries per column $|\\Omega|/n$ for rank 5 square matrices of different sizes n (n = 400, 1000, 2000) by MF-LRSDP and OptSpace. MF-LRSDP reconstructs matrices from fewer observed entries than OptSpace. (b) Reconstruction rate vs. $|\\Omega|/n$ for $500 \\times 500$ matrices of different ranks (r = 5, 10, 20) by MF-LRSDP and OptSpace. Again MF-LRSDP needs fewer observations than OptSpace. (c) RMSE vs. noise standard deviation for rank 5, $200 \\times 200$ matrices by MF-LRSDP, OptSpace, alternation and damped Newton. All algorithms perform equally well.\n\nNoisy Factorization: vary noise standard deviation. For noisy data, we use the root mean square error $\\mathrm{RMSE} = \\frac{1}{\\sqrt{mn}} \\|M - \\hat{M}\\|_F$ as a performance measure. We vary the standard deviation $\\sigma$ of the additive noise for rank 5, $200 \\times 200$ matrices and study the performance by MF-LRSDP, OptSpace, alternation and damped Newton. 
Figure 2(c) shows that all the algorithms perform equally well.\nFor timing comparisons, please refer to the supplementary material.\n\n5.2 Evaluation with Real Data\n\nWe consider three problems: 1) affine SfM, 2) non-rigid SfM and 3) photometric stereo.\nAffine SfM. As discussed in section 3.3, for affine SfM, the $m \\times n$ measurement matrix M is a rank 4 matrix with the last column of matrix B an all-one vector. M is generally an incomplete matrix because not all the points are visible in all the cameras. We evaluate the performance of MF-LRSDP on the \u2018Dinosaur\u2019 sequence used in [4, 7], for which M is a $72 \\times 319$ matrix with 72% missing entries. We perform 25 trials and at each trial we provide the same random initializations to MF-LRSDP, alternation and damped Newton (OptSpace has its own initialization technique). We use the root mean square error over the observed entries, $\\|W \\odot (M - \\hat{M})\\|_F / \\sqrt{|\\Omega|}$, as our performance measure. Figure 3 shows the cumulative histogram over the RMS pixel error. MF-LRSDP gives the best performance followed by damped Newton, alternation and OptSpace. We further tested the algorithms on a \u2018longer Dinosaur\u2019 sequence, the result of which is provided in the supplementary material.\nNon-rigid SfM. In non-rigid SfM, non-rigid objects are expressed as a linear combination of b basis shapes. In this case, the $m \\times n$ measurement matrix M can be expressed as $M = AB^T$, where A is an $m \\times 3b$ matrix and B is an $n \\times 3b$ matrix [3]. This makes M a rank 3b matrix. We test the performance of the algorithms on the \u2018Giraffe\u2019 sequence [4, 7] for which M is a $240 \\times 167$ matrix with 30% missing entries. We choose the rank as 6. Figure 3 shows the cumulative histogram of 25 trials from which we conclude that MF-LRSDP, alternation and damped Newton give good results.\nPhotometric Stereo. 
Photometric stereo is the problem of estimating the surface normals of an object by imaging that object under different lighting conditions. Suppose we have n images of the object under different lighting conditions with each image consisting of m pixels (m surface normals) and we arrange them as an $m \\times n$ measurement matrix M. Then under Lambertian assumptions, we can express M as $M = AB^T$, where A is an $m \\times 3$ matrix representing the surface normals and reflectance and B is an $n \\times 3$ matrix representing the light-source directions and intensities [11]. Thus, M is a rank 3 matrix. Some of the image pixels are likely to be affected by shadows and specularities and those pixels should not be included in the M matrix as they do not obey the Lambertian assumption. This makes M an incomplete matrix. We test the algorithms on the \u2018Face\u2019 sequence [4, 7] for which M is a $2944 \\times 20$ matrix with 42% missing entries. The cumulative histogram in figure 3 shows that MF-LRSDP and damped Newton give the best results followed by alternation and OptSpace.\n\n[Figure 3 panels: (a) Dinosaur sequence, (b) Giraffe sequence, (c) Face sequence; each plots the cumulative histogram against the RMS error for MF-LRSDP, alternation, damped Newton and OptSpace.]\n\nFigure 3: Cumulative histogram (of 25 trials) for the Dinosaur, Giraffe and the Face sequence. 
For all of them, MF-LRSDP consistently gives good results.\n\nAdditional constraints: Orthographic SfM. Orthographic SfM is a special case of affine SfM, where the camera matrix A satisfies the additional constraint of ortho-normality (see section 3.3). We show here that incorporating these constraints leads to a better solution. Figure 4 shows the input point tracks, reconstructed point tracks without the constraints and reconstructed point tracks with the constraints for the Dinosaur turntable sequence. Without the constraints many tracks fail to be circular, whereas with the constraints all of them are circular (the dinosaur sequence is a turntable sequence and the tracks are supposed to be circular). Thus, incorporating all the constraints of a problem leads to a better solution and MF-LRSDP provides a very flexible framework for doing so.\n\nFigure 4: (a) Input (incomplete) point tracks of the Dinosaur turntable sequence, (b) reconstructed tracks without orthonormality constraints and (c) reconstructed tracks with orthonormality constraints. Without the constraints many tracks fail to be circular, whereas with the constraints all of them are circular (the dinosaur sequence is a turntable sequence and the tracks are supposed to be circular).\n\n6 Conclusion and Discussion\n\nWe have formulated the matrix factorization with missing data problem as a low-rank semidefinite programming problem MF-LRSDP. MF-LRSDP is an efficient algorithm that can be used for solving large-scale factorization problems. It is also flexible for handling many additional constraints such as the ortho-normality constraints of orthographic SfM. 
Our empirical evaluations on synthetic data show that it needs fewer observations for matrix factorization as compared to other algorithms and it gives very good results on the real problems of SfM, non-rigid SfM and photometric stereo. We note that though MF-LRSDP is a non-convex problem, in our experiments it finds the global minimum under the conditions of matrix completion theory. As future work, it would be interesting to find a theoretical justification for this. Also, it would be interesting to find out how MF-LRSDP performs on collaborative filtering problems.\n\nReferences\n[1] H. Aan\u00e6s, R. Fisker, K. \u00c5str\u00f6m, and J. M. Carstensen. Robust factorization. IEEE TPAMI, 2002.\n[2] S. Brandt. Closed-form solutions for affine reconstruction under missing data. In Stat. Methods for Video Proc. (ECCV 02 Workshop), 2002.\n[3] C. Bregler, A. Hertzmann, and H. Biermann. Recovering non-rigid 3d shape from image streams. In CVPR, 2000.\n[4] A. M. Buchanan and A. W. Fitzgibbon. Damped newton algorithms for matrix factorization with missing data. In CVPR, 2005.\n[5] S. Burer and C. Choi. Computational enhancements in low-rank semidefinite programming. Optimization Methods and Software, 2006.\n[6] S. Burer and R.D.C. Monteiro. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Mathematical Programming (Series B), 2001.\n[7] P. Chen. Optimization algorithms on subspaces: Revisiting missing data problem in low-rank matrix. IJCV, 2008.\n[8] J. Cai, E. J. Cand\u00e8s, and Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 2010.\n[9] E. J. Cand\u00e8s and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 2009.\n[10] N. Guilbert, A.E. Bartoli, and A. Heyden. 
Affine approximation for direct batch recovery of euclidian structure and motion from sparse data. IJCV, 2006.\n\n[11] H. Hayakawa. Photometric stereo under a light source with arbitrary motion. JOSA, 1994.\n\n[12] D. Q. Huynh, R. Hartley, and A. Heyden. Outlier correction in image sequences for the affine camera. In ICCV, 2003.\n\n[13] D. W. Jacobs. Linear fitting with missing data for structure-from-motion. CVIU, 2001.\n\n[14] Q. Ke and T. Kanade. Robust l1 norm factorization in the presence of outliers and missing data by alternative convex programming. In CVPR, 2005.\n\n[15] R. H. Keshavan and S. Oh. A gradient descent algorithm on the grassman manifold for matrix completion. CoRR, abs/0910.5260, 2009.\n\n[16] K. Lee and Y. Bresler. Admira: Atomic decomposition for minimum rank approximation. CoRR, abs/0905.0044, 2009.\n\n[17] S. Ma, D. Goldfarb, and L. Chen. Fixed point and bregman iterative methods for matrix rank minimization. Mathematical Programming, 2009.\n\n[18] D. Martinec and T. Pajdla. 3d reconstruction by fitting low-rank matrices with missing data. In CVPR, 2005.\n\n[19] R. Mazumder, T. Hastie, and R. Tibshirani. Spectral regularization algorithms for learning large incomplete matrices. http://www-stat.stanford.edu/ hastie/Papers/SVD JMLR.pdf, 2009.\n\n[20] R. Meka, P. Jain, and I. S. Dhillon. Guaranteed rank minimization via singular value projection. CoRR, abs/0909.5457, 2009.\n\n[21] T. Okatani and K. Deguchi. On the wiberg algorithm for matrix factorization in the presence of missing components. IJCV, 2007.\n\n[22] J. D. M. Rennie and N. Srebro. Fast maximum margin matrix factorization for collaborative prediction. In ICML, 2005.\n\n[23] H. Shum, K. Ikeuchi, and R. Reddy. Principal component analysis with missing data and its application to polyhedral object modeling. IEEE TPAMI, 1995.\n\n[24] N. Srebro, J. D. M. Rennie, and T. Jaakkola. Maximum-margin matrix factorization. 
In NIPS, 2004.\n\n[25] J. P. Tardif, A. Bartoli, M. Trudeau, N. Guilbert, and S. Roy. Algorithms for batch matrix factorization with application to structure-from-motion. In CVPR, 2007.\n\n[26] C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: a factorization method. IJCV, 1992.\n\n[27] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Rev., 1996.\n\n[28] R. Vidal and R. Hartley. Motion segmentation with missing data using powerfactorization and gpca. In CVPR, 2004.\n", "award": [], "sourceid": 120, "authors": [{"given_name": "Kaushik", "family_name": "Mitra", "institution": null}, {"given_name": "Sameer", "family_name": "Sheorey", "institution": null}, {"given_name": "Rama", "family_name": "Chellappa", "institution": null}]}