{"title": "Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization", "book": "Advances in Neural Information Processing Systems", "page_first": 2080, "page_last": 2088, "abstract": "Principal component analysis is a fundamental operation in computational data analysis, with myriad applications ranging from web search to bioinformatics to computer vision and image analysis. However, its performance and applicability in real scenarios are limited by a lack of robustness to outlying or corrupted observations. This paper considers the idealized \u201crobust principal component analysis\u201d problem of recovering a low rank matrix A from corrupted observations D = A + E. Here, the error entries E can be arbitrarily large (modeling grossly corrupted observations common in visual and bioinformatic data), but are assumed to be sparse. We prove that most matrices A can be efficiently and exactly recovered from most error sign-and-support patterns, by solving a simple convex program. Our result holds even when the rank of A grows nearly proportionally (up to a logarithmic factor) to the dimensionality of the observation space and the number of errors E grows in proportion to the total number of entries in the matrix. A by-product of our analysis is the first proportional growth results for the related problem of completing a low-rank matrix from a small fraction of its entries. 
Simulations and real-data examples corroborate the theoretical results, and suggest potential applications in computer vision.", "full_text": "Robust Principal Component Analysis:\n\nExact Recovery of Corrupted Low-Rank Matrices by\n\nConvex Optimization\n\nJohn Wright\u2217, Yigang Peng, Yi Ma\n\nVisual Computing Group\nMicrosoft Research Asia\n\n{jowrig,v-yipe,mayi}@microsoft.com\n\nArvind Ganesh, Shankar Rao\nCoordinated Science Laboratory\n{abalasu2,srrao}@uiuc.edu\n\nUniversity of Illinois at Urbana-Champaign\n\nAbstract\n\nPrincipal component analysis is a fundamental operation in computational data\nanalysis, with myriad applications ranging from web search to bioinformatics to\ncomputer vision and image analysis. However, its performance and applicability\nin real scenarios are limited by a lack of robustness to outlying or corrupted ob-\nservations. This paper considers the idealized \u201crobust principal component anal-\nysis\u201d problem of recovering a low rank matrix A from corrupted observations\nD = A + E. Here, the corrupted entries E are unknown and the errors can be\narbitrarily large (modeling grossly corrupted observations common in visual and\nbioinformatic data), but are assumed to be sparse. We prove that most matrices\nA can be ef\ufb01ciently and exactly recovered from most error sign-and-support pat-\nterns by solving a simple convex program, for which we give a fast and provably\nconvergent algorithm. Our result holds even when the rank of A grows nearly\nproportionally (up to a logarithmic factor) to the dimensionality of the observa-\ntion space and the number of errors E grows in proportion to the total number of\nentries in the matrix. A by-product of our analysis is the \ufb01rst proportional growth\nresults for the related problem of completing a low-rank matrix from a small frac-\ntion of its entries. 
Simulations and real-data examples corroborate the theoretical\nresults, and suggest potential applications in computer vision.\n\n1 Introduction\nThe problem of \ufb01nding and exploiting low-dimensional structure in high-dimensional data is taking\non increasing importance in image, audio and video processing, web search, and bioinformatics,\nwhere datasets now routinely lie in thousand- or even million-dimensional observation spaces. The\ncurse of dimensionality is in full play here: meaningful inference with limited number of obser-\nvations requires some assumption that the data have low intrinsic complexity, e.g., that they are\nlow-rank [1], sparse in some basis [2], or lie on some low-dimensional manifold [3, 4]. Perhaps the\nsimplest useful assumption is that the observations all lie near some low-dimensional subspace. In\nother words, if we stack all the observations as column vectors of a matrix M \u2208 Rm\u00d7n, the matrix\nshould be (approximately) low rank. Principal component analysis (PCA) [1, 5] seeks the best (in\nan (cid:96)2-sense) such low-rank representation of the given data matrix. It enjoys a number of optimality\nproperties when the data are only mildly corrupted by small noise, and can be stably and ef\ufb01ciently\ncomputed via the singular value decomposition.\n\n\u2217For more information, see http://perception.csl.illinois.edu/matrix-rank/home.html.\n\nThis work was partially supported by NSF IIS 08-49292, NSF ECCS 07-01676, and ONR N00014-09-1-0230.\n\n1\n\n\fOne major shortcoming of classical PCA is its brittleness with respect to grossly corrupted or outly-\ning observations [5]. Gross errors are ubiquitous in modern applications in imaging and bioinformat-\nics, where some measurements may be arbitrarily corrupted (e.g., due to occlusion or sensor failure)\nor simply irrelevant to the structure we are trying to identify. A number of natural approaches to\nrobustifying PCA have been explored in the literature. 
These approaches include in\ufb02uence function\ntechniques [6, 7], multivariate trimming [8], alternating minimization [9], and random sampling\ntechniques [10]. Unfortunately, none of these existing approaches yields a polynomial-time algo-\nrithm with strong performance guarantees.1\nIn this paper, we consider an idealization of the robust PCA problem, in which the goal is to recover a\nlow-rank matrix A from highly corrupted measurements D = A + E. The errors E can be arbitrary\nin magnitude, but are assumed to be sparsely supported, affecting only a fraction of the entries\nof D. This should be contrasted with the classical setting in which the matrix A is perturbed by\nsmall (but densely supported) noise. In that setting, classical PCA, computed via the singular value\ndecomposition, remains optimal if the noise is Gaussian. Here, on the other hand, even a small\nfraction of large errors can cause arbitrary corruption in PCA\u2019s estimate of the low rank structure, A.\nOur approach to robust PCA is motivated by two recent, and tightly related, lines of research. The\n\ufb01rst set of results concerns the robust solution of over-determined linear systems of equations in the\npresence of arbitrary, but sparse errors. These results imply that for generic systems of equations, it\nis possible to correct a constant fraction of arbitrary errors in polynomial time [11]. This is achieved\nby employing the (cid:96)1-norm as a convex surrogate for the highly-nonconvex (cid:96)0-norm. A parallel\n(and still emerging) line of work concerns the problem of computing low-rank matrix solutions\nto underdetermined linear equations [12, 13]. 
One of the most striking results concerns the exact completion of low-rank matrices from only a small fraction of their entries [13, 14, 15, 16].2 There, a similar convex relaxation is employed, replacing the highly non-convex matrix rank with the nuclear norm (or sum of singular values).\nThe robust PCA problem outlined above combines aspects of both of these lines of work: we wish to recover a low-rank matrix from large but sparse errors. We will show that combining the solutions to the above problems (nuclear norm minimization for low-rank recovery and \u21131-minimization for error correction) yields a polynomial-time algorithm for robust PCA that provably succeeds under broad conditions:\n\nWith high probability, solving a simple convex program perfectly recovers a generic matrix A \u2208 Rm\u00d7m of rank as large as Cm/ log(m), from errors affecting up to a constant fraction of the m2 entries.\n\nThis conclusion holds with high probability as the dimensionality m increases, implying that in high-dimensional observation spaces, sparse and low-rank structures can be efficiently and exactly separated. This behavior is an example of the so-called blessing of dimensionality [17].\nHowever, this result would remain a theoretical curiosity without scalable algorithms for solving the associated convex program. To this end, we discuss how a near-solution to this convex program can be obtained relatively efficiently via proximal gradient [18, 19] and iterative thresholding techniques, similar to those proposed for matrix completion in [20, 21]. 
For large matrices, these algorithms are\nsigni\ufb01cantly faster and more scalable than general-purpose convex program solvers.\nOur analysis also implies an extension of existing results for the low-rank matrix completion prob-\nlem, and including the \ufb01rst results applicable to the proportional growth setting where the rank of\nthe matrix grows as a constant (non-vanishing) fraction of the dimensionality:\n\nWith overwhelming probability, solving a simple convex program perfectly re-\ncovers a generic matrix A \u2208 Rm\u00d7m of rank as large as Cm, from observations\nconsisting of only a fraction \u03c1m2 (\u03c1 < 1) of its entries.\n\n1Random sampling approaches guarantee near-optimal estimates, but have complexity exponential in the\nrank of the matrix A0. Trimming algorithms have comparatively lower computational complexity, but guarantee\nonly locally optimal solutions.\n\n2A major difference between robust PCA and low-rank matrix completion is that here we do not know\n\nwhich entries are corrupted, whereas in matrix completion the support of the missing entries is given.\n\n2\n\n\fOrganization of this paper. This paper is organized as follows. Section 2 formulates the robust\nprincipal component analysis problem more precisely and states the main results of this paper, plac-\ning these results in the context of existing work. The proof (available in [22]) relies on standard ideas\nfrom linear algebra and concentration of measure, but is beyond the scope of this paper. Section 3\nextends existing proximal gradient techniques to give a simple, scalable algorithm for solving the\nrobust PCA problem. In Section 4, we perform simulations and experiments corroborating the theo-\nretical results and suggesting their applicability to real-world problems in computer vision. 
Finally, in Section 5, we outline several promising directions for future work.\n\n2 Problem Setting and Main Results\nWe assume that the observed data matrix D \u2208 Rm\u00d7n was generated by corrupting some of the entries of a low-rank matrix A \u2208 Rm\u00d7n. The corruption can be represented as an additive error E \u2208 Rm\u00d7n, so that D = A + E. Because the error affects only a portion of the entries of D, E is a sparse matrix. The idealized (or noise-free) robust PCA problem can then be formulated as follows:\nProblem 2.1 (Robust PCA). Given D = A + E, where A and E are unknown, but A is known to be low rank and E is known to be sparse, recover A.\n\nThis problem formulation immediately suggests a conceptual solution: seek the lowest rank A that could have generated the data, subject to the constraint that the errors are sparse: \u2016E\u20160 \u2264 k. The Lagrangian reformulation of this optimization problem is\n\nmin A,E rank(A) + \u03b3\u2016E\u20160 subj A + E = D. (1)\n\nIf we could solve this problem for appropriate \u03b3, we might hope to exactly recover the pair (A0, E0) that generated the data D. Unfortunately, (1) is a highly nonconvex optimization problem, and no efficient solution is known.3 We can obtain a tractable optimization problem by relaxing (1), replacing the \u21130-norm with the \u21131-norm, and the rank with the nuclear norm \u2016A\u2016\u2217 = \u2211i \u03c3i(A), yielding the following convex surrogate:\n\nmin A,E \u2016A\u2016\u2217 + \u03bb\u2016E\u20161 subj A + E = D. (2)\n\nThis relaxation can be motivated by observing that \u2016A\u2016\u2217 + \u03bb\u2016E\u20161 is the convex envelope of rank(A) + \u03bb\u2016E\u20160 over the set of (A, E) such that max(\u2016A\u20162,2, \u2016E\u20161,\u221e) \u2264 1. 
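To make the relaxation concrete: the quantities in (1) and their convex surrogates in (2) are all one-liners in NumPy. Below is a minimal numeric sketch; the small random matrices and variable names are our own toy example, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Low-rank A: product of two thin Gaussian factors (rank at most 2 here).
A = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 8))
# Sparse E: a few arbitrarily large entries, everything else zero.
E = np.zeros((8, 8))
E[1, 3], E[5, 0], E[6, 6] = 100.0, -250.0, 40.0

sigma = np.linalg.svd(A, compute_uv=False)
rank_A = int(np.sum(sigma > 1e-8))   # rank(A): number of nonzero singular values
nuclear_A = sigma.sum()              # ||A||_* : sum of singular values
l0_E = int(np.count_nonzero(E))      # ||E||_0 : number of nonzero entries
l1_E = np.abs(E).sum()               # ||E||_1 : sum of absolute entries

print(rank_A, l0_E)      # the nonconvex quantities appearing in (1)
print(nuclear_A, l1_E)   # their convex surrogates appearing in (2)
```

The surrogates are convex in (A, E), which is what makes the program (2) tractable while (1) is not.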
Moreover, recent advances in our understanding of the nuclear norm heuristic for low-rank solutions to matrix equations [12, 13] and the \u21131 heuristic for sparse solutions to underdetermined linear systems [11, 24] suggest that there might be circumstances under which solving the tractable problem (2) perfectly recovers the low-rank matrix A0. The main result of this paper will be to show that this is indeed true under surprisingly broad conditions. A sketch of the result is as follows: For \u201calmost all\u201d pairs (A0, E0) consisting of a low-rank matrix A0 and a sparse matrix E0,\n\n(A0, E0) = arg min A,E \u2016A\u2016\u2217 + \u03bb\u2016E\u20161 subj A + E = A0 + E0,\n\nand the minimizer is uniquely defined. That is, under natural probabilistic models for low-rank and sparse matrices, almost all observations D = A0 + E0 generated as the sum of a low-rank matrix A0 and a sparse matrix E0 can be efficiently and exactly decomposed into their generating parts by solving a convex program.4\nOf course, this is only possible with an appropriate choice of the regularizing parameter \u03bb > 0. From the optimality conditions for the convex program (2), it is not difficult to show that for matrices D \u2208 Rm\u00d7m, the correct scaling is \u03bb = O(m\u22121/2). Throughout this paper, unless otherwise stated, we will fix \u03bb = m\u22121/2. 
For simplicity, all of our results in this paper will be stated for square\nmatrices D \u2208 Rm\u00d7m, although there is little dif\ufb01culty in extending them to non-square matrices.\n\n3In a sense, this problem subsumes both the low rank matrix completion problem and the (cid:96)0-minimization\n\nproblem, both of which are NP-hard and hard to approximate [23].\n\n4Notice that this is not an \u201cequivalence\u201d result for (1) and (2) \u2013 rather than asserting that the solutions\nof these two problems are equal with high probability, we directly prove that the convex program correctly\ndecomposes D = A0 + E0 into (A0, E0). A natural conjecture, however, is that under the conditions of our\nmain result, (A0, E0) is also the solution to (1) for some choice of \u03b3.\n\n3\n\n\fIt should be clear that not all matrices A0 can be successfully recovered by solving the convex\nprogram (2). Consider, e.g., the rank-1 case where U = [ei] and V = [ej]. Without additional prior\nknowledge, the low-rank matrix A = U SV \u2217 cannot be recovered from even a single gross error. We\ntherefore restrict our attention to matrices A0 whose row and column spaces are not aligned with the\nstandard basis. This can be done probabilistically, by asserting that the marginal distributions of U\nand V are uniform on the Stiefel manifold Wm\nr :\nDe\ufb01nition 2.2 (Random orthogonal model [13]). We consider a matrix A0 to be distributed accord-\ning to the random orthogonal model of rank r if its left and right singular vectors are independent\nuniformly distributed m\u00d7r matrices with orthonormal columns.5 In this model, the nonzero singular\nvalues of A0 can be arbitrary.\n\nOur model for errors is similarly natural: each entry of the matrix is independently corrupted with\nsome probability \u03c1s, and the signs of the corruptions are independent Rademacher random variables.\nDe\ufb01nition 2.3 (Bernoulli error signs and support). 
We consider an error matrix E0 to be drawn from the Bernoulli sign and support model with parameter \u03c1s if the entries of sign(E0) are independently distributed, each taking on value 0 with probability 1 \u2212 \u03c1s, and \u00b11 with probability \u03c1s/2 each. In this model, the magnitude of the nonzero entries in E0 can be arbitrary.\n\nOur main result is the following (see [22] for a proof):\nTheorem 2.4 (Robust recovery from non-vanishing error fractions). For any p > 0, there exist constants (C\u22c60 > 0, \u03c1\u22c6s > 0, m0) with the following property: if m > m0, (A0, E0) \u2208 Rm\u00d7m \u00d7 Rm\u00d7m with the singular spaces of A0 \u2208 Rm\u00d7m distributed according to the random orthogonal model of rank\n\nr \u2264 C\u22c60 m/ log(m), (3)\n\nand the signs and support of E0 \u2208 Rm\u00d7m distributed according to the Bernoulli sign-and-support model with error probability \u2264 \u03c1\u22c6s, then with probability at least 1 \u2212 Cm\u2212p,\n\n(A0, E0) = arg min \u2016A\u2016\u2217 + (1/\u221am)\u2016E\u20161 subj A + E = A0 + E0, (4)\n\nand the minimizer is uniquely defined.\n\nIn other words, matrices A0 whose singular spaces are distributed according to the random orthogonal model can, with probability approaching one, be efficiently recovered from almost all corruption sign and support patterns without prior knowledge of the pattern of corruption.\nOur line of analysis also implies strong results for the matrix completion problem studied in [13, 15, 14, 16]. We again refer the interested reader to [22] for a proof of the following result:\nTheorem 2.5 (Matrix completion in proportional growth). 
There exist numerical constants m0, \u03c1\u22c6r, \u03c1\u22c6s, C, all > 0, with the following property: if m > m0 and A0 \u2208 Rm\u00d7m is distributed according to the random orthogonal model of rank\n\nr \u2264 \u03c1\u22c6r m, (5)\n\nand \u03a5 \u2282 [m] \u00d7 [m] is an independently chosen subset of [m] \u00d7 [m] in which the inclusion of each pair (i, j) is an independent Bernoulli(1 \u2212 \u03c1s) random variable with \u03c1s \u2264 \u03c1\u22c6s, then with probability at least 1 \u2212 exp(\u2212Cm),\n\nA0 = arg min \u2016A\u2016\u2217 subj A(i, j) = A0(i, j) \u2200 (i, j) \u2208 \u03a5, (6)\n\nand the minimizer is uniquely defined.\n\n5I.e., distributed according to the Haar measure on the Stiefel manifold Wm r.\n\nRelationship to existing work. Contemporaneous results due to [25] show that for A0 distributed according to the random orthogonal model, and E0 with Bernoulli support, correct recovery occurs with high probability provided\n\n\u2016E0\u20160 \u2264 C m1.5 log(m)\u22121 max(r, log m)\u22121/2. (7)\n\nThis is an interesting result, especially since it makes no assumption on the signs of the errors. However, even for constant rank r it guarantees correction of only a vanishing fraction o(m1.5) \u226a m2 of errors. In contrast, our main result, Theorem 2.4, states that even if r grows proportionally to m/ log(m), non-vanishing fractions of errors are corrected with high probability. Both analyses start from the optimality condition for the convex program (2). The key technical component of this improved result is a probabilistic analysis of an iterative refinement technique for producing a dual vector that certifies optimality of the pair (A0, E0). This approach extends techniques used in [11, 26], with additional care required to handle an operator norm constraint arising from the presence of the nuclear norm in (2). 
For further details we refer the interested reader to [22].\nFinally, while Theorem 2.5 is not the main focus of this paper, it is interesting in light of results by [15]. That work proves that in the probabilistic model considered here, a generic m \u00d7 m rank-r matrix can be efficiently and exactly completed from a subset of only\n\nCmr log8(m) (8)\n\nentries. For r > m/polylog(m), this bound exceeds the number m2 of possible observations. A similar result for spectral methods [14] gives exact completion from O(m log(m)) measurements when r = O(1). In contrast, our Theorem 2.5 implies that for certain scenarios with r as large as \u03c1rm, the matrix can be completed from a subset of (1 \u2212 \u03c1s)m2 entries. For matrices of large rank, this is a significant extension of [15]. However, our result does not supersede (8) for smaller ranks.\n\n3 Scalable Optimization for Robust PCA\n\nThere are a number of possible approaches to solving the robust PCA semidefinite program (2). For small problem sizes, interior point methods offer superior accuracy and convergence rates. However, off-the-shelf interior point solvers become impractical for data matrices larger than about 70 \u00d7 70, due to the O(m6) complexity of solving for the step direction. For the experiments in this paper we use an alternative first-order method based on the proximal gradient approach of [18],6 which we briefly introduce here. For further discussion of this approach, as well as alternatives based on duality, please see [27]. 
This algorithm solves a slightly relaxed version of (2), in which the equality constraint is replaced with a penalty term:\n\nmin \u00b5\u2016A\u2016\u2217 + \u03bb\u00b5\u2016E\u20161 + (1/2)\u2016D \u2212 A \u2212 E\u20162F . (9)\n\nHere, \u00b5 is a small constant; as \u00b5 \u2198 0, the solutions to (9) approach the solution set of (2).\nThe approach of [18] minimizes functions of this type by forming separable quadratic approximations to the data fidelity term \u2016D \u2212 A \u2212 E\u20162F at a special set of points (\u02dcAk, \u02dcEk) that are conspicuously chosen to obtain a convergence rate of O(k\u22122). The solutions to these subproblems,\n\nAk+1 = arg min A \u00b5\u2016A\u2016\u2217 + \u2016A \u2212 (\u02dcAk \u2212 (1/4)\u2207A\u2016D \u2212 A \u2212 E\u20162F |\u02dcAk,\u02dcEk)\u20162F ,\nEk+1 = arg min E \u03bb\u00b5\u2016E\u20161 + \u2016E \u2212 (\u02dcEk \u2212 (1/4)\u2207E\u2016D \u2212 A \u2212 E\u20162F |\u02dcAk,\u02dcEk)\u20162F , (10)\n\ncan be efficiently computed via the soft thresholding operator (for E) and the singular value thresholding operator (for A, see [20]). We terminate the iteration when the subgradient\n\n(\u02dcAk \u2212 Ak+1 + Ek+1 \u2212 \u02dcEk, \u02dcEk \u2212 Ek+1 + Ak+1 \u2212 \u02dcAk) \u2208 \u2202(\u00b5\u2016A\u2016\u2217 + \u03bb\u00b5\u2016E\u20161 + (1/2)\u2016D \u2212 A \u2212 E\u20162F )|Ak+1,Ek+1 (11)\n\nhas sufficiently small Frobenius norm.7 In practice, convergence speed is dramatically improved by employing a continuation strategy in which \u00b5 starts relatively large and then decreases geometrically at each iteration until reaching a lower bound, \u00af\u00b5 (as in [21]).\nThe entire procedure is summarized as Algorithm 1 below. We encourage the interested reader to consult [18] for a more detailed explanation of the choice of the proximal points (\u02dcAk, \u02dcEk), as well as a convergence proof ([18] Theorem 4.1). As we will see in the next section, in practice the total number of iterations is often as small as 200. Since the dominant cost of each iteration is computing the singular value decomposition, this means that it is often possible to obtain a provably robust PCA with only a constant factor more computational resources than required for conventional PCA.\n\n6That work is similar in spirit to the work of [19], and has also been applied to matrix completion in [21].\n7More precisely, as suggested in [21], we terminate when the norm of this subgradient is less than 2 max(1, \u2016(Ak+1, Ek+1)\u2016F ) \u00d7 \u03c4. 
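The two closed-form subproblem solutions referenced above, soft thresholding for E and singular value thresholding for A, can be sketched in a few lines of NumPy. This is a minimal illustration; the function names are ours.

```python
import numpy as np

def soft_threshold(X, tau):
    # Entrywise shrinkage: sign(X) * max(|X| - tau, 0).
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def sv_threshold(Y, tau):
    # Singular value thresholding: shrink the singular values of Y by tau.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# With threshold mu/2, sv_threshold solves min_A mu*||A||_* + ||A - Y||_F^2;
# soft_threshold with threshold lam*mu/2 solves the analogous l1 subproblem for E.
print(sv_threshold(np.diag([3.0, 1.0]), 2.0))           # singular values 3, 1 -> 1, 0
print(soft_threshold(np.array([3.0, -1.0, 0.5]), 1.0))  # entries -> 2, 0, 0
```

Both operators cost no more than one (economy-size) SVD per iteration, which is why the method scales far better than interior point solvers.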
In our experiments, we set \u03c4 = 10\u22127.\n\nAlgorithm 1: Robust PCA via Proximal Gradient with Continuation\n1: Input: Observation matrix D \u2208 Rm\u00d7n, weight \u03bb.\n2: A0, A\u22121 \u2190 0; E0, E\u22121 \u2190 0; t0, t\u22121 \u2190 1; \u00b50 \u2190 .99\u2016D\u20162,2; \u00af\u00b5 \u2190 10\u22125\u00b50.\n3: while not converged do\n4: \u02dcAk \u2190 Ak + ((tk\u22121 \u2212 1)/tk)(Ak \u2212 Ak\u22121), \u02dcEk \u2190 Ek + ((tk\u22121 \u2212 1)/tk)(Ek \u2212 Ek\u22121).\n5: Y Ak \u2190 \u02dcAk \u2212 (1/2)(\u02dcAk + \u02dcEk \u2212 D), (U, S, V) \u2190 svd(Y Ak), Ak+1 \u2190 U [S \u2212 (\u00b5/2)I]+ V\u2217.\n6: Y Ek \u2190 \u02dcEk \u2212 (1/2)(\u02dcAk + \u02dcEk \u2212 D), Ek+1 \u2190 sign[Y Ek] \u25e6 [|Y Ek| \u2212 (\u03bb\u00b5/2)11\u2217]+.\n7: tk+1 \u2190 (1 + \u221a(1 + 4t2k))/2, \u00b5 \u2190 max(.9\u00b5, \u00af\u00b5).\n8: end while\n9: Output: A, E.\n\n4 Simulations and Experiments\n\nIn this section, we first perform simulations corroborating our theoretical results and clarifying their implications. We then sketch two computer vision applications involving the recovery of intrinsically low-dimensional data from gross corruption: background estimation from video and face subspace estimation under varying illumination.8\n\nSimulation: proportional growth. We first demonstrate the exactness of the convex programming heuristic, as well as the efficacy of Algorithm 1, on random matrix examples of increasing dimension. We generate A0 as a product of two independent m \u00d7 r matrices whose elements are i.i.d. N(0, 1) random variables. We generate E0 as a sparse matrix whose support is chosen uniformly at random, and whose non-zero entries are independent and uniformly distributed in the range [\u2212500, 500]. 
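A minimal NumPy re-implementation of Algorithm 1 on exactly this kind of synthetic data can look as follows. This is our own sketch under the paper\u2019s stated settings (\u03bb = m\u22121/2, \u00b50 = .99\u2016D\u20162,2, \u00af\u00b5 = 10\u22125\u00b50, geometric continuation), with a fixed iteration cap in place of the subgradient stopping rule:

```python
import numpy as np

def robust_pca(D, lam=None, max_iter=500):
    # Proximal gradient with continuation (Algorithm 1) for
    # min mu*||A||_* + lam*mu*||E||_1 + 0.5*||D - A - E||_F^2, mu decreasing.
    m, n = D.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    A, A_prev = np.zeros((m, n)), np.zeros((m, n))
    E, E_prev = np.zeros((m, n)), np.zeros((m, n))
    t, t_prev = 1.0, 1.0
    mu = 0.99 * np.linalg.norm(D, 2)   # spectral norm of D
    mu_bar = 1e-5 * mu
    for _ in range(max_iter):
        # Momentum (proximal) points.
        A_t = A + ((t_prev - 1.0) / t) * (A - A_prev)
        E_t = E + ((t_prev - 1.0) / t) * (E - E_prev)
        G = 0.5 * (A_t + E_t - D)      # half the residual, shared by both steps
        # Singular value thresholding step for A.
        U, s, Vt = np.linalg.svd(A_t - G, full_matrices=False)
        A_next = U @ np.diag(np.maximum(s - mu / 2.0, 0.0)) @ Vt
        # Entrywise soft thresholding step for E.
        Y = E_t - G
        E_next = np.sign(Y) * np.maximum(np.abs(Y) - lam * mu / 2.0, 0.0)
        A_prev, E_prev, A, E = A, E, A_next, E_next
        t_prev, t = t, (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        mu = max(0.9 * mu, mu_bar)     # continuation
    return A, E

# Test data as in the proportional-growth simulation (smaller for speed).
rng = np.random.default_rng(1)
m, r = 60, 2
A0 = rng.standard_normal((m, r)) @ rng.standard_normal((r, m))
E0 = np.zeros((m, m))
idx = rng.choice(m * m, size=int(0.05 * m * m), replace=False)
E0.flat[idx] = rng.uniform(-500, 500, size=idx.size)
A_hat, E_hat = robust_pca(A0 + E0)
rel_err = np.linalg.norm(A_hat - A0) / np.linalg.norm(A0)
print(rel_err)
```

On easy instances like this one (rank ratio and error fraction both a few percent), a few hundred iterations typically drive the relative recovery error well below 10\u22122, in line with Table 1.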
We apply the proposed algorithm to the matrix D .= A0 + E0 to recover \u02c6A and \u02c6E. The results are presented in Table 1. For these experiments, we choose \u03bb = m\u22121/2. We observe that the proposed algorithm is successful in recovering A0 even when 10% of its entries are corrupted.\n\nm | rank(A0) | \u2016E0\u20160 | \u2016\u02c6A \u2212 A0\u2016F / \u2016A0\u2016F | rank(\u02c6A) | \u2016\u02c6E\u20160 | #iterations | time (s)\n100 | 5 | 500 | 3.0 \u00d7 10\u22124 | 5 | 506 | 104 | 1.6\n200 | 10 | 2,000 | 2.1 \u00d7 10\u22124 | 10 | 2,012 | 104 | 7.9\n400 | 20 | 8,000 | 1.4 \u00d7 10\u22124 | 20 | 8,030 | 104 | 64.8\n800 | 40 | 32,000 | 9.9 \u00d7 10\u22125 | 40 | 32,062 | 104 | 531.6\n100 | 5 | 1,000 | 3.1 \u00d7 10\u22124 | 5 | 1,033 | 108 | 1.6\n200 | 10 | 4,000 | 2.3 \u00d7 10\u22124 | 10 | 4,042 | 107 | 8.0\n400 | 20 | 16,000 | 1.6 \u00d7 10\u22124 | 20 | 16,110 | 107 | 66.7\n800 | 40 | 64,000 | 1.2 \u00d7 10\u22124 | 40 | 64,241 | 106 | 542.8\n\nTable 1: Proportional growth. Here the rank of the matrix grows in proportion (5%) to the dimensionality m, and the number of corrupted measurements grows in proportion to the number of entries m2: 5% in the top four rows and 10% in the bottom four. The time reported is for a Matlab implementation run on a 2.8 GHz MacBook Pro.\n\nSimulation: phase transition w.r.t. rank and error sparsity. We next examine how the rank of A and the proportion of errors in E affect the performance of our algorithm. We fix m = 200, and vary \u03c1r .= rank(A0)/m and the error probability \u03c1s between 0 and 1. For each (\u03c1r, \u03c1s) pair, we generate 10 pairs (A0, E0) as in the above experiment. We deem (A0, E0) successfully recovered\n\n8Here, we use these intuitive examples and data to illustrate how our algorithm can be used as a simple, general tool to effectively separate low-dimensional and sparse structures occurring in real visual data. 
Appropriately harnessing additional structure (e.g., the spatial coherence of the error [28]) may yield even more effective algorithms.\n\nif the recovered \u02c6A satisfies \u2016\u02c6A \u2212 A0\u2016F / \u2016A0\u2016F < 0.01. Figure 1 (left) plots the fraction of correct recoveries. White denotes perfect recovery in all experiments, and black denotes failure for all experiments. We observe that there is a relatively sharp phase transition between success and failure of the algorithm roughly above the line \u03c1r + \u03c1s = 0.35. To verify this behavior, we repeat the experiment, but only vary \u03c1r and \u03c1s between 0 and 0.4 with finer steps. These results, seen in Figure 1 (right), show that the phase transition remains fairly sharp even at higher resolution.\n\nFigure 1: Phase transition w.r.t. rank and error sparsity. Here, \u03c1r = rank(A)/m, \u03c1s = \u2016E\u20160/m2. Left: (\u03c1r, \u03c1s) \u2208 [0, 1]2. Right: (\u03c1r, \u03c1s) \u2208 [0, 0.4]2.\n\nExperiment: background modeling from video. Background modeling or subtraction from video sequences is a popular approach to detecting activity in the scene, and finds application in video surveillance from static cameras. Background estimation is complicated by the presence of foreground objects such as people, as well as variability in the background itself, for example due to varying illumination. In many cases, however, it is reasonable to assume that these background variations are low-rank, while the foreground activity is spatially localized, and therefore sparse. If the individual frames are stacked as columns of a matrix D, this matrix can be expressed as the sum of a low-rank background matrix and a sparse error matrix representing the activity in the scene. We illustrate this idea using two examples from [29] (see Figure 2). In Figure 2(a)-(c), the video sequence consists of 200 frames of a scene in an airport. 
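The frame-stacking construction described here is straightforward to sketch. In the toy example below (our own synthetic stand-in for the sequences of [29], not the actual data), a static background contributes a rank-1 component and a small moving block plays the role of sparse foreground activity:

```python
import numpy as np

# Toy video: T frames of size H x W with a fixed background and a moving block.
T, H, W = 40, 24, 32
rng = np.random.default_rng(2)
background = rng.uniform(0.0, 1.0, size=(H, W))   # static scene
frames = np.tile(background, (T, 1, 1))
for t in range(T):                                # sparse foreground: a 4x4 block
    c = (2 * t) % (W - 4)
    frames[t, 10:14, c:c + 4] += 5.0

# Stack vectorized frames as the columns of D, as described in the text.
D = frames.reshape(T, H * W).T                    # shape (H*W, T)

# With no foreground, D would be exactly rank 1; the block only perturbs
# a few entries per column, so D stays low-rank plus sparse.
s = np.linalg.svd(D, compute_uv=False)
print(D.shape, s[0] / s[1])
```

Splitting D with the robust PCA program (2) then returns the background as the low-rank part and the activity as the sparse part.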
There is no significant change in illumination in the video, but a lot of activity in the foreground. We observe that our algorithm is very effective in separating the background from the activity. In Figure 2(d)-(f), we have 550 frames from a scene in a lobby. There is little activity in the video, but the illumination changes drastically towards the end of the sequence. We see that our algorithm is once again able to recover the background, irrespective of the illumination change.\n\nExperiment: removing shadows and specularities from face images. Face recognition is another domain in computer vision where low-dimensional linear models have received a great deal of attention, mostly due to the work of [30]. The key observation is that under certain idealized circumstances, images of the same face under varying illumination lie near an approximately nine-dimensional linear subspace known as the harmonic plane. However, since faces are neither perfectly convex nor Lambertian, face images taken under directional illumination often suffer from self-shadowing, specularities, or saturations in brightness.\nGiven a matrix D whose columns represent well-aligned training images of a person\u2019s face under various illumination conditions, our Robust PCA algorithm offers a principled way of removing such spatially localized artifacts. Figure 3 illustrates the results of our algorithm on images from subsets 1-3 of the Extended Yale B database [31]. The proposed algorithm removes the specularities in the eyes and the shadows around the nose region. This technique is potentially useful for pre-processing training images in face recognition systems to remove such deviations from the low-dimensional linear model.\n\n5 Discussion and Future Work\n\nOur results give strong theoretical and empirical evidence for the efficacy of using convex programming to recover low-rank matrices from corrupted observations. 
Figure 2: Background modeling. (a) Video sequence of a scene in an airport. The size of each frame is 72 \u00d7 88 pixels, and a total of 200 frames were used. (b) Static background recovered by our algorithm. (c) Sparse error recovered by our algorithm, representing activity in the frame. (d) Video sequence of a lobby scene with changing illumination. The size of each frame is 64 \u00d7 80 pixels, and a total of 550 frames were used. (e) Static background recovered by our algorithm. (f) Sparse error. The background is correctly recovered even when the illumination in the room changes drastically in the frame on the last row.\n\nFigure 3: Removing shadows and specularities from face images. (a) Cropped and aligned images of a person\u2019s face under different illuminations from the Extended Yale B database. The size of each image is 96 \u00d7 84 pixels; a total of 31 different illuminations were used for each person. (b) Images recovered by our algorithm. (c) The sparse errors returned by our algorithm correspond to specularities in the eyes, shadows around the nose region, or brightness saturations on the face.\n\nHowever, there remain many fascinating open questions in this area. From a mathematical perspective, it would be interesting to know if it is possible to remove the logarithmic factor in our main result. The phase transition experiment in Section 4 suggests that convex programming actually succeeds even for rank(A0) < \u03c1rm and \u2016E0\u20160 < \u03c1sm2, where \u03c1r and \u03c1s are sufficiently small positive constants. Another interesting and important question is whether the recovery is stable in the presence of small dense noise. That is, suppose we observe D = A0 + E0 + Z, where Z is a noise matrix of small Frobenius norm (e.g., Gaussian noise). 
A natural approach is then to minimize ‖A‖∗ + λ‖E‖1 subject to a relaxed constraint ‖D − A − E‖F ≤ ε. For matrix completion, [16] showed that a similar relaxation gives stable recovery: the error in the solution is proportional to the noise level. Finally, while this paper has sketched several examples on visual data, we believe that this powerful new tool pertains to a wide range of high-dimensional data, for example in bioinformatics and web search.

References

[1] C. Eckart and G. Young. The approximation of one matrix by another of lower rank. Psychometrika, 1(3):211–218, 1936.

[2] S. Chen, D. Donoho, and M. Saunders. Atomic decomposition by basis pursuit. SIAM Review, 43(1):129–159, 2001.

[3] J. Tenenbaum, V. de Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.

[4] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.

[5] I. Jolliffe. Principal Component Analysis. Springer-Verlag, New York, New York, 1986.

[6] P. Huber. Robust Statistics. Wiley, New York, New York, 1981.

[7] F. De La Torre and M. Black. A framework for robust subspace learning. IJCV, 54(1-3):117–142, 2003.

[8] R. Gnanadesikan and J. Kettenring. Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics, 28(1):81–124, 1972.

[9] Q. Ke and T. Kanade. Robust ℓ1 norm factorization in the presence of outliers and missing data by alternative convex programming. In CVPR, 2005.

[10] M. Fischler and R. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.

[11] E. Candès and T. Tao. Decoding by linear programming.
IEEE Trans. Info. Thy., 51(12):4203–4215, 2005.

[12] B. Recht, M. Fazel, and P. Parrilo. Guaranteed minimum rank solutions of matrix equations via nuclear norm minimization. SIAM Review, submitted for publication.

[13] E. Candès and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, to appear.

[14] R. Keshavan, A. Montanari, and S. Oh. Matrix completion from a few entries. Preprint, 2009.

[15] E. Candès and T. Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, submitted for publication.

[16] E. Candès and Y. Plan. Matrix completion with noise. Proceedings of the IEEE, to appear.

[17] D. Donoho. High-dimensional data analysis: The curses and blessings of dimensionality. AMS Math Challenges Lecture, 2000.

[18] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.

[19] Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127–152, 2005.

[20] J. Cai, E. Candès, and Z. Shen. A singular value thresholding algorithm for matrix completion. Preprint, http://arxiv.org/abs/0810.3286, 2008.

[21] K.-C. Toh and S. Yun. An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems. Preprint, http://math.nus.edu.sg/~matys/apg.pdf, 2009.

[22] J. Wright, A. Ganesh, S. Rao, and Y. Ma. Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization. Journal of the ACM, submitted for publication.

[23] E. Amaldi and V. Kann. On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems. Theoretical Computer Science, 209(2):237–260, 1998.

[24] D. Donoho.
For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution. Communications on Pure and Applied Mathematics, 59(6):797–829, 2006.

[25] V. Chandrasekaran, S. Sanghavi, P. Parrilo, and A. Willsky. Sparse and low-rank matrix decompositions. In IFAC Symposium on System Identification, 2009.

[26] J. Wright and Y. Ma. Dense error correction via ℓ1-minimization. IEEE Transactions on Information Theory, to appear.

[27] Z. Lin, A. Ganesh, J. Wright, M. Chen, L. Wu, and Y. Ma. Fast convex optimization algorithms for exact recovery of a corrupted low-rank matrix. SIAM Journal on Optimization, submitted for publication.

[28] V. Cevher, M. F. Duarte, C. Hegde, and R. G. Baraniuk. Sparse signal recovery using Markov random fields. In NIPS, 2008.

[29] L. Li, W. Huang, I. Gu, and Q. Tian. Statistical modeling of complex backgrounds for foreground object detection. IEEE Transactions on Image Processing, 13(11), 2004.

[30] R. Basri and D. Jacobs. Lambertian reflectance and linear subspaces. IEEE Trans. PAMI, 25(2):218–233, 2003.

[31] A. Georghiades, P. Belhumeur, and D. Kriegman. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Trans. PAMI, 23(6):643–660, 2001.