{"title": "Analysis of Robust PCA via Local Incoherence", "book": "Advances in Neural Information Processing Systems", "page_first": 1819, "page_last": 1827, "abstract": "We investigate the robust PCA problem of decomposing an observed matrix into the sum of a low-rank and a sparse error matrices via convex programming Principal Component Pursuit (PCP). In contrast to previous studies that assume the support of the error matrix is generated by uniform Bernoulli sampling, we allow non-uniform sampling, i.e., entries of the low-rank matrix are corrupted by errors with unequal probabilities. We characterize conditions on error corruption of each individual entry based on the local incoherence of the low-rank matrix, under which correct matrix decomposition by PCP is guaranteed. Such a refined analysis of robust PCA captures how robust each entry of the low rank matrix combats error corruption. In order to deal with non-uniform error corruption, our technical proof introduces a new weighted norm and develops/exploits the concentration properties that such a norm satisfies.", "full_text": "Analysis of Robust PCA via Local Incoherence\n\nHuishuai Zhang\nDepartment of EECS\nSyracuse University\nSyracuse, NY 13244\nhzhan23@syr.edu\n\nYi Zhou\n\nDepartment of EECS\nSyracuse University\nSyracuse, NY 13244\nyzhou35@syr.edu\n\nAbstract\n\nYingbin Liang\n\nDepartment of EECS\nSyracuse University\nSyracuse, NY 13244\n\nyliang06@syr.edu\n\nWe investigate the robust PCA problem of decomposing an observed matrix into\nthe sum of a low-rank and a sparse error matrices via convex programming Prin-\ncipal Component Pursuit (PCP). In contrast to previous studies that assume the\nsupport of the error matrix is generated by uniform Bernoulli sampling, we allow\nnon-uniform sampling, i.e., entries of the low-rank matrix are corrupted by er-\nrors with unequal probabilities. 
We characterize conditions on error corruption of each individual entry based on the local incoherence of the low-rank matrix, under which correct matrix decomposition by PCP is guaranteed. Such a refined analysis of robust PCA captures how robustly each entry of the low-rank matrix resists error corruption. In order to deal with non-uniform error corruption, our technical proof introduces a new weighted norm and develops/exploits the concentration properties that such a norm satisfies.

1 Introduction

We consider the problem of robust Principal Component Analysis (PCA). Suppose an n-by-n¹ data matrix M can be decomposed into a low-rank matrix L and a sparse matrix S as

$M = L + S.$  (1)

Robust PCA aims to find L and S given M. This problem has been extensively studied recently. In [1, 2], Principal Component Pursuit (PCP) was proposed to solve the robust PCA problem via the following convex program:

PCP:  $\min_{L,S} \; \|L\|_* + \lambda \|S\|_1 \quad \text{subject to} \quad M = L + S,$  (2)

where $\|\cdot\|_*$ denotes the nuclear norm, i.e., the sum of singular values, and $\|\cdot\|_1$ denotes the $\ell_1$ norm, i.e., the sum of absolute values of all entries. It was shown in [1, 2] that PCP successfully recovers L and S if the two matrices are distinguishable from each other in properties, i.e., L is not sparse and S is not low-rank. One important quantity that determines the similarity of L to a sparse matrix is the incoherence of L, which measures how the column and row spaces of L are aligned with the canonical basis and with each other. Namely, suppose that L is a rank-r matrix with SVD $L = U\Sigma V^*$, where $\Sigma$ is an $r \times r$ diagonal matrix with the singular values as its diagonal entries, U is an $n \times r$ matrix whose columns are the left singular vectors of L, V is an $n \times r$ matrix whose columns are the right singular vectors of L, and $V^*$ denotes the transpose of V.
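Program (2) can be solved in practice by an inexact augmented Lagrange multiplier (ALM) method, as in [17], which is the solver used in the experiments of Section 3. The following is a minimal numpy sketch of that approach, not the reference implementation; the penalty schedule, default λ, and stopping rule are illustrative assumptions of ours.

```python
import numpy as np

def shrink(X, tau):
    """Entrywise soft-thresholding (proximal operator of the l1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding (proximal operator of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(shrink(s, tau)) @ Vt

def pcp(M, lam=None, tol=1e-6, max_iter=500):
    """Inexact-ALM sketch for PCP: min ||L||_* + lam*||S||_1 s.t. M = L + S."""
    n = M.shape[0]
    if lam is None:
        lam = 1.0 / np.sqrt(n)            # common default; the paper uses 1/sqrt(n log n)
    mu = 1.25 / np.linalg.norm(M, 2)      # illustrative penalty schedule
    mu_bar, rho = mu * 1e7, 1.5
    Y = np.zeros_like(M)
    S = np.zeros_like(M)
    normM = np.linalg.norm(M, 'fro')
    for _ in range(max_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)       # nuclear-norm prox step
        S = shrink(M - L + Y / mu, lam / mu)    # l1 prox step
        R = M - L - S                           # primal residual
        Y = Y + mu * R
        mu = min(rho * mu, mu_bar)
        if np.linalg.norm(R, 'fro') / normM < tol:
            break
    return L, S
```

On a benign instance (small rank, sparse incoherent errors) the iterates converge quickly; as noted after Theorem 1, λ is usually tuned by cross validation in practice.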
The incoherence of L is measured by $\mu = \max\{\mu_0, \mu_1\}$, where $\mu_0$ and $\mu_1$ are defined via

$\|U^* e_i\| \le \sqrt{\frac{\mu_0 r}{n}}, \quad \|V^* e_j\| \le \sqrt{\frac{\mu_0 r}{n}}, \quad \text{for all } i, j = 1, \cdots, n,$  (3)

$\|U V^*\|_\infty \le \sqrt{\frac{\mu_1 r}{n^2}}.$  (4)

Previous studies suggest that the incoherence crucially determines conditions on the sparsity of S in order for PCP to succeed. For example, Theorem 2 in [3] explicitly shows that a matrix L with larger µ can tolerate only a smaller error density to guarantee correct matrix decomposition by PCP.

In all previous work on robust PCA, the incoherence is defined to be the maximum over all column and row spaces of L as in (3) and (4), which can be viewed as a global parameter for the entire matrix L; consequently, the characterization of error density is based on such global (and in fact worst-case) incoherence. In fact, each (i, j) entry of the low-rank matrix L can be associated with a local incoherence parameter $\mu_{ij}$, which is less than or equal to the global parameter µ, and then the allowable entry-wise error density can be potentially higher than that characterized based on the global incoherence. Thus, the total number of errors that the matrix can tolerate in robust PCA can be much higher than that characterized based on the global incoherence when errors are distributed accordingly. Motivated by such an observation, this paper aims to characterize conditions on error corruption of each entry of the low-rank matrix based on the corresponding local incoherence parameter, which guarantee success of PCP. Such conditions imply how robust each individual entry of L is to error corruption.

¹ In this paper, we focus on square matrices for simplicity. Our results can be extended to rectangular matrices in a standard way.
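Definitions (3)–(4) can be made concrete with a short numerical sketch; the helper below (names are ours, not from the paper) computes the smallest µ0 and µ1 satisfying the bounds for a given rank-r matrix L.

```python
import numpy as np

def incoherence(L, r):
    """Return (mu0, mu1): the smallest constants satisfying (3) and (4)."""
    n = L.shape[0]
    U, s, Vt = np.linalg.svd(L)
    U, V = U[:, :r], Vt[:r, :].T           # top-r singular subspaces
    # (3): the largest row norms of U and V determine mu0
    mu0 = (n / r) * max(np.sum(U**2, axis=1).max(), np.sum(V**2, axis=1).max())
    # (4): the largest entry of |U V^*| determines mu1
    mu1 = (n**2 / r) * np.max((U @ V.T)**2)
    return mu0, mu1
```

For a random low-rank matrix both parameters are small, while for structured matrices (e.g., the cluster matrices of Section 2.3) µ1 can be much larger.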
Naturally, the error corruption probability is allowed to be non-uniform over the matrix (i.e., locations of non-zero entries in S are sampled non-uniformly).

We note that the notion of local incoherence was first introduced in [4] for studying the matrix completion problem, in which local incoherence determines the local sampling density needed to guarantee correct matrix completion. Here, local incoherence plays a similar role, and determines the maximum allowable error density at each entry to guarantee correct matrix decomposition. The difference lies in that local incoherence here depends on both localized µ0 and µ1, rather than only on localized µ0 as in matrix completion, due to the further difficulty of robust PCA that the locations of error-corrupted entries are unknown, as pointed out in [1, 3].

Our Contribution. In this paper, we investigate a more general robust PCA problem, in which entries of the low-rank matrix are corrupted by non-uniformly distributed Bernoulli errors. We characterize the conditions that guarantee correct matrix decomposition by PCP. Our result identifies the local incoherence (defined by localized µ0 and µ1 for each entry of the low-rank matrix) to determine the condition that each local Bernoulli error corruption parameter should satisfy.
Our results provide the following useful understanding of the robust PCA problem:

• Our characterization provides a localized (and hence more refined) view of robust PCA, and determines how robust each entry of the low-rank matrix is to error corruption.
• Our results suggest that the total number of errors that the low-rank matrix can tolerate depends on how errors are distributed over the matrix.
• Via cluster problems, our results provide evidence that µ1 is necessary in characterizing conditions for robust PCA.

In order to deal with non-uniform error corruption, our technical proof introduces a new weighted norm, denoted $l_{w(\infty)}$, which involves the information of both localized µ0 and µ1 and is hence different from the weighted norms introduced in [4] for matrix completion. Thus, our proof necessarily involves new technical developments associated with such a new norm.

Related Work. A closely related but different problem from robust PCA is matrix completion, in which a low-rank matrix is partially observed and is to be completed. Such a problem has been previously studied in [5–8], and it was shown that a rank-r n-by-n matrix can be provably recovered by convex optimization with as few as $\Theta(\max\{\mu_0, \mu_1\}\, nr \log^2 n)$ observed entries.² Later on, it was shown in [4] that µ1 does not affect the sample complexity of matrix completion, and hence $\Theta(\mu_0 nr \log^2 n)$ observed entries are sufficient for guaranteeing correct matrix completion. It was further shown in [9] that a coherent low-rank matrix (i.e., with large µ0) can be recovered with $\Theta(nr \log^2 n)$ observations as long as the sampling probability is proportional to the leverage score (i.e., localized µ0).

² $f(n) \in \Theta(g(n))$ means $k_1 \cdot g(n) \le f(n) \le k_2 \cdot g(n)$ for some positive constants $k_1, k_2$.
Our problem can be viewed as its counterpart in robust PCA, where the difference lies in that the local incoherence in our problem depends on both localized µ0 and µ1.

Robust PCA aims to decompose an observed matrix into the sum of a low-rank matrix and a sparse matrix. In [2, 10], robust PCA with a fixed error matrix was studied, and it was shown that the maximum number of errors in any row or column should be bounded from above in order to guarantee correct decomposition by PCP. Robust PCA with a random error matrix was investigated in a number of studies. It has been shown in [1] that such decomposition can be exact with high probability if the percentage of corrupted entries is small enough, under the assumptions that the low-rank matrix is incoherent and the support set of the sparse matrix is uniformly distributed. It was further shown in [11] that if the signs of nonzero entries in the sparse matrix are randomly chosen, then an adjusted convex optimization can produce exact decomposition even when the percentage of corrupted entries goes to one (i.e., the error is dense). The problem was further studied in [1, 3, 12] for the case where the error-corrupted low-rank matrix is only partially observed. Our work provides a more refined (i.e., entry-wise) view of robust PCA with a random error matrix, aiming at understanding how local incoherence affects the susceptibility of each matrix entry to error corruption.

2 Model and Main Result

2.1 Problem Statement

We consider the robust PCA problem introduced in Section 1. Namely, suppose an n-by-n matrix M can be decomposed into two parts: M = L + S, where L is a low-rank matrix and S is a sparse (error) matrix. We assume that the rank of L is r, and that the support of S is selected randomly but non-uniformly. More specifically, let Ω denote the support of S; then $\Omega \subseteq [n] \times [n]$, where [n] denotes the set {1, 2, . . . , n}.
The event $\{(i, j) \in \Omega\}$ is independent across different pairs (i, j), and

$\mathbb{P}((i, j) \in \Omega) = \rho_{ij},$  (5)

where $\rho_{ij}$ represents the probability that the (i, j)-entry of L is corrupted by error. Hence, Ω is determined by Bernoulli sampling with non-uniform probabilities.

We study both the random sign and fixed sign models for S. For the fixed sign model, we assume the signs of nonzero entries in S are arbitrary and fixed, whereas for the random sign model, we assume that the signs of nonzero entries in S are independent random variables taking values +1 or −1 with probability 1/2 each, as follows:

$[\mathrm{sgn}(S)]_{ij} = \begin{cases} 1 & \text{with prob. } \rho_{ij}/2 \\ 0 & \text{with prob. } 1 - \rho_{ij} \\ -1 & \text{with prob. } \rho_{ij}/2. \end{cases}$  (6)

In this paper, our goal is to characterize conditions on ρij that guarantee correct recovery of L and S from the observation of M.

We provide some notation used throughout this paper. A matrix X is associated with five norms: $\|X\|_F$ denotes the Frobenius norm, $\|X\|_*$ denotes the nuclear norm (i.e., the sum of singular values), $\|X\|$ denotes the spectral norm (i.e., the largest singular value), and $\|X\|_1$ and $\|X\|_\infty$ represent respectively the $\ell_1$ and $\ell_\infty$ norms of the long vector obtained by stacking X. The inner product between two matrices is defined as $\langle X, Y \rangle := \mathrm{trace}(X^* Y)$. For a linear operator $\mathcal{A}$ that acts on the space of matrices, $\|\mathcal{A}\|$ denotes the operator norm given by $\|\mathcal{A}\| = \sup_{\{\|X\|_F = 1\}} \|\mathcal{A}X\|_F$.

2.2 Main Theorems

We adopt PCP to solve the robust PCA problem. We define the following local incoherence parameters, which play an important role in our characterization of conditions on the entry-wise ρij:

$\mu_{0ij} := \frac{n}{2r}\left(\|U^* e_i\|^2 + \|V^* e_j\|^2\right), \quad \mu_{1ij} := \frac{n^2 ([UV^*]_{ij})^2}{r},$  (7)

$\mu_{ij} := \max\{\mu_{0ij}, \mu_{1ij}\}.$  (8)

It is clear that $\mu_{0ij} \le \mu_0$ and $\mu_{1ij} \le \mu_1$ for all $i, j = 1, \cdots, n$. We note that although $\max_{i,j} \mu_{ij} \ge 1$, some $\mu_{ij}$ might take values as small as zero.

We first consider the robust PCA problem under the random sign model as introduced in Section 2.1. The following theorem characterizes the condition that guarantees correct recovery by PCP.

Theorem 1. Consider the robust PCA problem under the random sign model. If

$1 - \rho_{ij} \ge \max\left\{ C_0 \sqrt{\frac{\mu_{ij} r}{n}} \log n, \; \frac{1}{n^3} \right\}$

for some sufficiently large constant C0 and for all $i, j \in [n]$, then PCP yields correct matrix recovery with $\lambda = \frac{1}{32\sqrt{n \log n}}$, with probability at least $1 - cn^{-10}$ for some constant c.

We note that the term $1/n^3$ is introduced to justify the dual certificate conditions in the proof (see Appendix A.2). We further note that satisfying the condition in Theorem 1 implies $C_0 \sqrt{\mu r / n} \log n \le 1$, which is an essential bound required in our proof and coincides with the conditions in previous studies [1, 12]. Although we set $\lambda = \frac{1}{32\sqrt{n \log n}}$ for the sake of the proof, in practice λ is often determined via cross validation.

The above theorem suggests that the local incoherence parameter µij is closely related to how robust each entry of L is to error corruption in matrix recovery. An entry corresponding to a smaller µij tolerates a larger error density ρij. This is consistent with the result in [4] for matrix completion, in which a smaller local incoherence parameter requires a lower local sampling rate. The difference lies in that here both µ0ij and µ1ij play roles in µij, whereas only µ0ij matters in matrix completion. The necessity of µ1ij for robust PCA is further demonstrated in Section 2.3 via an example.

Theorem 1 also provides a more refined view of robust PCA in the dense error regime, in which the error corruption probability approaches one. Such an interesting regime was previously studied in [3, 11].
In [11], it is argued that PCP with adaptive λ yields exact recovery even when the error corruption probability approaches one, provided the errors take random signs and the dimension n is sufficiently large. In [3], it is further shown that PCP with a fixed λ also yields exact recovery, and the scaling behavior of the error corruption probability is characterized. Theorem 1 above further provides the scaling behavior of the local entry-wise error corruption probability ρij as it approaches one, and captures how such scaling behavior depends on the local incoherence parameters µij. Such a result implies that the robustness of PCP depends not only on the error density but also on how the errors are distributed over the matrix with regard to µij.

We next consider the robust PCA problem under the fixed sign model as introduced in Section 2.1. In this case, non-zero entries of the error matrix S can take arbitrary and fixed values, and only the locations of non-zero entries are random.

Theorem 2. Consider the robust PCA problem under the fixed sign model. If

$1 - 2\rho_{ij} \ge \max\left\{ C_0 \sqrt{\frac{\mu_{ij} r}{n}} \log n, \; \frac{1}{n^3} \right\}$

for some sufficiently large constant C0 and for all $i, j \in [n]$, then PCP yields correct recovery with $\lambda = \frac{1}{32\sqrt{n \log n}}$, with probability at least $1 - cn^{-10}$ for some constant c.

Theorem 2 follows from Theorem 1 by adapting the elimination and derandomization arguments [1, Section 2.2] as follows. Let ρ be the matrix with each (i, j)-entry equal to ρij. If PCP yields exact recovery with a certain probability for the random sign model with parameter 2ρ, then it also yields exact recovery with at least the same probability for the fixed sign model with the locations of non-zero entries sampled using the Bernoulli model with parameter ρ.

We now compare Theorem 2 for robust PCA with non-uniform error corruption to Theorem 1.1 in [1] for robust PCA with uniform error corruption.
It is clear that if we set $\rho_{ij} = \rho$ for all $i, j \in [n]$, then the two models are the same. It can then be easily checked that the conditions $\sqrt{\mu r / n} \log n \le \rho_r$ and $\rho \le \rho_s$ in Theorem 1.1 of [1] imply the conditions in Theorem 2. Thus, Theorem 2 provides a more relaxed condition than Theorem 1.1 in [1]. Such benefit of condition relaxation should be attributed to the new golfing scheme introduced in [3, 12], and this paper provides a more refined view of robust PCA by further taking advantage of such a new golfing scheme to analyze local conditions.

More importantly, Theorem 2 characterizes the relationship between local incoherence parameters and local error corruption probabilities, which implies that different areas of the low-rank matrix have different levels of ability to resist errors: a more incoherent area (i.e., with smaller µij) can tolerate more errors. Thus, Theorem 2 illustrates the following interesting fact. Whether PCP yields correct recovery depends not only on the total number of errors but also on how the errors are distributed. If more errors are distributed to more incoherent areas (i.e., with smaller µij), then more errors in total can be tolerated. However, if errors are distributed in the opposite manner, then only a smaller number of errors can be tolerated.

2.3 Implication on Cluster Matrix

In this subsection, we further illustrate our result when the low-rank matrix is a cluster matrix. Although robust PCA and even more sophisticated approaches have been applied to solve clustering problems, e.g., [13–15], our perspective here is to demonstrate how local incoherence affects entry-wise robustness to error corruption, which has not been illustrated in previous studies.

Suppose there are n elements to be clustered.
We use a cluster matrix L to represent the clustering relationship of these n elements, with Lij = 1 if elements i and j are in the same cluster and Lij = 0 otherwise. Thus, with appropriate ordering of the elements, L is a block diagonal matrix whose diagonal blocks contain all '1's and whose off-diagonal blocks contain all '0's. Hence, the rank r of L equals the number of clusters, which is typically small compared to n. Suppose these entries are corrupted by errors that flip entries from one to zero or from zero to one. This can be thought of as adding a (possibly sparse) error matrix S to L, so that the observed matrix is L + S. Then PCP can be applied to recover the cluster matrix L.

We first consider an example with clusters of equal size n/r. We set n = 600 and r = 4 (i.e., four equal-size clusters). We apply errors to diagonal-block entries and off-diagonal-block entries respectively with probabilities ρd and ρod. In Fig. 1a, we plot the recovery accuracy of PCP for each pair (ρod, ρd). It is clear from the figure that failure occurs for larger ρod than ρd, which implies that off-diagonal blocks are more robust to errors than diagonal blocks. This can be explained by Theorem 2 as follows.
For a cluster matrix with equal cluster size n/r, the local incoherence parameters are given by

$\mu_{0ij} = 1$ for all (i, j), and $\mu_{1ij} = \begin{cases} r, & (i, j) \text{ in a diagonal block} \\ 0, & (i, j) \text{ in an off-diagonal block,} \end{cases}$

and thus

$\mu_{ij} = \max\{\mu_{0ij}, \mu_{1ij}\} = \begin{cases} r, & (i, j) \text{ in a diagonal block} \\ 1, & (i, j) \text{ in an off-diagonal block.} \end{cases}$

[Figure 1: Error vulnerability of different parts of a cluster matrix. (a) Diagonal-block error ρd vs. off-diagonal-block error ρod; n = 600, r = 4 with equal cluster sizes. (b) Error vulnerability with respect to cluster sizes 500 vs. 100 (cluster-1 error ρ1 vs. cluster-2 error ρ2). In both cases, for each probability pair, we generate 10 trials of independent random error matrices and count the number of successes of PCP. We declare a trial to be successful if the recovered L̂ satisfies $\|\hat L - L\|_F / \|L\|_F \le 10^{-3}$. Color from white to black represents the number of successful trials changing from 10 to 0.]

Based on Theorem 2, it is clear that diagonal-block entries are more locally coherent and hence more vulnerable to errors, whereas off-diagonal-block entries are more locally incoherent and hence more robust to errors.

Moreover, this example also demonstrates the necessity of µ1 in the robust PCA problem. [4] showed that µ1 is not necessary for matrix completion and argued informally that µ1 is necessary for robust PCA by connecting the robust PCA problem to the hardness of finding a small clique in a large random graph. Here, the above example provides evidence of this fact.
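The closed-form values above can be checked numerically against the entry-wise definitions (7)–(8); the sketch below (our own illustration) builds a small equal-size cluster matrix and verifies µij on diagonal and off-diagonal blocks, as well as the unequal-size formula $\mu_{ij} = n^2/(rK^2)$ used later.

```python
import numpy as np

def cluster_matrix(sizes):
    """Block-diagonal 0/1 cluster matrix for the given cluster sizes."""
    n = sum(sizes)
    L = np.zeros((n, n))
    start = 0
    for k in sizes:
        L[start:start + k, start:start + k] = 1.0
        start += k
    return L

def mu_ij(L, r):
    """Entry-wise mu_ij = max{mu0_ij, mu1_ij} from (7)-(8)."""
    n = L.shape[0]
    U, s, Vt = np.linalg.svd(L)
    U, V = U[:, :r], Vt[:r, :].T
    row_u = np.sum(U**2, axis=1)
    row_v = np.sum(V**2, axis=1)
    mu0 = (n / (2 * r)) * (row_u[:, None] + row_v[None, :])
    mu1 = (n**2 / r) * (U @ V.T)**2
    return np.maximum(mu0, mu1)

# Equal cluster sizes: mu_ij = r on diagonal blocks, 1 elsewhere.
r, size = 4, 10
mu_equal = mu_ij(cluster_matrix([size] * r), r)
```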
In the example, the µ0ij are the same over the entire matrix, and hence it is µ1ij that differentiates the incoherence between diagonal blocks and off-diagonal blocks, and thus differentiates their robustness to errors.

We then consider the case with two clusters of different sizes: cluster 1 has size 500 and cluster 2 has size 100. Hence, r = 2. We apply errors to the block-diagonal entries corresponding to clusters 1 and 2 respectively with probabilities ρ1 and ρ2. In Fig. 1b, we plot the recovery accuracy of PCP for each pair (ρ1, ρ2). It is clear from the figure that failure occurs for larger ρ1 than ρ2, which implies that entries corresponding to the larger cluster are more robust to errors than entries corresponding to the smaller cluster. This can be explained by Theorem 2, because the local incoherence of a block-diagonal entry is given by $\mu_{ij} = \frac{n^2}{rK^2}$, where K is the corresponding cluster size, and hence the error corruption probability should satisfy $1 - 2\rho_{ij} > C_0 \frac{\sqrt{n}}{K} \log n$ for correct recovery. Thus, a larger cluster can resist denser errors. This also coincides with the results on graph clustering in [13, 16].

2.4 Outline of the Proof of Theorem 1

The proof of Theorem 1 follows the idea established in [1] and further developed in [3, 12]. Our main technical development lies in the analysis of non-uniform error corruption based on local incoherence parameters, for which we introduce a new weighted norm $l_{w(\infty)}$ and establish concentration properties and bounds associated with this norm. As a generalization of the matrix infinity norm, $l_{w(\infty)}$ incorporates both µ0ij and µ1ij, and hence differs from the weighted norms $l_{\mu(\infty)}$ and $l_{\mu(\infty,2)}$ in [9] in its role in the analysis of the robust PCA problem. We next outline the proof here; the detailed proofs are provided in Appendix A.

We first introduce some notation.
We define the subspace $T := \{UX^* + YV^* : X, Y \in \mathbb{R}^{n \times r}\}$, where U, V are the left and right singular matrices of L. Then T induces a projection operator $P_T$ given by $P_T(M) = UU^*M + MVV^* - UU^*MVV^*$. Moreover, $T^\perp$, the complement subspace to T, induces an orthogonal projection operator $P_{T^\perp}$ with $P_{T^\perp}(M) = (I - UU^*)M(I - VV^*)$. We further define two operators associated with Bernoulli sampling. Let $\Omega_0$ denote a generic subset of $[n] \times [n]$. We define a corresponding projection operator $P_{\Omega_0}$ as

$P_{\Omega_0}(M) = \sum_{ij} I_{\{(i,j)\in\Omega_0\}} \langle M, e_i e_j^* \rangle e_i e_j^*,$

where $I_{\{\cdot\}}$ is the indicator function. If $\Omega_0$ is a random set generated by Bernoulli sampling with $\mathbb{P}((i,j)\in\Omega_0) = t_{ij}$, where $0 < t_{ij} \le 1$ for all $i, j \in [n]$, we further define a linear operator $R_{\Omega_0}$ as

$R_{\Omega_0}(M) = \sum_{ij} \frac{1}{t_{ij}} I_{\{(i,j)\in\Omega_0\}} \langle M, e_i e_j^* \rangle e_i e_j^*.$

We further note that throughout this paper, "with high probability" means "with probability at least $1 - cn^{-10}$", where the constant c may differ in various contexts.

Our proof includes two main steps: establishing that the existence of a certain dual certificate is sufficient to guarantee correct recovery, and constructing such a dual certificate. For the first step, we establish the following proposition.

Proposition 1. If $1 - \rho_{ij} \ge \max\left\{ C_0 \sqrt{\frac{\mu_{ij} r}{n}} \log n, \frac{1}{n^3} \right\}$, then PCP yields a unique solution which agrees with the correct (L, S) with high probability if there exists a dual certificate Y obeying

$P_\Omega Y = 0,$  (9)
$\|Y\|_\infty \le \frac{\lambda}{4},$  (10)
$\|P_{T^\perp}(\lambda\,\mathrm{sgn}(S) + Y)\| \le \frac{1}{4},$  (11)
$\|P_T(Y + \lambda\,\mathrm{sgn}(S) - UV^*)\|_F \le \frac{\lambda}{n^2},$  (12)

where $\lambda = \frac{1}{32\sqrt{n \log n}}$.

The proof of the above proposition adapts the idea in [1, 12] for uniform errors to non-uniform errors. In particular, the proof exploits the properties of $R_\Omega$ associated with non-uniform errors, which are presented as Lemma 1 (established in [9]) and Lemma 2 in Appendix A.1.

Proposition 1 suggests that it suffices, to prove Theorem 1, to find a dual certificate Y that satisfies the dual certificate conditions (9)–(12). Thus, the second step is to construct Y via the golfing scheme. Although we adapt the steps in [12] to construct the dual certificate Y, our analysis requires new technical development based on local incoherence parameters. Recall the following definitions in Section 2.1: $\mathbb{P}((i,j)\in\Omega) = \rho_{ij}$ and $\mathbb{P}((i,j)\in\Gamma) = p_{ij}$, where $\Gamma = \Omega^c$ and $p_{ij} = 1 - \rho_{ij}$. Consider the golfing scheme with non-uniform batch sizes, as suggested in [12], to establish bounds with fewer log factors. Let $\Gamma = \Gamma_1 \cup \Gamma_2 \cup \cdots \cup \Gamma_l$, where $\{\Gamma_k\}$ are independent random sets given by

$\mathbb{P}((i,j)\in\Gamma_1) = \frac{p_{ij}}{6}, \qquad \mathbb{P}((i,j)\in\Gamma_k) = q_{ij} \quad \text{for } k = 2, \cdots, l.$

Thus, if $\rho_{ij} = (1 - \frac{p_{ij}}{6})(1 - q_{ij})^{l-1}$, the two sampling strategies are equivalent. Due to the overlap between the $\{\Gamma_k\}$, we have $q_{ij} \ge \frac{5 p_{ij}}{6(l-1)}$. We set $l = \lfloor 5 \log n + 1 \rfloor$ and construct a dual certificate Y in the following iterative way:

$Z_0 = P_T(UV^* - \lambda\,\mathrm{sgn}(S)),$  (13)
$Z_k = (P_T - P_T R_{\Gamma_k} P_T) Z_{k-1}, \quad \text{for } k = 1, \cdots, l,$  (14)
$Y = \sum_{k=1}^{l} R_{\Gamma_k} Z_{k-1}.$  (15)

It is then sufficient to show that such a constructed Y satisfies the dual certificate conditions (9)–(12). Condition (9) is due to the construction of Y. Condition (12) can be shown via a concentration property of each iteration step (14) in the $\|\cdot\|_F$ norm, characterized in Lemma 3 in Appendix A.1. In order to show that Y satisfies conditions (10) and (11), we introduce the following weighted norm. Let $\hat w_{ij} = \sqrt{\frac{\mu_{ij} r}{n^2}}$ and $w_{ij} = \max\{\hat w_{ij}, \epsilon\}$, where $\epsilon$ is the smallest nonzero $\hat w_{ij}$. Here $\epsilon$ is introduced to avoid singularity.
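The golfing construction (13)–(15) can be sketched directly in numpy. The sketch below is our own illustration (uniform probabilities and a small l for brevity, rather than the non-uniform batch sizes of the proof) and only checks the structural property behind condition (9): Y is supported on $\Gamma \subseteq \Omega^c$, so it vanishes on Ω.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 30, 2
lam = 1.0 / (32 * np.sqrt(n * np.log(n)))

# Low-rank singular factors and a random error sign pattern on Omega
U = np.linalg.qr(rng.standard_normal((n, r)))[0]
V = np.linalg.qr(rng.standard_normal((n, r)))[0]
omega = rng.random((n, n)) < 0.1                      # support of S
sgnS = np.where(omega, rng.choice([-1.0, 1.0], size=(n, n)), 0.0)

def P_T(M):
    """Projection onto T = {U X^* + Y V^*}."""
    return U @ U.T @ M + M @ V @ V.T - U @ U.T @ M @ V @ V.T

def R(mask, M, t):
    """Sampling operator R_Gamma_k with inclusion probability t."""
    return np.where(mask, M / t, 0.0)

# Golfing batches Gamma_k, each a Bernoulli subset of Omega^c
l, q = 6, 0.15
gammas = [(rng.random((n, n)) < q) & ~omega for _ in range(l)]

Z = P_T(U @ V.T - lam * sgnS)                         # (13)
Y = np.zeros((n, n))
for g in gammas:
    Rz = R(g, Z, q)
    Y += Rz                                           # (15), term k
    Z = P_T(Z - Rz)                                   # (14), since Z is in T
```

Each batch shrinks the residual Z toward zero in expectation (since $\mathbb{E}[R_{\Gamma_k}] = I$), which is what drives condition (12) in the actual proof.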
Then for any matrix Z, define

$\|Z\|_{w(\infty)} = \max_{i,j} \frac{|Z_{ij}|}{w_{ij}}.$  (16)

It is easy to verify that $\|\cdot\|_{w(\infty)}$ is a well-defined norm. We can then show that each iteration step (14) satisfies, in the $\|\cdot\|$ and $\|\cdot\|_{w(\infty)}$ norms, two concentration properties characterized respectively in Lemmas 4 and 5, which are essential to prove conditions (10) and (11).

3 Numerical Experiments

In this section, we provide numerical experiments to demonstrate our theoretical results. In these experiments, we adopt an augmented Lagrange multiplier algorithm [17] to solve PCP. We set $\lambda = 1/\sqrt{n \log n}$. A trial of PCP (for a given realization of error locations) is declared successful if the $\hat L$ recovered by PCP satisfies $\|\hat L - L\|_F / \|L\|_F \le 10^{-3}$.

We apply the following three models to construct the low-rank matrix L:

• Bernoulli model: $L = XX^*$, where X is an $n \times r$ matrix with entries independently taking values $+1/\sqrt{n}$ and $-1/\sqrt{n}$ equally likely.
• Gaussian model: $L = XX^*$, where X is an $n \times r$ matrix with entries independently sampled from the Gaussian distribution $\mathcal{N}(0, 1/n)$.
• Cluster model: L is a block diagonal matrix with r equal-size blocks containing all '1's.

In order to demonstrate that the local incoherence parameter affects local robustness to error corruption, we study the following two error corruption models:

• Uniform error corruption: $[\mathrm{sgn}(S)]_{ij}$ is generated as in (6) with $\rho_{ij} = \rho$ for all $i, j \in [n]$, and $S = \mathrm{sgn}(S)$.
• Adaptive error corruption: $[\mathrm{sgn}(S)]_{ij}$ is generated as in (6) with $\rho_{ij} = \rho\, \frac{n^2 \sqrt{1/\mu_{ij}}}{\sum_{ij} \sqrt{1/\mu_{ij}}}$ for all $i, j \in [n]$, and $S = \mathrm{sgn}(S)$.

It is clear that in both cases the error matrix has the same average error corruption percentage ρ, but in adaptive error corruption, the local error corruption probability is adapted to the local incoherence.

Our first experiment demonstrates that the robustness of PCP to error
corruption not only depends on the number of errors but also on how the errors are distributed over the matrix. For all three low-rank matrix models, we set n = 1200 and rank r = 10.

[Figure 2: Recovery failure frequency of PCP versus error corruption percentage ρ, under uniform and adaptive error corruption. (a) Bernoulli model, (b) Gaussian model, (c) Cluster model.]

For each low-rank matrix model, we apply the uniform and adaptive error matrices, and plot the failure frequency of PCP versus the error corruption percentage ρ in Fig. 2. For each value of ρ, we perform 50 trials of independent error corruption and count the number of failures of PCP. Each plot of Fig. 2 compares the robustness of PCP to uniform error corruption (the red square line) and adaptive error corruption (the blue circle line). We observe that PCP can tolerate more errors in the adaptive case. This is because the adaptive error matrix is distributed based on the local incoherence parameter, with higher error density in areas where the matrix can tolerate more errors. Furthermore, comparison among the three plots in Fig. 2 illustrates that the gap between uniform and adaptive error matrices is the smallest for the Bernoulli model and the largest for the cluster model.
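The adaptive error model above can be sketched as follows (our own illustration); the normalization makes the average of ρij over all n² entries equal to ρ, and a practical implementation should clip ρij to [0, 1].

```python
import numpy as np

def adaptive_rho(mu, rho):
    """Entry-wise corruption probabilities rho_ij proportional to sqrt(1/mu_ij),
    normalized so the average over all entries equals rho (then clipped to [0, 1])."""
    inv = 1.0 / np.sqrt(mu)                # assumes mu_ij > 0 everywhere
    rho_ij = rho * mu.size * inv / inv.sum()
    return np.clip(rho_ij, 0.0, 1.0)

def sample_sign_errors(rho_ij, rng):
    """Random sign model (6): +1 or -1 each w.p. rho_ij/2, else 0."""
    hit = rng.random(rho_ij.shape) < rho_ij
    signs = rng.choice([-1.0, 1.0], size=rho_ij.shape)
    return np.where(hit, signs, 0.0)
```

On a cluster matrix, this puts more errors on the off-diagonal blocks (µij = 1) than on the diagonal blocks (µij = r), matching where Theorem 2 allows a higher error density.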
Our theoretical results suggest that the gap is due to the variation of the local incoherence parameter across the matrix, which can be measured by the variance of µij: a larger variance of µij should yield a larger gap. Our numerical calculation of the variances for the three models yields Var(µBernoulli) = 1.2109, Var(µGaussian) = 2.1678, and Var(µcluster) = 7.29, which confirms our explanation.

[Figure 3: Largest allowable error corruption percentage ρ versus the rank of L such that PCP yields correct recovery, under uniform and adaptive error corruption. (a) Bernoulli model, (b) Gaussian model, (c) Cluster model.]

We next study the phase transition in rank and error corruption probability. For the three low-rank matrix models, we set n = 1200. In Fig. 3, we plot the error corruption percentage versus the rank of L for both uniform and adaptive error corruption models. Each point on a curve records the maximum allowable error corruption percentage under the corresponding rank such that PCP yields correct recovery. We count an (r, ρ) pair as successful if nine trials out of ten are successful. We first observe that in each plot of Fig. 3, PCP is more robust under adaptive error corruption, for the same reason explained above. We further observe that the gap between uniform and adaptive error corruption changes as the rank changes.
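For the cluster model, the quoted variance can be checked in closed form: a fraction 1/r of the entries have µij = r (diagonal blocks) and the rest have µij = 1, so $\mathrm{Var}(\mu) = r + 1 - \frac{1}{r} - \left(2 - \frac{1}{r}\right)^2$, which equals 7.29 for the r = 10 used in Fig. 2. A quick check (our own sketch):

```python
# Variance of mu_ij for the equal-size cluster model of Section 2.3:
# mu_ij = r on diagonal blocks (a 1/r fraction of entries), 1 elsewhere.
r = 10                                    # rank used in the experiments of Fig. 2
p_diag = 1.0 / r
mean = p_diag * r + (1 - p_diag) * 1.0    # E[mu]   = 2 - 1/r
second = p_diag * r**2 + (1 - p_diag)     # E[mu^2] = r + 1 - 1/r
var = second - mean**2
```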
In the low-rank regime, the gap is largely determined by the variance of the local incoherence parameter μij, as argued above. As the rank increases, the gap becomes dominated by the rank and less affected by the local incoherence. Eventually, for large enough rank, no error can be tolerated no matter how the errors are distributed.

4 Conclusion

We characterize refined conditions under which PCP succeeds in solving the robust PCA problem. Our result shows that the ability of PCP to correctly recover a low-rank matrix from errors is related not only to the total number of corrupted entries but also to the locations of the corrupted entries, and more essentially to the local incoherence of the low-rank matrix. This result is well supported by our numerical experiments. Moreover, our result has rich implications when the low-rank matrix is a cluster matrix, and it coincides with state-of-the-art studies on clustering problems via low-rank cluster matrices. Our result may motivate the development of weighted PCP to improve recovery performance, similar to the weighted algorithms developed for matrix completion in [9, 18].

References

[1] E. J. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of the ACM (JACM), 58(3):11, 2011.

[2] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky. Rank-sparsity incoherence for matrix decomposition. SIAM Journal on Optimization, 21(2):572–596, 2011.

[3] Y. Chen, A. Jalali, S. Sanghavi, and C. Caramanis. Low-rank matrix recovery from errors and erasures. IEEE Transactions on Information Theory, 59(7):4324–4337, 2013.

[4] Y. Chen. Incoherence-optimal matrix completion. IEEE Transactions on Information Theory, 61(5):2909–2923, May 2015.

[5] E. J. Candès and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, 2009.

[6] E. J. Candès and T. Tao.
The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5):2053–2080, 2010.

[7] D. Gross. Recovering low-rank matrices from few coefficients in any basis. IEEE Transactions on Information Theory, 57(3):1548–1566, 2011.

[8] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010.

[9] Y. Chen, S. Bhojanapalli, S. Sanghavi, and R. Ward. Completing any low-rank matrix, provably. arXiv preprint arXiv:1306.2979, 2013.

[10] D. Hsu, S. M. Kakade, and T. Zhang. Robust matrix decomposition with sparse corruptions. IEEE Transactions on Information Theory, 57(11):7221–7234, 2011.

[11] A. Ganesh, J. Wright, X. Li, E. J. Candès, and Y. Ma. Dense error correction for low-rank matrices via principal component pursuit. In IEEE International Symposium on Information Theory (ISIT), pages 1513–1517, Austin, TX, USA, June 2010.

[12] X. Li. Compressed sensing and matrix completion with constant proportion of corruptions. Constructive Approximation, 37(1):73–99, 2013.

[13] S. Oymak and B. Hassibi. Finding dense clusters via "low rank + sparse" decomposition. arXiv preprint arXiv:1104.5186, 2011.

[14] Y. Chen, S. Sanghavi, and H. Xu. Clustering sparse graphs. In Advances in Neural Information Processing Systems (NIPS), pages 2204–2212, Lake Tahoe, Nevada, USA, December 2012.

[15] Y. Chen, S. Sanghavi, and H. Xu. Improved graph clustering. IEEE Transactions on Information Theory, 60(10):6440–6455, October 2014.

[16] Y. Chen, A. Jalali, S. Sanghavi, and H. Xu. Clustering partially observed graphs via convex optimization. Journal of Machine Learning Research, 15(1):2213–2238, 2014.

[17] Z. Lin, M. Chen, and Y. Ma. The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices.
arXiv preprint arXiv:1009.5055, 2010.

[18] N. Srebro and R. R. Salakhutdinov. Collaborative filtering in a non-uniform world: Learning with the weighted trace norm. In Advances in Neural Information Processing Systems (NIPS), pages 2056–2064, Vancouver, Canada, December 2010.

[19] R. Vershynin. Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027, 2010.

[20] J. A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389–434, 2012.