{"title": "A New Theory for Matrix Completion", "book": "Advances in Neural Information Processing Systems", "page_first": 785, "page_last": 794, "abstract": "Prevalent matrix completion theories reply on an assumption that the locations of the missing data are distributed uniformly and randomly (i.e., uniform sampling). Nevertheless, the reason for observations being missing often depends on the unseen observations themselves, and thus the missing data in practice usually occurs in a nonuniform and deterministic fashion rather than randomly. To break through the limits of random sampling, this paper introduces a new hypothesis called \\emph{isomeric condition}, which is provably weaker than the assumption of uniform sampling and arguably holds even when the missing data is placed irregularly. Equipped with this new tool, we prove a series of theorems for missing data recovery and matrix completion. In particular, we prove that the exact solutions that identify the target matrix are included as critical points by the commonly used nonconvex programs. Unlike the existing theories for nonconvex matrix completion, which are built upon the same condition as convex programs, our theory shows that nonconvex programs have the potential to work with a much weaker condition. 
Compared to the existing studies on nonuniform sampling, our setup is more general.", "full_text": "A New Theory for Matrix Completion

Guangcan Liu∗   Qingshan Liu†   Xiao-Tong Yuan‡

B-DAT, School of Information & Control, Nanjing Univ Informat Sci & Technol
NO 219 Ningliu Road, Nanjing, Jiangsu, China, 210044
{gcliu,qsliu,xtyuan}@nuist.edu.cn

Abstract

Prevalent matrix completion theories rely on the assumption that the locations of the missing data are distributed uniformly and randomly (i.e., uniform sampling). Nevertheless, the reason for observations being missing often depends on the unseen observations themselves, and thus the missing data in practice usually occurs in a nonuniform and deterministic fashion rather than randomly. To break through the limits of random sampling, this paper introduces a new hypothesis called the isomeric condition, which is provably weaker than the assumption of uniform sampling and arguably holds even when the missing data is placed irregularly. Equipped with this new tool, we prove a series of theorems for missing data recovery and matrix completion. In particular, we prove that the exact solutions that identify the target matrix are included as critical points by the commonly used nonconvex programs. Unlike the existing theories for nonconvex matrix completion, which are built upon the same condition as convex programs, our theory shows that nonconvex programs have the potential to work under a much weaker condition. Compared to the existing studies on nonuniform sampling, our setup is more general.

1 Introduction

Missing data is a common occurrence in modern applications such as computer vision and image processing; it significantly reduces the representativeness of data samples and thereby seriously distorts inferences about the data. Given this pressing situation, it is crucial to study the problem of recovering the unseen data from a sampling of observations.
Since the data in reality is often organized in matrix form, it is of considerable practical significance to study the well-known problem of matrix completion [1], which is to fill in the missing entries of a partially observed matrix.

Problem 1.1 (Matrix Completion). Denote the (i, j)th entry of a matrix as [·]_{ij}. Let L0 ∈ R^{m×n} be an unknown matrix of interest. In particular, even the rank of L0 is unknown. Given a sampling of the entries in L0 and a 2D index set Ω ⊆ {1, 2, ..., m} × {1, 2, ..., n} consisting of the locations of the observed entries, i.e., given

{[L0]_{ij} | (i, j) ∈ Ω} and Ω,

can we restore the missing entries whose indices are not included in Ω, in an exact and scalable fashion? If so, under which conditions?

Due to its unique role in a broad range of applications, e.g., structure from motion and magnetic resonance imaging, matrix completion has received extensive attention in the literature, e.g., [2-13].

∗The work of Guangcan Liu is supported in part by national Natural Science Foundation of China (NSFC) under Grant 61622305 and Grant 61502238, and in part by Natural Science Foundation of Jiangsu Province of China (NSFJPC) under Grant BK20160040.
†The work of Qingshan Liu is supported by NSFC under Grant 61532009.
‡The work of Xiao-Tong Yuan is supported in part by NSFC under Grant 61402232 and Grant 61522308, and in part by NSFJPC under Grant BK20141003.

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

Figure 1: Left and Middle: Typical configurations for the locations of the observed entries. Right: A real example from the Oxford motion database. The black areas correspond to the missing entries.

In general, given no presumption about the nature of the matrix entries, it is virtually impossible to restore L0, as the missing entries can take arbitrary values.
That is, some assumptions are necessary for solving Problem 1.1. Given the high-dimensional and massive nature of data in today's data-driven community, it is arguable that the target matrix L0 we wish to recover is often low rank [23]. Hence, one may perform matrix completion by seeking a matrix with the lowest rank that also satisfies the constraints given by the observed entries:

min_L rank(L),  s.t.  [L]_{ij} = [L0]_{ij}, ∀(i, j) ∈ Ω.   (1)

Unfortunately, this idea is of little practical use, because the problem above is NP-hard and cannot be solved in polynomial time [15]. To achieve practical matrix completion, Candès and Recht [4] suggested minimizing instead the nuclear norm, which is a convex envelope of the rank function [12]. Namely,

min_L ‖L‖_*,  s.t.  [L]_{ij} = [L0]_{ij}, ∀(i, j) ∈ Ω,   (2)

where ‖·‖_* denotes the nuclear norm, i.e., the sum of the singular values of a matrix. Rather surprisingly, it is proved in [4] that, with high probability, the missing entries can be exactly restored by the convex program (2), as long as the target matrix L0 is low rank and incoherent and the set Ω of locations corresponding to the observed entries is sampled uniformly at random. This pioneering work provides several useful tools for investigating matrix completion and many other related problems. Those assumptions, including low-rankness, incoherence and uniform sampling, are now standard and widely used in the literature, e.g., [14, 17, 22, 24, 28, 33, 34, 36]. In particular, the analyses in [17, 33, 36] show that, in terms of theoretical completeness, many nonconvex optimization based methods are as powerful as the convex program (2).
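In practice, program (2) is usually attacked with proximal methods. Below is a minimal Soft-Impute-style numpy sketch; note that it solves the Lagrangian relaxation min (1/2)‖P_Ω(L − L0)‖_F^2 + τ‖L‖_* rather than the exact-constraint program (2), and all parameter values here are illustrative assumptions of ours, not choices made in this paper:

```python
import numpy as np

def svt(M, tau):
    """Singular value soft-thresholding: the prox operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft_impute(L_obs, mask, tau=0.5, iters=500):
    """Approximately solve min 0.5 ||P_Omega(L - L0)||_F^2 + tau ||L||_*.
    L_obs holds the observed entries (zeros elsewhere); mask is boolean."""
    L = np.zeros_like(L_obs)
    for _ in range(iters):
        # Keep the observed entries, fill the rest with the current
        # estimate, then shrink the singular values.
        L = svt(np.where(mask, L_obs, L), tau)
    return L
```

With a small τ this closely tracks the minimal nuclear norm solution of (2) on easy instances, which is all the sketch is meant to illustrate.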
Unfortunately, these theories still depend on the assumption of uniform sampling, and thus they cannot explain why many nonconvex methods often do better than the convex program (2) in practice.

The missing data in practice, however, often occurs in a nonuniform and deterministic fashion instead of randomly. This is because the reason for an observation being missing usually depends on the unseen observations themselves. For example, in structure from motion and magnetic resonance imaging, the locations of the observed entries are typically concentrated around the main diagonal of a matrix⁴, as shown in Figure 1. Moreover, as pointed out by [19, 21, 23], the incoherence condition is indeed not so consistent with the mixture structure of multiple subspaces, which is also a ubiquitous phenomenon in practice. There has been sparse research in the direction of nonuniform sampling, e.g., [18, 25-27, 31]. In particular, Negahban and Wainwright [26] studied the case of weighted entrywise sampling, which is more general than the setup of uniform sampling but is still a special form of random sampling. Király et al. [18] considered deterministic sampling, and their work is the most closely related to ours. However, they only established conditions for deciding whether a particular entry of the matrix can be restored. In other words, the setup of [18] may not handle well the dependence among the missing entries. In summary, although considerable improvements have been attained in recent years, matrix completion still lacks practical theories and methods.

To break through the limits of the setup of random sampling, in this paper we introduce a new hypothesis called the isomeric condition, which is a mixed concept that combines the rank and coherence of L0 with the locations and number of the observed entries.
⁴This statement means that the observed entries are concentrated around the main diagonal after a permutation of the sampling pattern Ω.

In general, isomerism (the noun form of isomeric) is a very mild hypothesis and only a little more strict than the well-known oracle assumption; that is, the number of observed entries in each row and column of L0 is not smaller than the rank of L0. It is arguable that the isomeric condition can hold even when the missing entries have irregular locations. In particular, it is provable that the widely used assumption of uniform sampling is sufficient, but not necessary, to ensure isomerism. Equipped with this new tool, isomerism, we prove a set of theorems pertaining to missing data recovery [35] and matrix completion. For example, we prove that, under the condition of isomerism, the exact solutions that identify the target matrix are included as critical points by the commonly used bilinear programs. This result helps to explain the widely observed phenomenon that many nonconvex methods perform better than the convex program (2) on real-world matrix completion tasks. In summary, the contributions of this paper mainly include:

• We introduce a new hypothesis called the isomeric condition, which provably holds given the standard assumptions of uniform sampling, low-rankness and incoherence. In addition, we also show by example that the isomeric condition can hold even if the target matrix L0 is not incoherent and the missing entries are placed irregularly. Compared to the existing studies on nonuniform sampling, our setup is more general.

• Equipped with the isomeric condition, we prove that the exact solutions that identify L0 are included as critical points by the commonly used bilinear programs.
Compared to the existing theories for nonconvex matrix completion, our theory is built upon a much weaker assumption and can therefore partially explain the superiority of nonconvex programs over the convex methods based on (2).

• We prove that the isomeric condition is sufficient and necessary for the column and row projectors of L0 to be invertible given the sampling pattern Ω. This result implies that the isomeric condition is necessary for ensuring that the minimal rank solution to (1) can identify the target L0.

The rest of this paper is organized as follows. Section 2 summarizes the mathematical notations used in the paper. Section 3 introduces the proposed isomeric condition, along with some theorems for matrix completion. Section 4 shows some empirical results, and Section 5 concludes this paper. The detailed proofs of all the proposed theorems are presented in the Supplementary Materials.

2 Notations

Capital and lowercase letters are used to represent matrices and vectors, respectively, except that the lowercase letters i, j, k, m, n, l, p, q, r, s and t are used to denote integers, e.g., the location of an observation, the rank of a matrix, etc. For a matrix M, [M]_{ij} is its (i, j)th entry, [M]_{i,:} is its ith row and [M]_{:,j} is its jth column. Let ω1 and ω2 be two 1D index sets; namely, ω1 = {i1, i2, ..., ik} and ω2 = {j1, j2, ..., js}. Then [M]_{ω1,:} denotes the submatrix of M obtained by selecting the rows with indices i1, i2, ..., ik, [M]_{:,ω2} is the submatrix constructed by choosing the columns j1, j2, ..., js, and similarly for [M]_{ω1,ω2}.
For a 2D index set Ω ⊆ {1, 2, ..., m} × {1, 2, ..., n}, we imagine it as a sparse matrix and, accordingly, define its "rows", "columns" and "transpose" as follows: the ith row Ω_i = {j1 | (i1, j1) ∈ Ω, i1 = i}, the jth column Ω^j = {i1 | (i1, j1) ∈ Ω, j1 = j} and the transpose Ω^T = {(j1, i1) | (i1, j1) ∈ Ω}.

The special symbol (·)^+ is reserved to denote the Moore-Penrose pseudo-inverse of a matrix. More precisely, for a matrix M with Singular Value Decomposition (SVD)⁵ M = U_M Σ_M V_M^T, its pseudo-inverse is given by M^+ = V_M Σ_M^{-1} U_M^T. For convenience, we adopt the conventions of using span{M} to denote the linear space spanned by the columns of a matrix M, using y ∈ span{M} to denote that a vector y belongs to the space span{M}, and using Y ∈ span{M} to denote that all the column vectors of a matrix Y belong to span{M}.

Capital letters U, V, Ω and their variants (complements, subscripts, etc.) are reserved for left singular vectors, right singular vectors and index sets, respectively. For convenience, we shall abuse the notation U (resp. V) to denote the linear space spanned by the columns of U (resp. V), i.e., the column space (resp. row space). The orthogonal projection onto the column space U is denoted by P_U and given by P_U(M) = U U^T M, and similarly for the row space, P_V(M) = M V V^T. The same notation is also used to represent a subspace of matrices (i.e., the image of an operator), e.g., we say that M ∈ P_U for any matrix M which satisfies P_U(M) = M.

⁵In this paper, SVD always refers to skinny SVD. For a rank-r matrix M ∈ R^{m×n}, its SVD is of the form M = U_M Σ_M V_M^T, where U_M ∈ R^{m×r}, Σ_M ∈ R^{r×r} and V_M ∈ R^{n×r}.

We shall also abuse the notation Ω to denote the linear space of matrices supported on Ω. Then the symbol P_Ω denotes the orthogonal projection onto Ω, namely,

[P_Ω(M)]_{ij} = [M]_{ij} if (i, j) ∈ Ω, and [P_Ω(M)]_{ij} = 0 otherwise.

Similarly, the symbol P_Ω^⊥ denotes the orthogonal projection onto the complement space of Ω. That is, P_Ω + P_Ω^⊥ = I, where I is the identity operator.

Three types of matrix norms are used in this paper, and they are all functions of the singular values: 1) the operator norm or 2-norm (i.e., the largest singular value), denoted by ‖M‖; 2) the Frobenius norm (i.e., the square root of the sum of squared singular values), denoted by ‖M‖_F; and 3) the nuclear norm or trace norm (i.e., the sum of singular values), denoted by ‖M‖_*. The only vector norm used is the ℓ2 norm, denoted by ‖·‖_2. The symbol |·| is reserved for the cardinality of an index set.

3 Isomeric Condition and Matrix Completion

This section introduces the proposed isomeric condition and a set of theorems for matrix completion. Most of the detailed proofs are deferred to the Supplementary Materials.

3.1 Isomeric Condition

In general, as aforementioned, matrix completion is an ill-posed problem. Thus, some assumptions are necessary for studying Problem 1.1. To eliminate the disadvantages of the setup of random sampling, we define and investigate a so-called isomeric condition.

3.1.1 Definitions

For ease of understanding, we shall begin with a concept called k-isomerism (or k-isomeric in adjective form), which can be regarded as an extension of low-rankness.

Definition 3.1 (k-isomeric). A matrix M ∈ R^{m×l} is called k-isomeric if and only if any k rows of M can linearly represent all the rows of M.
That is,

rank([M]_{ω,:}) = rank(M), ∀ω ⊆ {1, 2, ..., m}, |ω| = k,

where |·| is the cardinality of an index set.

In general, k-isomerism is somewhat similar to the Spark [37], which is defined by the smallest linearly dependent subset of the rows of a matrix. For a matrix M to be k-isomeric, it is necessary, but not sufficient, that rank(M) ≤ k. In fact, k-isomerism is also related to the concept of coherence [4, 21]. When the coherence of a matrix M ∈ R^{m×l} is not too high, the rows of M are sufficiently spread, and thus M can be k-isomeric with a small k, e.g., k = rank(M). Whenever the coherence of M is very high, one may need a large k to satisfy the k-isomeric property. For example, consider an extreme case where M is a rank-1 matrix with one row being 1 and everywhere else being 0. In this case, we need k = m to ensure that M is k-isomeric.

While Definition 3.1 involves all 1D index sets of cardinality k, we often need the isomeric property to be associated with a certain 2D index set Ω. To this end, we define below a concept called Ω-isomerism (or Ω-isomeric in adjective form).

Definition 3.2 (Ω-isomeric). Let M ∈ R^{m×l} and Ω ⊆ {1, 2, ..., m} × {1, 2, ..., n}. Suppose that Ω^j ≠ ∅ (the empty set), ∀1 ≤ j ≤ n. Then the matrix M is called Ω-isomeric if and only if

rank([M]_{Ω^j,:}) = rank(M), ∀j = 1, 2, ..., n.

Note here that only the number of rows of M is required to coincide with the row indices included in Ω, and thereby l ≠ n is allowed.

Generally, Ω-isomerism is less strict than k-isomerism. Provided that |Ω^j| ≥ k, ∀1 ≤ j ≤ n, M being k-isomeric ensures that M is Ω-isomeric as well, but not vice versa.
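Both definitions can be checked mechanically in finite time. A brute-force numpy sketch (the function names and the 0-based index convention are ours; note the k-isomeric check enumerates all C(m, k) row subsets and is only meant for small matrices):

```python
import numpy as np
from itertools import combinations

def is_k_isomeric(M, k):
    """Definition 3.1: any k rows of M must span the whole row space of M."""
    r = np.linalg.matrix_rank(M)
    return all(np.linalg.matrix_rank(M[list(w), :]) == r
               for w in combinations(range(M.shape[0]), k))

def is_omega_isomeric(M, omega, n):
    """Definition 3.2 (0-based): omega is a set of (i, j) pairs over an
    m x n grid; for every column j, the rows of M observed in that column
    must span the row space of M."""
    r = np.linalg.matrix_rank(M)
    for j in range(n):
        rows = [i for (i, jj) in omega if jj == j]
        if len(rows) == 0 or np.linalg.matrix_rank(M[rows, :]) < r:
            return False
    return True
```

On the extreme rank-1 example above (one row of ones), `is_k_isomeric` returns True only for k = m, matching the text.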
For the extreme example where M is nonzero at only one row, interestingly, M can be Ω-isomeric as long as the locations of the nonzero elements are included in Ω.

With the notation Ω^T = {(j1, i1) | (i1, j1) ∈ Ω}, the isomeric property can also be defined on the column vectors of a matrix, as in the following definition.

Definition 3.3 (Ω/Ω^T-isomeric). Let M ∈ R^{m×n} and Ω ⊆ {1, 2, ..., m} × {1, 2, ..., n}. Suppose that Ω_i ≠ ∅ and Ω^j ≠ ∅, ∀i = 1, ..., m, j = 1, ..., n. Then the matrix M is called Ω/Ω^T-isomeric if and only if M is Ω-isomeric and M^T is Ω^T-isomeric as well.

To solve Problem 1.1 without the imperfect assumption of missing at random, as will be shown later, we need to assume that L0 is Ω/Ω^T-isomeric. This condition excludes the unidentifiable cases in which some rows or columns of L0 are wholly missing. In fact, whenever L0 is Ω/Ω^T-isomeric, the number of observed entries in each row and column of L0 has to be greater than or equal to the rank of L0; this is consistent with the results in [20]. Moreover, Ω/Ω^T-isomerism actually handles well the cases where L0 is of high coherence. For example, consider an extreme case where L0 is 1 at only one entry and 0 everywhere else. In this case, L0 cannot be Ω/Ω^T-isomeric unless the nonzero entry is observed. So, generally, it is possible to restore the missing entries of a highly coherent matrix, as long as the Ω/Ω^T-isomeric condition is obeyed.

3.1.2 Basic Properties

While its definitions are associated with a certain matrix, the isomeric condition actually characterizes properties of a space, as shown in the lemma below.

Lemma 3.1.
Let L0 ∈ R^{m×n} and Ω ⊆ {1, 2, ..., m} × {1, 2, ..., n}. Denote the SVD of L0 as U0 Σ0 V0^T. Then we have:

1. L0 is Ω-isomeric if and only if U0 is Ω-isomeric.
2. L0^T is Ω^T-isomeric if and only if V0 is Ω^T-isomeric.

Proof. It can be verified that

[L0]_{Ω^j,:} = ([U0]_{Ω^j,:}) Σ0 V0^T, ∀j = 1, ..., n.

Since Σ0 V0^T is of full row rank, we have

rank([L0]_{Ω^j,:}) = rank([U0]_{Ω^j,:}), ∀j = 1, ..., n.

As a result, L0 being Ω-isomeric is equivalent to U0 being Ω-isomeric. In a similar way, the second claim is proved as well.

It is easy to see that the above lemma is still valid even when the condition of Ω-isomerism is replaced by k-isomerism. Thus, hereafter, we may say that a space is isomeric (k-isomeric, Ω-isomeric or Ω^T-isomeric) as long as its basis matrix is isomeric. In addition, the isomeric property is subspace successive, as shown in the next lemma.

Lemma 3.2. Let Ω ⊆ {1, 2, ..., m} × {1, 2, ..., n} and U0 ∈ R^{m×r} be the basis matrix of a Euclidean subspace embedded in R^m. Suppose that U is a subspace of U0, i.e., U = U0 U0^T U. If U0 is Ω-isomeric then U is Ω-isomeric as well.

Proof. By U = U0 U0^T U and U0 being Ω-isomeric,

rank([U]_{Ω^j,:}) = rank(([U0]_{Ω^j,:}) U0^T U) = rank(U0^T U) = rank(U0 U0^T U) = rank(U), ∀1 ≤ j ≤ n.

In one word, the above lemma states that a subspace of an isomeric space is isomeric.

3.1.3 Important Properties

As aforementioned, the isomeric condition is actually necessary for ensuring that the minimal rank solution to (1) can identify L0.
To see why, let us assume that U0 ∩ Ω^⊥ ≠ {0}, where we denote the SVD of L0 by U0 Σ0 V0^T. Then one can construct a nonzero perturbation Δ ∈ U0 ∩ Ω^⊥ and, accordingly, obtain a feasible solution L̃0 = L0 + Δ to the problem in (1). Since Δ ∈ U0, we have rank(L̃0) ≤ rank(L0). Even more, it is entirely possible that rank(L̃0) < rank(L0). Such a case is unidentifiable in nature, as the global optimum to problem (1) cannot identify L0. Thus, to ensure that the global minimum to (1) can identify L0, it is essentially necessary to show that U0 ∩ Ω^⊥ = {0} (resp. V0 ∩ Ω^⊥ = {0}), which is equivalent to the operator P_{U0} P_Ω P_{U0} (resp. P_{V0} P_Ω P_{V0}) being invertible (see Lemma 6.8 of the Supplementary Materials). Interestingly, the isomeric condition is indeed a sufficient and necessary condition for the operators P_{U0} P_Ω P_{U0} and P_{V0} P_Ω P_{V0} to be invertible, as shown in the following theorem.

Theorem 3.1. Let L0 ∈ R^{m×n} and Ω ⊆ {1, 2, ..., m} × {1, 2, ..., n}. Let the SVD of L0 be U0 Σ0 V0^T, and denote P_{U0}(·) = U0 U0^T (·) and P_{V0}(·) = (·) V0 V0^T. Then we have the following:

1. The linear operator P_{U0} P_Ω P_{U0} is invertible if and only if U0 is Ω-isomeric.
2. The linear operator P_{V0} P_Ω P_{V0} is invertible if and only if V0 is Ω^T-isomeric.

The necessity stated above implies that the isomeric condition is actually a very mild hypothesis. In general, there are numerous reasons for the target matrix L0 to be isomeric. In particular, the widely used assumptions of low-rankness, incoherence and uniform sampling are indeed sufficient, but not necessary, to ensure isomerism, as shown in the following theorem.

Theorem 3.2.
Let L0 ∈ R^{m×n} and Ω ⊆ {1, 2, ..., m} × {1, 2, ..., n}. Denote n1 = max(m, n) and n2 = min(m, n). Suppose that L0 is incoherent and Ω is a 2D index set sampled uniformly at random, namely Pr((i, j) ∈ Ω) = ρ0 and Pr((i, j) ∉ Ω) = 1 − ρ0. For any δ > 0, if ρ0 > δ is obeyed and rank(L0) < δ n2 / (c log n1) holds for some numerical constant c, then, with high probability at least 1 − n1^{−10}, L0 is Ω/Ω^T-isomeric.

It is worth noting that the isomeric condition can be obeyed in numerous circumstances other than the case of uniform sampling plus incoherence. For example, take

Ω = {(1, 1), (1, 2), (1, 3), (2, 1), (3, 1)} and L0 = [1, 0, 0; 0, 0, 0; 0, 0, 0],

where L0 is a 3×3 matrix with 1 at (1, 1) and 0 everywhere else. In this example, L0 is not incoherent and the sampling is not uniform either, but it can be verified that L0 is Ω/Ω^T-isomeric.

3.2 Results

In this subsection, we show how the isomeric condition takes effect in the context of nonuniform sampling, establishing some theorems pertaining to missing data recovery [35] as well as matrix completion.

3.2.1 Missing Data Recovery

Before exploring the matrix completion problem, for ease of understanding, we would like to consider a missing data recovery problem studied by Zhang [35], which can be described as follows: let y0 ∈ R^m be a data vector drawn from some low-dimensional subspace, denoted as y0 ∈ S0 ⊂ R^m. Suppose that y0 contains some available observations in yb ∈ R^k and some missing entries in yu ∈ R^{m−k}. Namely, after a permutation,

y0 = [yb; yu], yb ∈ R^k, yu ∈ R^{m−k}.   (3)

Given the observations in yb, we seek to restore the unseen entries in yu.
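Both claims about this example can be verified mechanically, and Theorem 3.1 can be illustrated on it at the same time. A self-contained numpy sketch (0-based indices; the reduction of the restricted operator P_{U0} P_Ω P_{U0} to the per-column Gram matrices G_j = [U0]_{Ω^j,:}^T [U0]_{Ω^j,:} is our paraphrase of the theorem, under the assumption that the operator is restricted to the column space of U0):

```python
import numpy as np

def omega_isomeric(M, omega):
    """Definition 3.2, 0-based: every column's observed rows span row(M)."""
    r = np.linalg.matrix_rank(M)
    return all(
        np.linalg.matrix_rank(M[[i for (i, jj) in omega if jj == j], :]) == r
        for j in range(M.shape[1])
    )

def restricted_operator_invertible(U0, omega, n):
    """Numerical reading of Theorem 3.1: on the column space of U0, the
    operator P_U0 P_Omega P_U0 acts column-wise through the Gram matrices
    G_j = [U0]_{Omega^j,:}^T [U0]_{Omega^j,:}; it is invertible there
    exactly when every G_j is."""
    r = U0.shape[1]
    for j in range(n):
        Uj = U0[[i for (i, jj) in omega if jj == j], :]
        if np.linalg.matrix_rank(Uj.T @ Uj) < r:
            return False
    return True

# The paper's example, shifted to 0-based indexing.
L0 = np.zeros((3, 3)); L0[0, 0] = 1.0
omega = {(0, 0), (0, 1), (0, 2), (1, 0), (2, 0)}
omega_T = {(j, i) for (i, j) in omega}

iso = omega_isomeric(L0, omega) and omega_isomeric(L0.T, omega_T)
U0 = np.linalg.svd(L0)[0][:, :np.linalg.matrix_rank(L0)]
inv = restricted_operator_invertible(U0, omega, 3)
```

Dropping the observation at (1, 1) (0-based (0, 0)) breaks isomerism, which matches the discussion of highly coherent matrices above.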
To do this, we consider the prevalent idea that represents a data vector as a linear combination of the bases in a given dictionary:

y0 = A x0,   (4)

where A ∈ R^{m×p} is a dictionary constructed in advance and x0 ∈ R^p is the representation of y0. Utilizing the same permutation as in (3), we can partition the rows of A into two parts, according to the indices of the observed and missing entries respectively:

A = [Ab; Au], Ab ∈ R^{k×p}, Au ∈ R^{(m−k)×p}.   (5)

In this way, the equation in (4) gives

yb = Ab x0 and yu = Au x0.

As we can now see, the unseen data yu can be restored, as long as the representation x0 is retrieved by accessing only the available observations in yb. In general, there are infinitely many representations satisfying y0 = A x0, e.g., x0 = A^+ y0, where (·)^+ is the pseudo-inverse of a matrix. Since A^+ y0 is the representation of minimal ℓ2 norm, we revisit the traditional ℓ2 program:

min_x (1/2) ‖x‖_2^2,  s.t.  yb = Ab x,   (6)

where ‖·‖_2 is the ℓ2 norm of a vector. Under some verifiable conditions, the above ℓ2 program is indeed consistently successful, in the following sense: for any y0 ∈ S0 with an arbitrary partition y0 = [yb; yu] (i.e., arbitrarily missing), the desired representation x0 = A^+ y0 is the unique minimizer to the problem in (6). That is, the unseen data yu is exactly recovered by first computing the minimizer x* to problem (6) and then calculating yu = Au x*.

Theorem 3.3. Let y0 = [yb; yu] ∈ R^m be an authentic sample drawn from some low-dimensional subspace S0 embedded in R^m, A ∈ R^{m×p} be a given dictionary, and k be the number of available observations in yb.
Then the convex program (6) is consistently successful, provided that S0 ⊆ span{A} and the dictionary A is k-isomeric.

Unlike the theory in [35], whose condition is unverifiable, our k-isomeric condition can be verified in finite time. Notice that the problem of missing data recovery is closely related to matrix completion, which actually restores the missing entries of multiple data vectors simultaneously. Hence, Theorem 3.3 can be naturally generalized to the case of matrix completion, as shown in the next subsection.

3.2.2 Matrix Completion

The spirit of the ℓ2 program (6) is easily transferred to the case of matrix completion. Following (6), one may consider Frobenius norm minimization for matrix completion:

min_X (1/2) ‖X‖_F^2,  s.t.  P_Ω(A X − L0) = 0,   (7)

where A ∈ R^{m×p} is a dictionary assumed to be given. As one can see, the problem in (7) is equivalent to (6) if L0 consists of only one column vector. The same as (6), the convex program (7) can also exactly recover the desired representation matrix A^+ L0, as shown in the theorem below. The difference is that we here require Ω-isomerism instead of k-isomerism.

Theorem 3.4. Let L0 ∈ R^{m×n} and Ω ⊆ {1, 2, ..., m} × {1, 2, ..., n}. Suppose that A ∈ R^{m×p} is a given dictionary. Provided that L0 ∈ span{A} and A is Ω-isomeric, the desired representation X0 = A^+ L0 is the unique minimizer to the problem in (7).

Theorem 3.4 tells us that, in general, even when the locations of the missing entries are interrelated and nonuniformly distributed, the target matrix L0 can be restored, as long as we have found a proper dictionary A. This motivates us to consider the commonly used bilinear program that seeks both A and X simultaneously:

min_{A,X} (1/2) ‖A‖_F^2 + (1/2) ‖X‖_F^2,  s.t.  P_Ω(A X − L0) = 0,   (8)

where A ∈ R^{m×p} and X ∈ R^{p×n}. The problem above is bilinear and therefore nonconvex. So, it would be hard to obtain a performance guarantee as strong as those for the convex programs, e.g., [4, 21]. Interestingly, under a very mild condition, the problem in (8) is proved to include the exact solutions that identify the target matrix L0 among its critical points.

Theorem 3.5. Let L0 ∈ R^{m×n} and Ω ⊆ {1, 2, ..., m} × {1, 2, ..., n}. Denote the rank and SVD of L0 as r0 and U0 Σ0 V0^T, respectively. If L0 is Ω/Ω^T-isomeric then the exact solution, denoted by (A0, X0) and given by

A0 = U0 Σ0^{1/2} Q^T, X0 = Q Σ0^{1/2} V0^T, ∀Q ∈ R^{p×r0}, Q^T Q = I,

is a critical point to the problem in (8).

Figure 2: Comparing the bilinear program (9) (p = m) with the convex method (2). The numbers plotted in the figures are the success rates within 20 random trials. The white and black points mean "succeed" and "fail", respectively. Here, success means that PSNR ≥ 40dB, where PSNR stands for peak signal-to-noise ratio.

To exhibit the power of program (8), however, the parameter p, which is the number of columns of the dictionary matrix A, must be close to the true rank of the target matrix L0. This is impractical in the cases where the rank of L0 is unknown. Notice that the Ω-isomeric condition imposed on A requires

rank(A) ≤ |Ω^j|, ∀j = 1, 2, ..., n.

This, together with the condition L0 ∈ span{A}, essentially requires us to solve a low rank matrix recovery problem [14].
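In practice, the bilinear program (8) is typically attacked by alternating minimization. The following sketch is ours, not the authors' algorithm: it replaces the exact constraint P_Ω(AX − L0) = 0 with a least-squares penalty, so that together with the two Frobenius regularizers each half-step becomes a ridge regression:

```python
import numpy as np

def als_complete(L_obs, mask, p, lam=1e-3, iters=100, seed=0):
    """Alternating ridge regressions for
    min ||P_Omega(A X - L0)||_F^2 + lam (||A||_F^2 + ||X||_F^2)."""
    m, n = L_obs.shape
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, p))
    X = np.zeros((p, n))
    I = np.eye(p)
    for _ in range(iters):
        for j in range(n):          # update the columns of X
            rows = mask[:, j]
            Aj = A[rows, :]
            X[:, j] = np.linalg.solve(Aj.T @ Aj + lam * I,
                                      Aj.T @ L_obs[rows, j])
        for i in range(m):          # update the rows of A
            cols = mask[i, :]
            Xi = X[:, cols]
            A[i, :] = np.linalg.solve(Xi @ Xi.T + lam * I,
                                      Xi @ L_obs[i, cols])
    return A, X
```

With p ≥ rank(L0) and a small λ, the observed entries are typically fit almost exactly; whether the unobserved entries are then recovered as well is precisely the question the isomeric condition speaks to.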
Hence, we suggest combining the formulation (7) with the popular idea of nuclear norm minimization, resulting in a bilinear program that jointly estimates both the dictionary matrix A and the representation matrix X:

min_{A,X} ‖A‖_* + (1/2) ‖X‖_F^2,  s.t.  P_Ω(A X − L0) = 0,   (9)

which, by coincidence, has been mentioned in a paper about optimization [32]. Similar to (8), the program in (9) has the following theorem to guarantee its performance.

Theorem 3.6. Let L0 ∈ R^{m×n} and Ω ⊆ {1, 2, ..., m} × {1, 2, ..., n}. Denote the rank and SVD of L0 as r0 and U0 Σ0 V0^T, respectively. If L0 is Ω/Ω^T-isomeric then the exact solution, denoted by (A0, X0) and given by

A0 = U0 Σ0^{2/3} Q^T, X0 = Q Σ0^{1/3} V0^T, ∀Q ∈ R^{p×r0}, Q^T Q = I,

is a critical point to the problem in (9).

Unlike (8), which possesses superior performance only if p is close to rank(L0) and the initial solution is chosen carefully, the bilinear program in (9) can work well by simply choosing p = m and using A = I as the initial solution. To see why, one essentially needs to figure out the conditions under which a specific optimization procedure can produce an optimal solution that meets an exact solution. This requires extensive justification, and we leave it as future work.

4 Simulations

To verify the superiority of the nonconvex matrix completion methods over the convex program (2), we experiment with randomly generated matrices. We generate a collection of m × n (m = n = 100) target matrices according to the model L0 = B C, where B ∈ R^{m×r0} and C ∈ R^{r0×n} are N(0,1) matrices.
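Under the stated model, a trial instance can be generated as follows (the band-width heuristic for the nonuniform pattern is our illustrative choice, not the paper's exact construction):

```python
import numpy as np

def make_instance(m=100, n=100, r0=5, frac=0.3, pattern="uniform", seed=0):
    """Generate L0 = B C with N(0,1) factors, plus a boolean mask of
    observed entries: Bernoulli-uniform, or a band around the main
    diagonal mimicking the 'nonuniform' setting of Figure 1."""
    rng = np.random.default_rng(seed)
    L0 = rng.standard_normal((m, r0)) @ rng.standard_normal((r0, n))
    if pattern == "uniform":
        mask = rng.random((m, n)) < frac
    else:
        # Keep entries within a band around the diagonal; the width is a
        # rough heuristic chosen to approximate the requested fraction.
        i, j = np.meshgrid(np.arange(m), np.arange(n), indexing="ij")
        width = max(1, int(frac * min(m, n) / 2))
        mask = np.abs(i - j) <= width
    return L0, mask
```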
The rank of L_0, i.e., r_0, is configured as r_0 = 1, 5, 10, ..., 90, 95. Regarding the index set Ω consisting of the locations of the observed entries, we consider two settings: one creates Ω by using a Bernoulli model to randomly sample a subset of {1, ..., m} × {1, ..., n} (referred to as "uniform"); the other, as in Figure 1, concentrates the locations of the observed entries around the main diagonal of the matrix (referred to as "nonuniform"). The observation fraction is set to |Ω|/(mn) = 0.01, 0.05, ..., 0.9, 0.95. For each pair (r_0, |Ω|/(mn)), we run 20 trials, resulting in 8000 simulations in total.

When p = m and the identity matrix is used to initialize the dictionary A, we have empirically found that program (8) has the same performance as (2). This is not strange, because it has been proven in [16] that
\[
\|L\|_* = \min_{A,X}\ \frac{1}{2}\left(\|A\|_F^2 + \|X\|_F^2\right), \quad \mathrm{s.t.}\ L = AX.
\]
Figure 2 compares the bilinear program (9) to the convex method (2). It can be seen that (9) works distinctly better than (2): while handling the nonuniformly missing data, the number of matrices successfully restored by the bilinear program (9) is 102% more than by the convex program (2). Even when the missing entries are chosen uniformly at random, in terms of the number of successfully restored matrices, the bilinear program (9) still outperforms the convex method (2) by 44%.
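The data-generation protocol of these simulations can be sketched as follows. The exact band width of the nonuniform mask in Figure 1 is not stated in this excerpt, so `width` is an assumed stand-in, and the PSNR peak is taken to be max|L_0|, a common but here assumed convention:

```python
import numpy as np

rng = np.random.default_rng(0)
m = n = 100

def make_target(r0):
    """L0 = B C with i.i.d. N(0,1) factors, so rank(L0) = r0 almost surely."""
    return rng.standard_normal((m, r0)) @ rng.standard_normal((r0, n))

def uniform_mask(frac):
    """Bernoulli model: each entry observed independently with probability frac."""
    return rng.random((m, n)) < frac

def diagonal_mask(width):
    """Nonuniform setting: observations concentrated around the main diagonal
    (the band width used in the paper's Figure 1 is assumed here)."""
    i, j = np.indices((m, n))
    return np.abs(i - j) <= width

def psnr(L0, L_hat):
    """Peak signal-to-noise ratio; success is declared when PSNR >= 40 dB."""
    mse = np.mean((L0 - L_hat) ** 2)
    return 10.0 * np.log10(np.max(np.abs(L0)) ** 2 / mse)
```

Sweeping `r0` over {1, 5, ..., 95} and the observation fraction over {0.01, 0.05, ..., 0.95}, with 20 trials per cell, reproduces the 8000-simulation grid described above.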
These results illustrate that, even when the rank of L_0 is unknown, the bilinear program (9) can do much better than the convex optimization based method (2).

5 Conclusion and Future Work

This work studied the problem of matrix completion with nonuniform sampling, a significant setting that had not been extensively studied before. To figure out the conditions under which exact recovery is possible, we proposed the so-called isomeric condition, which provably holds when the standard assumptions of low-rankness, incoherence and uniform sampling are in force. In addition, we exemplified that the isomeric condition can also be obeyed in cases beyond the setting of uniform sampling. Moreover, our theory implies that the isomeric condition is indeed necessary for ensuring that the minimal rank completion can identify the target matrix L_0. Equipped with the isomeric condition, we finally proved that the widely used bilinear programs include the exact solutions that recover the target matrix L_0 among their critical points; this guarantees the recovery performance of bilinear programs to some extent.

However, several problems remain for future work. In particular, it is unknown under which conditions a specific optimization procedure for (9) can produce an optimal solution that exactly restores the target matrix L_0. To address this, one needs to analyze the convergence properties as well as the recovery performance. Moreover, it is also unknown whether the isomeric condition suffices to ensure that the minimal rank completion can identify the target L_0. These require extensive justification, and we leave them as future work.

Acknowledgment

We would like to thank the anonymous reviewers and meta-reviewers for their many valuable comments, which helped us refine this paper.

References

[1] Emmanuel Candès and Terence Tao.
The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5):2053–2080, 2010.

[2] Emmanuel Candès and Yaniv Plan. Matrix completion with noise. Proceedings of the IEEE, 98(6):925–936, 2010.

[3] William E. Bishop and Byron M. Yu. Deterministic symmetric positive semidefinite matrix completion. In Neural Information Processing Systems, pages 2762–2770, 2014.

[4] Emmanuel Candès and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, 2009.

[5] Eyal Heiman, Gideon Schechtman, and Adi Shraibman. Deterministic algorithms for matrix completion. Random Structures and Algorithms, 45(2):306–317, 2014.

[6] Raghunandan H. Keshavan, Andrea Montanari, and Sewoong Oh. Matrix completion from a few entries. IEEE Transactions on Information Theory, 56(6):2980–2998, 2010.

[7] Raghunandan H. Keshavan, Andrea Montanari, and Sewoong Oh. Matrix completion from noisy entries. Journal of Machine Learning Research, 11:2057–2078, 2010.

[8] Akshay Krishnamurthy and Aarti Singh. Low-rank matrix and tensor completion via adaptive sampling. In Neural Information Processing Systems, pages 836–844, 2013.

[9] Troy Lee and Adi Shraibman. Matrix completion from any given set of observations. In Neural Information Processing Systems, pages 1781–1787, 2013.

[10] Rahul Mazumder, Trevor Hastie, and Robert Tibshirani. Spectral regularization algorithms for learning large incomplete matrices. Journal of Machine Learning Research, 11:2287–2322, 2010.

[11] Karthik Mohan and Maryam Fazel. New restricted isometry results for noisy low-rank recovery. In IEEE International Symposium on Information Theory, pages 1573–1577, 2010.

[12] Benjamin Recht, Weiyu Xu, and Babak Hassibi.
Necessary and sufficient conditions for success of the nuclear norm heuristic for rank minimization. Technical report, CalTech, 2008.

[13] Markus Weimer, Alexandros Karatzoglou, Quoc V. Le, and Alex J. Smola. CoFiRank: Maximum margin matrix factorization for collaborative ranking. In Neural Information Processing Systems, 2007.

[14] Emmanuel J. Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? Journal of the ACM, 58(3):1–37, 2011.

[15] Alexander L. Chistov and Dima Grigoriev. Complexity of quantifier elimination in the theory of algebraically closed fields. In Proceedings of the Mathematical Foundations of Computer Science, pages 17–31, 1984.

[16] Maryam Fazel, Haitham Hindi, and Stephen P. Boyd. A rank minimization heuristic with application to minimum order system approximation. In American Control Conference, pages 4734–4739, 2001.

[17] Rong Ge, Jason D. Lee, and Tengyu Ma. Matrix completion has no spurious local minimum. In Neural Information Processing Systems, pages 2973–2981, 2016.

[18] Franz J. Király, Louis Theran, and Ryota Tomioka. The algebraic combinatorial approach for low-rank matrix completion. Journal of Machine Learning Research, 16(1):1391–1436, 2015.

[19] Guangcan Liu and Ping Li. Recovery of coherent data via low-rank dictionary pursuit. In Neural Information Processing Systems, pages 1206–1214, 2014.

[20] Daniel L. Pimentel-Alarcón and Robert D. Nowak. The information-theoretic requirements of subspace clustering with missing data. In International Conference on Machine Learning, pages 802–810, 2016.

[21] Guangcan Liu and Ping Li. Low-rank matrix completion in the presence of high coherence. IEEE Transactions on Signal Processing, 64(21):5623–5633, 2016.

[22] Guangcan Liu, Zhouchen Lin, Shuicheng Yan, Ju Sun, Yong Yu, and Yi Ma.
Robust recovery of subspace structures by low-rank representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1):171–184, 2013.

[23] Guangcan Liu, Qingshan Liu, and Ping Li. Blessing of dimensionality: Recovering mixture data via dictionary pursuit. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(1):47–60, 2017.

[24] Guangcan Liu, Huan Xu, Jinhui Tang, Qingshan Liu, and Shuicheng Yan. A deterministic analysis for LRR. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(3):417–430, 2016.

[25] Raghu Meka, Prateek Jain, and Inderjit S. Dhillon. Matrix completion from power-law distributed samples. In Neural Information Processing Systems, pages 1258–1266, 2009.

[26] Sahand Negahban and Martin J. Wainwright. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. Journal of Machine Learning Research, 13:1665–1697, 2012.

[27] Yudong Chen, Srinadh Bhojanapalli, Sujay Sanghavi, and Rachel Ward. Completing any low-rank matrix, provably. Journal of Machine Learning Research, 16:2999–3034, 2015.

[28] Praneeth Netrapalli, U. N. Niranjan, Sujay Sanghavi, Animashree Anandkumar, and Prateek Jain. Non-convex robust PCA. In Neural Information Processing Systems, pages 1107–1115, 2014.

[29] Yuzhao Ni, Ju Sun, Xiaotong Yuan, Shuicheng Yan, and Loong-Fah Cheong. Robust low-rank subspace segmentation with semidefinite guarantees. In International Conference on Data Mining Workshops, pages 1179–1188, 2013.

[30] R. Tyrrell Rockafellar. Convex Analysis. Princeton University Press, Princeton, NJ, USA, 1970.

[31] Ruslan Salakhutdinov and Nathan Srebro. Collaborative filtering in a non-uniform world: Learning with the weighted trace norm. In Neural Information Processing Systems, pages 2056–2064, 2010.

[32] Fanhua Shang, Yuanyuan Liu, and James Cheng.
Scalable algorithms for tractable Schatten quasi-norm minimization. In AAAI Conference on Artificial Intelligence, pages 2016–2022, 2016.

[33] Ruoyu Sun and Zhi-Quan Luo. Guaranteed matrix completion via non-convex factorization. IEEE Transactions on Information Theory, 62(11):6535–6579, 2016.

[34] Huan Xu, Constantine Caramanis, and Sujay Sanghavi. Robust PCA via outlier pursuit. IEEE Transactions on Information Theory, 58(5):3047–3064, 2012.

[35] Yin Zhang. When is missing data recoverable? CAAM Technical Report TR06-15, 2006.

[36] Tuo Zhao, Zhaoran Wang, and Han Liu. A nonconvex optimization framework for low rank matrix estimation. In Neural Information Processing Systems, pages 559–567, 2015.

[37] David L. Donoho and Michael Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization. Proceedings of the National Academy of Sciences, 100(5):2197–2202, 2003.