{"title": "Low-Rank Matrix and Tensor Completion via Adaptive Sampling", "book": "Advances in Neural Information Processing Systems", "page_first": 836, "page_last": 844, "abstract": "We study low rank matrix and tensor completion and propose novel algorithms that employ adaptive sampling schemes to obtain strong performance guarantees for these problems. Our algorithms exploit adaptivity to identify entries that are highly informative for identifying the column space of the matrix (tensor) and consequently, our results hold even when the row space is highly coherent, in contrast with previous analysis of matrix completion. In the absence of noise, we show that one can exactly recover a $n \\times n$ matrix of rank $r$ using $O(r^2 n \\log(r))$ observations, which is better than the best known bound under random sampling. We also show that one can recover an order $T$ tensor using $O(r^{2(T-1)}T^2 n \\log(r))$. For noisy recovery, we show that one can consistently estimate a low rank matrix corrupted with noise using $O(nr \\textrm{polylog}(n))$ observations. We complement our study with simulations that verify our theoretical guarantees and demonstrate the scalability of our algorithms.", "full_text": "Low-Rank Matrix and Tensor Completion via\n\nAdaptive Sampling\n\nAkshay Krishnamurthy\n\nComputer Science Department\nCarnegie Mellon University\n\nPittsburgh, PA 15213\n\nakshaykr@cs.cmu.edu\n\nAarti Singh\n\nMachine Learning Department\nCarnegie Mellon University\n\nPittsburgh, PA 15213\n\naartisingh@cs.cmu.edu\n\nAbstract\n\nWe study low rank matrix and tensor completion and propose novel algorithms\nthat employ adaptive sampling schemes to obtain strong performance guarantees.\nOur algorithms exploit adaptivity to identify entries that are highly informative\nfor learning the column space of the matrix (tensor) and consequently, our results\nhold even when the row space is highly coherent, in contrast with previous analy-\nses. 
In the absence of noise, we show that one can exactly recover an $n \\times n$ matrix\nof rank $r$ from merely $\\Omega(n r^{3/2} \\log(r))$ matrix entries. We also show that one can\nrecover an order $T$ tensor using $\\Omega(n r^{T-1/2} T^2 \\log(r))$ entries. For noisy recovery,\nour algorithm consistently estimates a low rank matrix corrupted with noise\nusing $\\Omega(n r^{3/2} \\mathrm{polylog}(n))$ entries. We complement our study with simulations\nthat verify our theory and demonstrate the scalability of our algorithms.\n\n1 Introduction\n\nRecently, the machine learning and signal processing communities have focused considerable attention\ntoward understanding the benefits of adaptive sensing. This theme is particularly relevant to\nmodern data analysis, where adaptive sensing has emerged as an efficient alternative to obtaining\nand processing the large data sets associated with scientific investigation. These empirical observations\nhave led to a number of theoretical studies characterizing the performance gains offered by\nadaptive sensing over conventional, passive approaches. In this work, we continue in that direction\nand study the role of adaptive data acquisition in low rank matrix and tensor completion problems.\nOur study is motivated not only by prior theoretical results in favor of adaptive sensing but also\nby several applications where adaptive sensing is feasible. In recommender systems, obtaining a\nmeasurement amounts to asking a user about an item, an interaction that has been deployed in\nproduction systems. Another application pertains to network tomography, where a network operator\nis interested in inferring latencies between hosts in a communication network while injecting few\npackets into the network. The operator, being in control of the network, can adaptively sample the\nmatrix of pairwise latencies, potentially reducing the total number of measurements. 
In particular,\nthe operator can obtain full columns of the matrix by measuring from one host to all others, a\nsampling strategy we will exploit in this paper.\nYet another example centers around gene expression analysis, where the object of interest is a matrix\nof expression levels for various genes across a number of conditions. There are typically two types\nof measurements: low-throughput assays provide highly reliable measurements of single entries\nin this matrix while high-throughput microarrays provide expression levels of all genes of interest\nacross operating conditions, thus revealing entire columns. The completion problem can be seen\nas a strategy for learning the expression matrix from both low- and high-throughput data while\nminimizing the total measurement cost.\n\n1.1 Contributions\n\nWe develop algorithms with theoretical guarantees for three low-rank completion problems. The\nalgorithms find a small subset of columns of the matrix (tensor) that can be used to reconstruct or\napproximate the matrix (tensor). We exploit adaptivity to focus on highly informative columns, and\nthis enables us to do away with the usual incoherence assumptions on the row space while achieving\ncompetitive (or in some cases better) sample complexity bounds. Specifically, our results are:\n\n1. In the absence of noise, we develop a streaming algorithm that enjoys both low sample\nrequirements and low computational overhead. In the matrix case, we show that $\\Omega(n r^{3/2} \\log r)$\nadaptively chosen samples are sufficient for exact recovery, improving on the best known\nbound of $\\Omega(n r^2 \\log^2 n)$ in the passive setting [21]. This also gives the first guarantee for\nmatrix completion with coherent row space.\n\n2. In the tensor case, we establish that $\\Omega(n r^{T-1/2} T^2 \\log r)$ adaptively chosen samples are\nsufficient for recovering an $n \\times \\ldots \\times n$ order $T$ tensor of rank $r$. 
We complement this\nwith a necessary condition for tensor completion under random sampling, showing that\nour adaptive strategy is competitive with any passive algorithm. These are the first sample\ncomplexity upper and lower bounds for exact tensor completion.\n\n3. In the noisy matrix completion setting, we modify the adaptive column subset selection\nalgorithm of Deshpande et al. [10] to give an algorithm that finds a rank-$r$ approximation\nto a matrix using $\\Omega(n r^{3/2} \\mathrm{polylog}(n))$ samples. As before, the algorithm does not require\nan incoherent row space but we are no longer able to process the matrix sequentially.\n\n4. Along the way, we improve on existing results for subspace detection from missing data,\nthe problem of testing if a partially observed vector lies in a known subspace.\n\n2 Related Work\n\nThe matrix completion problem has received considerable attention in recent years. A series of\npapers [6, 7, 13, 21], culminating in Recht\u2019s elegant analysis of the nuclear norm minimization program,\naddress the exact matrix completion problem through the framework of convex optimization,\nestablishing that $\\Omega((n_1 + n_2) r \\max\\{\\mu_0, \\mu_1^2\\} \\log^2(n_2))$ randomly drawn samples are sufficient to\nexactly identify an $n_1 \\times n_2$ matrix with rank $r$. Here $\\mu_0$ and $\\mu_1$ are parameters characterizing the\nincoherence of the row and column spaces of the matrix, which we will define shortly. Cand\u00e8s and\nTao [7] proved that under random sampling $\\Omega(n_1 r \\mu_0 \\log(n_2))$ samples are necessary, showing that\nnuclear norm minimization is near-optimal.\nThe noisy matrix completion problem has also received considerable attention [5, 17, 20]. The\nmajority of these results also involve some parameter that quantifies how much information a single\nobservation reveals, in the same vein as incoherence.\nTensor completion, a natural generalization of matrix completion, is less studied. 
One challenge\nstems from the NP-hardness of computing most tensor decompositions, pushing researchers to study\nalternative structure-inducing norms in lieu of the nuclear norm [11, 22]. Both papers derive algorithms\nfor tensor completion, but neither provides sample complexity bounds for the noiseless case.\nOur approach involves adaptive data acquisition, and consequently our work is closely related to\na number of papers focusing on using adaptive measurements to estimate a sparse vector [9, 15].\nIn these problems, specifically, problems where the sparsity basis is known a priori, we have a\nreasonable understanding of how adaptive sampling can lead to performance improvements. As a\nlow rank matrix is sparse in its unknown eigenbasis, the completion problem is coupled with learning\nthis basis, which poses a new challenge for adaptive sampling procedures.\nAnother relevant line of work stems from the matrix approximations literature. Broadly speaking,\nthis research is concerned with efficiently computing a structured matrix, i.e. sparse or low rank,\nthat serves as a good approximation to a fully observed input matrix. Two methods that apply to\nthe missing data setting are the Nystr\u00f6m method [12, 18] and entrywise subsampling [1]. While\nthe sample complexity bounds match ours, the analysis for the Nystr\u00f6m method has focused on\npositive-semidefinite kernel matrices and requires incoherence of both the row and column spaces.\nOn the other hand, entrywise subsampling is applicable, but the guarantees are weaker than ours.\nIt is also worth briefly mentioning the vast body of literature on column subset selection, the task\nof approximating a matrix by projecting it onto a few of its columns. While the best algorithms,\nnamely volume sampling [14] and sampling according to statistical leverages [3], do not seem to be\nreadily applicable to the missing data setting, some algorithms are. 
Indeed, our procedure for noisy\nmatrix completion is an adaptation of an existing column subset selection procedure [10].\nOur techniques are also closely related to ideas employed for subspace detection \u2013 testing whether a\nvector lies in a known subspace \u2013 and subspace tracking \u2013 learning a time-evolving low-dimensional\nsubspace from vectors lying close to that subspace. Balzano et al. [2] prove guarantees for subspace\ndetection with known subspace and a partially observed vector, and we will improve on their result\nen route to establishing our guarantees. Subspace tracking from partial information has also been\nstudied [16], but little is known theoretically about this problem.\n\n3 Definitions and Preliminaries\n\nBefore presenting our algorithms, we clarify some notation and definitions. Let $M \\in \\mathbb{R}^{n_1 \\times n_2}$ be a\nrank $r$ matrix with singular value decomposition $U \\Sigma V^T$. Let $c_1, \\ldots, c_{n_2}$ denote the columns of $M$.\nLet $\\mathbb{M} \\in \\mathbb{R}^{n_1 \\times \\ldots \\times n_T}$ denote an order $T$ tensor with canonical decomposition:\n\n$$\\mathbb{M} = \\sum_{k=1}^{r} a_k^{(1)} \\otimes a_k^{(2)} \\otimes \\ldots \\otimes a_k^{(T)} \\tag{1}$$\n\nwhere $\\otimes$ is the outer product. Define $\\mathrm{rank}(\\mathbb{M})$ to be the smallest value of $r$ that establishes this\nequality. Note that the vectors $\\{a_k^{(t)}\\}_{k=1}^{r}$ need not be orthogonal, nor even linearly independent.\nThe mode-$t$ subtensors of $\\mathbb{M}$, denoted $\\mathbb{M}_i^{(t)}$, are order $T-1$ tensors obtained by fixing the $i$th\ncoordinate of the $t$-th mode. For example, if $\\mathbb{M}$ is an order 3 tensor, then $\\mathbb{M}_i^{(3)}$ are the frontal slices.\nWe represent a $d$-dimensional subspace $U \\subset \\mathbb{R}^n$ as a set of orthonormal basis vectors $U = \\{u_i\\}_{i=1}^{d}$\nand in some cases as an $n \\times d$ matrix whose columns are the basis vectors. The interpretation will be\nclear from context. Define the orthogonal projection onto $U$ as $P_U v = U (U^T U)^{-1} U^T v$.\nFor a set $\\Omega \\subset [n]$ (see footnote 1), $c_\\Omega \\in \\mathbb{R}^{|\\Omega|}$ is the vector whose elements are $c_i$, $i \\in \\Omega$, indexed lexicographically.\nSimilarly the matrix $U_\\Omega \\in \\mathbb{R}^{|\\Omega| \\times d}$ has rows indexed by $\\Omega$ lexicographically. Note that if $U$ is an\northobasis for a subspace, $U_\\Omega$ is a $|\\Omega| \\times d$ matrix with columns $u_{i\\Omega}$ where $u_i \\in U$, rather than a set\nof orthonormal basis vectors. In particular, the matrix $U_\\Omega$ need not have orthonormal columns.\nThese definitions extend to the tensor setting with slight modifications. We use the vec operation\nto unfold a tensor into a single vector and define the inner product $\\langle x, y \\rangle = \\mathrm{vec}(x)^T \\mathrm{vec}(y)$. For a\nsubspace $U \\subset \\mathbb{R}^{\\otimes n_i}$, we write it as a $(\\prod n_i) \\times d$ matrix whose columns are $\\mathrm{vec}(u_i)$, $u_i \\in U$. We\ncan then define projections and subsampling as we did in the vector case.\nAs in recent work on matrix completion [7, 21], we will require a certain amount of incoherence\nbetween the column space associated with $M$ ($\\mathbb{M}$) and the standard basis.\nDefinition 1. The coherence of an $r$-dimensional subspace $U \\subset \\mathbb{R}^n$ is:\n\n$$\\mu(U) \\triangleq \\frac{n}{r} \\max_{1 \\le j \\le n} \\|P_U e_j\\|_2^2 \\tag{2}$$\n\nwhere $e_j$ denotes the $j$th standard basis element.\nIn previous analyses of matrix completion, the incoherence assumption is that both the row and column\nspaces of the matrix have coherences upper bounded by $\\mu_0$. When both spaces are incoherent,\neach entry of the matrix reveals roughly the same amount of information, so there is little to be\ngained from adaptive sampling, which typically involves looking for highly informative measurements.\nThus the power of adaptivity for these problems should center around relaxing the incoherence\nassumption, which is the direction we take in this paper. 
Unfortunately, even under adaptive\nsampling, it is impossible to identify a rank one matrix that is zero in all but one entry without observing\nthe entire matrix, implying that we cannot completely eliminate the assumption. Instead, we\nwill retain incoherence on the column space, but remove the restrictions on the row space.\n\nFootnote 1: We write $[n]$ for $\\{1, \\ldots, n\\}$.\n\nAlgorithm 1: Sequential Tensor Completion $(\\mathbb{M}, \\{m_t\\}_{t=1}^{T})$\n\n1. Let $U = \\emptyset$.\n2. Randomly draw entries $\\Omega \\subset \\prod_{t=1}^{T-1} [n_t]$ uniformly with replacement with probability $m_T / \\prod_{t=1}^{T-1} n_t$.\n3. For each mode-$T$ subtensor $\\mathbb{M}_i^{(T)}$ of $\\mathbb{M}$, $i \\in [n_T]$:\n   (a) If $\\|\\mathbb{M}_{i\\Omega}^{(T)} - P_{U_\\Omega} \\mathbb{M}_{i\\Omega}^{(T)}\\|_2^2 > 0$:\n      i. $\\hat{\\mathbb{M}}_i^{(T)} \\leftarrow$ recurse on $(\\mathbb{M}_i^{(T)}, \\{m_t\\}_{t=1}^{T-1})$.\n      ii. $U_i \\leftarrow P_{U^\\perp} \\hat{\\mathbb{M}}_i^{(T)} / \\|P_{U^\\perp} \\hat{\\mathbb{M}}_i^{(T)}\\|$. $U \\leftarrow U \\cup U_i$.\n   (b) Otherwise $\\hat{\\mathbb{M}}_i^{(T)} \\leftarrow U (U_\\Omega^T U_\\Omega)^{-1} U_\\Omega^T \\mathbb{M}_{i\\Omega}^{(T)}$.\n4. Return $\\hat{\\mathbb{M}}$ with mode-$T$ subtensors $\\hat{\\mathbb{M}}_i^{(T)}$.\n\n4 Exact Completion Problems\n\nIn the matrix case, our sequential algorithm builds up the column space of the matrix by selecting a\nfew columns to observe in their entirety. In particular, we maintain a candidate column space $\\tilde{U}$ and\ntest whether a column $c_i$ lives in $\\tilde{U}$ or not, choosing to completely observe $c_i$ and add it to $\\tilde{U}$ if it\ndoes not. Balzano et al. [2] observed that we can perform this test with a subsampled version of $c_i$,\nmeaning that we can recover the column space using few samples. Once we know the column space,\nrecovering the matrix, even from few observations, amounts to solving determined linear systems.\nFor tensors, the algorithm becomes recursive in nature. At the outer level of the recursion, the\nalgorithm maintains a candidate subspace $U$ for the mode-$T$ subtensors $\\mathbb{M}_i^{(T)}$. For each of these\nsubtensors, we test whether $\\mathbb{M}_i^{(T)}$ lives in $U$ and recursively complete that subtensor if it does not.\nOnce we complete the subtensor, we add it to $U$ and proceed at the outer level. 
When the subtensor\nitself is just a column, we observe the column in its entirety.\nThe pseudocode of the algorithm is given in Algorithm 1. Our first main result characterizes the\nperformance of the tensor completion algorithm. We defer the proof to the appendix.\n\nTheorem 2. Let $\\mathbb{M} = \\sum_{j=1}^{r} \\otimes_{t=1}^{T} a_j^{(t)}$ be a rank $r$ order-$T$ tensor with subspaces\n$A^{(t)} = \\mathrm{span}(\\{a_j^{(t)}\\}_{j=1}^{r})$. Suppose that all of $A^{(1)}, \\ldots, A^{(T-1)}$ have coherence bounded above by $\\mu_0$. Set\n$m_t = 36 r^{t-1/2} \\mu_0^{t-1} \\log(2r/\\delta)$ for each $t$. Then with probability $\\ge 1 - 5 \\delta T r^T$, Algorithm 1 exactly\nrecovers $\\mathbb{M}$ and has expected sample complexity\n\n$$36 \\Big( \\sum_{t=1}^{T} n_t \\Big) r^{T-1/2} \\mu_0^{T-1} \\log(2r/\\delta) \\tag{3}$$\n\nIn the special case of an $n \\times \\ldots \\times n$ tensor of order $T$, the algorithm succeeds with high probability\nusing $\\Omega(n r^{T-1/2} \\mu_0^{T-1} T^2 \\log(Tr/\\delta))$ samples, exhibiting a linear dependence on the tensor dimensions.\nIn comparison, the only guarantee we are aware of shows that $\\Omega\\big( \\big( \\prod_{t=2}^{T-1} n_t \\big) r \\big)$ samples are\nsufficient for consistent estimation of a noisy tensor, exhibiting a much worse dependence on tensor\ndimension [23]. In the noiseless scenario, one can unfold the tensor into an $n_1 \\times \\prod_{t=2}^{T} n_t$ matrix\nand apply any matrix completion algorithm. Unfortunately, without exploiting the additional tensor\nstructure, this approach will scale with $\\prod_{t=2}^{T} n_t$, which is similarly much worse than our guarantee.\nNote that the na\u00efve procedure that does not perform the recursive step has sample complexity scaling\nwith the product of the dimensions and is therefore much worse than our algorithm.\nThe most obvious specialization of Theorem 2 is to the matrix completion problem:\nCorollary 3. Let $M := U \\Sigma V^T \\in \\mathbb{R}^{n_1 \\times n_2}$ have rank $r$, and fix $\\delta > 0$. Assume $\\mu(U) \\le \\mu_0$. 
Setting\n$m \\triangleq m_2 \\ge 36 r^{3/2} \\mu_0 \\log(2r/\\delta)$, the sequential algorithm exactly recovers $M$ with probability at least\n$1 - 4r\\delta + \\delta$ while using in expectation\n\n$$36 n_2 r^{3/2} \\mu_0 \\log(2r/\\delta) + r n_1 \\tag{4}$$\n\nobservations. The algorithm runs in $O(n_1 n_2 r + r^3 m)$ time.\n\nA few comments are in order. Recht [21] guaranteed exact recovery for the nuclear norm minimization\nprocedure as long as the number of observations exceeds $32 (n_1 + n_2) r \\max\\{\\mu_0, \\mu_1^2\\} \\beta \\log^2(2 n_2)$\nwhere $\\beta$ controls the probability of failure and $\\|U V^T\\|_\\infty \\le \\mu_1 \\sqrt{r/(n_1 n_2)}$ with $\\mu_1$ as another coherence\nparameter. Without additional assumptions, $\\mu_1$ can be as large as $\\mu_0 \\sqrt{r}$. In this case, our\nbound improves on his in its dependence on $r$, $\\mu_0$ and logarithmic terms.\nThe Nystr\u00f6m method can also be applied to the matrix completion problem, albeit under non-uniform\nsampling. Given a PSD matrix, one uses a randomly sampled set of columns and the corresponding\nrows to approximate the remaining entries. Gittens showed that if one samples $O(r \\log r)$\ncolumns, then one can exactly reconstruct a rank $r$ matrix [12]. This result requires incoherence of\nboth row and column spaces, so it is more restrictive than ours. Almost all previous results for exact\nmatrix completion require incoherence of both row and column spaces.\nThe one exception is a recent paper by Chen et al. that we became aware of while preparing the\nfinal version of this work [8]. They show that sampling the matrix according to statistical leverages\nof the rows and columns can eliminate the need for incoherence assumptions. Specifically, when the\nmatrix has incoherent column space, they show that by first estimating the leverages of the columns,\nsampling the matrix according to this distribution, and then solving the nuclear norm minimization\nprogram, one can recover the matrix with $\\Omega(n r \\mu_0 \\log^2 n)$ samples. 
Our result improves on theirs\nwhen $r$ is small compared to $n$, specifically when $\\sqrt{r} \\log r \\le \\log^2 n$, which is common.\nOur algorithm is also very computationally efficient. Existing algorithms involve successive singular\nvalue decompositions ($O(n_1 n_2 r)$ per iteration), resulting in much worse running times.\nThe key ingredient in our proofs is a result pertaining to subspace detection, the task of testing if\na subsampled vector lies in a subspace. This result, which improves over the results of Balzano et\nal. [2], is crucial in obtaining our sample complexity bounds, and may be of independent interest.\nTheorem 4. Let $U$ be a $d$-dimensional subspace of $\\mathbb{R}^n$ and $y = x + v$ where $x \\in U$ and $v \\in U^\\perp$.\nFix $\\delta > 0$, $m \\ge \\frac{8}{3} d \\mu(U) \\log\\frac{2d}{\\delta}$ and let $\\Omega$ be an index set with entries sampled uniformly with\nreplacement with probability $m/n$. Then with probability at least $1 - 4\\delta$:\n\n$$\\frac{m(1 - \\alpha) - d \\mu(U) \\frac{\\beta}{1 - \\gamma}}{n} \\|v\\|_2^2 \\le \\|y_\\Omega - P_{U_\\Omega} y_\\Omega\\|_2^2 \\le (1 + \\alpha) \\frac{m}{n} \\|v\\|_2^2 \\tag{5}$$\n\nwhere $\\alpha = \\sqrt{2 \\frac{\\mu(v)}{m} \\log(1/\\delta)} + \\frac{2 \\mu(v)}{3m} \\log(1/\\delta)$, $\\beta = 6 \\log(d/\\delta) + \\frac{4}{3} \\frac{d \\mu(v)}{m} \\log^2(d/\\delta)$,\n$\\gamma = \\sqrt{\\frac{8 d \\mu(U)}{3m} \\log(2d/\\delta)}$ and $\\mu(v) = n \\|v\\|_\\infty^2 / \\|v\\|_2^2$.\nThis theorem shows that if $m = \\Omega(\\max\\{\\mu(v), d \\mu(U), d \\sqrt{\\mu(U) \\mu(v)}\\} \\log d)$ then the orthogonal\nprojection from missing data is within a constant factor of the fully observed one. In contrast,\nBalzano et al. [2] give a similar result that requires $m = \\Omega(\\max\\{\\mu(v)^2, d \\mu(U), d \\mu(U) \\mu(v)\\} \\log d)$\nto get a constant factor approximation. In the matrix case, this improved dependence on incoherence\nparameters brings our sample complexity down from $n r^2 \\mu_0^2 \\log r$ to $n r^{3/2} \\mu_0 \\log r$. We conjecture\nthat this theorem can be further improved to eliminate another $\\sqrt{r}$ factor from our final bound.\n\n4.1 Lower Bounds for Uniform Sampling\n\nWe adapt the proof strategy of Cand\u00e8s and Tao [7] to the tensor completion problem and establish\nthe following lower bound for uniform sampling:\nTheorem 5 (Passive Lower Bound). Fix $1 \\le m, r \\le \\min_t n_t$ and $\\mu_0 > 1$. Fix $0 < \\delta < 1/2$ and\nsuppose that we do not have the condition:\n\n$$-\\log\\Big(1 - \\frac{m}{\\prod_{i=1}^{T} n_i}\\Big) \\ge \\frac{\\mu_0^{T-1} r^{T-1}}{\\prod_{i=2}^{T} n_i} \\log\\Big(\\frac{n_1}{2\\delta}\\Big) \\tag{6}$$\n\nThen there exist infinitely many pairs of distinct $n_1 \\times \\ldots \\times n_T$ order-$T$ tensors $\\mathbb{M} \\neq \\mathbb{M}'$ of rank $r$\nwith coherence parameter $\\le \\mu_0$ such that $P_\\Omega(\\mathbb{M}) = P_\\Omega(\\mathbb{M}')$ with probability at least $\\delta$. Each entry\nis observed independently with probability $T = m / \\prod_{i=1}^{T} n_i$.\n\nTheorem 5 implies that as long as the right hand side of Equation 6 is at most $\\epsilon < 1$, and:\n\n$$m \\le n_1 r^{T-1} \\mu_0^{T-1} \\log\\Big(\\frac{n_1}{2\\delta}\\Big) (1 - \\epsilon/2) \\tag{7}$$\n\nthen with probability at least $\\delta$ there are infinitely many matrices that agree on the observed entries.\nThis gives a necessary condition on the number of samples required for tensor completion. Note\nthat when $T = 2$ we recover the known lower bound for matrix completion.\nTheorem 5 gives a necessary condition under uniform sampling. Comparing with Theorem 2 shows\nthat our procedure outperforms any passive procedure in its dependence on the tensor dimensions.\nHowever, our guarantee is suboptimal in its dependence on $r$. The extra factor of $\\sqrt{r}$ would be\neliminated by a further improvement to Theorem 5, which we conjecture is indeed possible.\nFor adaptive sampling, one can obtain a lower bound via a parameter counting argument. Observing\nthe $(i_1, \\ldots, i_T)$th entry leads to a polynomial equation of the form $\\sum_k \\prod_t a_k^{(t)}(i_t) = \\mathbb{M}_{i_1, \\ldots, i_T}$. If\n$m < r (\\sum_t n_t)$, this system is underdetermined, showing that $\\Omega((\\sum_t n_t) r)$ observations are necessary\nfor exact recovery, even under adaptive sampling. Thus, our algorithm enjoys sample complexity\nwith optimal dependence on matrix dimensions.\n\n5 Noisy Matrix Completion\n\nOur algorithm for noisy matrix completion is an adaptation of the column subset selection (CSS)\nalgorithm analyzed by Deshpande et al. [10]. The algorithm builds a candidate column space in\nrounds; at each round it samples additional columns with probability proportional to their projection\non the orthogonal complement of the candidate column space.\nTo concretely describe the algorithm, suppose that at the beginning of the $l$th round we have a\ncandidate subspace $U_l$. Then in the $l$th round, we draw $s$ additional columns according to the\ndistribution where the probability of drawing the $i$th column is proportional to $\\|P_{U_l^\\perp} c_i\\|_2^2$. Observing\nthese $s$ columns in full and then adding them to the subspace $U_l$ gives the candidate subspace $U_{l+1}$\nfor the next round. We initialize the algorithm with $U_1 = \\emptyset$. After $L$ rounds, we approximate each\ncolumn $c$ with $\\hat{c} = U_L (U_{L\\Omega}^T U_{L\\Omega})^{-1} U_{L\\Omega}^T c_\\Omega$ and concatenate these estimates to form $\\hat{M}$.\nThe challenge is that the algorithm cannot compute the sampling probabilities without observing\nentries of the matrix. However, our results show that with reliable estimates, which can be computed\nfrom few observations, the algorithm still performs well.\nWe assume that the matrix $M \\in \\mathbb{R}^{n_1 \\times n_2}$ can be decomposed as a rank $r$ matrix $A$ and a random\nGaussian matrix $R$ whose entries are independently drawn from $\\mathcal{N}(0, \\sigma^2)$. We write $A = U \\Sigma V^T$\nand assume that $\\mu(U) \\le \\mu_0$. As before, the incoherence assumption is crucial in guaranteeing that\none can estimate the column norms, and consequently sampling probabilities, from missing data.\nTheorem 6. Let $\\Omega$ be the set of all observations over the course of the algorithm, let $U_L$\nbe the subspace obtained after $L = \\log(n_1 n_2)$ rounds and $\\hat{M}$ be the matrix whose columns are\n$\\hat{c}_i = U_L (U_{L\\Omega}^T U_{L\\Omega})^{-1} U_{L\\Omega}^T c_{\\Omega i}$. Then there are constants $c_1, c_2$ such that:\n\n$$\\|A - \\hat{M}\\|_F^2 \\le \\frac{c_1}{n_1 n_2} \\|A\\|_F^2 + c_2 \\|R_\\Omega\\|_F^2$$\n\n$\\hat{M}$ can be computed from $\\Omega((n_1 + n_2) r^{3/2} \\mu(U) \\mathrm{polylog}(n_1 n_2))$ observations. In particular, if\n$\\|A\\|_F^2 = 1$ and $R_{ij} \\sim \\mathcal{N}(0, \\sigma^2/(n_1 n_2))$, then there is a constant $c_\\star$ for which:\n\n$$\\|A - \\hat{M}\\|_F^2 \\le \\frac{c_\\star}{n_1 n_2} \\Big(1 + \\sigma^2 \\big((n_1 + n_2) r^{3/2} \\mu(U) \\mathrm{polylog}(n_1 n_2)\\big)\\Big)$$\n\nThe main improvement in the result is in relaxing the assumptions on the underlying matrix $A$.\nExisting results for noisy matrix completion require that the energy of the matrix is well spread\nout across both the rows and the columns (i.e. incoherence), and the sample complexity guarantees\ndeteriorate significantly without such an assumption [5, 17]. As a concrete example, Negahban and\nWainwright [20] use a notion of spikiness, measured as $\\sqrt{n_1 n_2} \\frac{\\|A\\|_\\infty}{\\|A\\|_F}$, which can be as large as $\\sqrt{n_2}$\nin our setup, e.g. when the matrix is zero except for on one column and constant across that column.\n\nFigure 1: Probability of success curves for our noiseless matrix completion algorithm (top) and\nSVT (middle). Top: Success probability as a function of: Left: $p$, the fraction of samples per\ncolumn, Center: $np$, total samples per column, and Right: $np \\log^2 n$, expected samples per column\nfor passive completion. 
Bottom: Success probability of our noiseless algorithm for different values\nof $r$ as a function of $p$, the fraction of samples per column (left), $p/r^{3/2}$ (middle) and $p/r$ (right).\n\nThe choices of $\\|A\\|_F^2 = 1$ and noise variance rescaled by $\\frac{1}{n_1 n_2}$ enable us to compare our results\nwith related work [20]. Thinking of $n_1 = n_2 = n$ and the incoherence parameter as a constant, our\nresults imply consistent estimation as long as $\\sigma^2 = \\omega\\big(\\frac{n}{r^2 \\mathrm{polylog}(n)}\\big)$. On the other hand, thinking\nof the spikiness parameter as a constant, [20] show that the error is bounded by $\\frac{\\sigma^2 n r \\log n}{m}$ where\n$m$ is the total number of observations. Using the same number of samples as our procedure, their\nresults imply consistency as long as $\\sigma^2 = \\omega(r \\mathrm{polylog}(n))$. For small $r$ (i.e. $r = O(1)$), our noise\ntolerance is much better, but their results apply even with fewer observations, while ours do not.\n\n6 Simulations\n\nWe verify Corollary 3\u2019s linear dependence on $n$ in Figure 1, where we empirically compute the\nsuccess probability of the algorithm for varying values of $n$ and $p = m/n$, the fraction of entries\nobserved per column. Here we study square matrices of fixed rank $r = 5$ with $\\mu(U) = 1$. Figure 1(a)\nshows that our algorithm can succeed with sampling a smaller and smaller fraction of entries as $n$\nincreases, as we expect from Corollary 3. In Figure 1(b), we instead plot success probability against\ntotal number of observations per column. The fact that the curves coincide suggests that the samples\nper column, $m$, is constant with respect to $n$, which is precisely what Corollary 3 implies. Finally,\nin Figure 1(c), we rescale instead by $n / \\log^2 n$, which corresponds to the passive sample complexity\nbound [21]. 
Empirically, the fact that these curves do not line up demonstrates that our algorithm\nrequires fewer than $\\log^2 n$ samples per column, outperforming the passive bound.\nThe second row of Figure 1 plots the same probability of success curves for the Singular Value\nThresholding (SVT) algorithm [4]. As is apparent from the plots, SVT does not enjoy a linear\ndependence on $n$; indeed Figure 1(f) confirms the logarithmic dependency that we expect for passive\nmatrix completion, and establishes that our algorithm has empirically better performance.\n\n        Unknown M                  |  Results\n     n      r    m/d_r    m/n^2    |  time (s)\n  1000     10     3.4     0.07     |     16\n  1000     50     3.3     0.33     |     29\n  1000    100     3.2     0.61     |     45\n  5000     10     3.4     0.01     |      3\n  5000     50     3.5     0.07     |     27\n  5000    100     3.4     0.14     |    104\n 10000     10     3.4     0.01     |     10\n 10000     50     3.5     0.03     |     84\n 10000    100     3.5     0.07     |    283\n\nTable 1: Computational results on large low-rank matrices. $d_r = r(2n - r)$ is the degrees of\nfreedom, so $m/d_r$ is the oversampling ratio.\n\nFigure 2: Reconstruction error as a function of row space incoherence for our noisy algorithm\n(CSS) and the semidefinite program of [20].\n\nIn the third row, we study the algorithm\u2019s dependence on $r$ on $500 \\times 500$ square matrices. In\nFigure 1(g) we plot the probability of success of the algorithm as a function of the sampling probability\n$p$ for matrices of various rank, and observe that the sample complexity increases with $r$. In\nFigure 1(h) we rescale the x-axis by $r^{3/2}$ so that if our theorem is tight, the curves should coincide. In\nFigure 1(i) we instead rescale the x-axis by $r^{-1}$ corresponding to our conjecture about the performance\nof the algorithm. 
Indeed, the curves line up in Figure 1(i), demonstrating that empirically, the\nnumber of samples needed per column is linear in $r$ rather than the $r^{3/2}$ dependence in our theorem.\nTo confirm the computational improvement over existing methods, we ran our matrix completion\nalgorithm on large-scale matrices, recording the running time and error in Table 1. To contrast with\nSVT, we refer the reader to Table 5.1 in [4]. As an example, recovering a $10000 \\times 10000$ matrix of\nrank 100 takes close to 2 hours with the SVT, while it takes less than 5 minutes with our algorithm.\nFor the noisy algorithm, we study the dependence on row-space incoherence. In Figure 2, we plot the\nreconstruction error as a function of the row space coherence for our procedure and the semidefinite\nprogram of Negahban and Wainwright [20], where we ensure that both algorithms use the same\nnumber of observations. It is readily apparent that the SDP decays in performance as the row space\nbecomes more coherent while the performance of our procedure is unaffected.\n\n7 Conclusions and Open Problems\n\nIn this work, we demonstrate how sequential active algorithms can offer significant improvements\nin time and measurement overhead over passive algorithms for matrix and tensor completion. We\nhope our work motivates further study of sequential active algorithms for machine learning.\nSeveral interesting theoretical questions arise from our work:\n\n1. Can we tighten the dependence on rank for these problems? In particular, can we bring the\ndependence on $r$ down from $r^{3/2}$ to linear? Simulations suggest this is possible.\n\n2. Can one generalize the nuclear norm minimization program for matrix completion to the\ntensor completion setting while providing theoretical guarantees on sample complexity?\n\nWe hope to pursue these directions in future work.\n\nAcknowledgements\n\nThis research is supported in part by AFOSR under grant FA9550-10-1-0382 and NSF under grant\nIIS-1116458. 
AK is supported in part by an NSF Graduate Research Fellowship. AK would like to\nthank Martin Azizyan, Sivaraman Balakrishnan and Jayant Krishnamurthy for fruitful discussions.\n\nReferences\n\n[1] Dimitris Achlioptas and Frank McSherry. Fast computation of low-rank matrix approximations. Journal of the ACM (JACM), 54(2):9, 2007.\n\n[2] Laura Balzano, Benjamin Recht, and Robert Nowak. High-dimensional matched subspace detection when data are missing. In Information Theory Proceedings (ISIT), 2010 IEEE International Symposium on, pages 1638\u20131642. IEEE, 2010.\n\n[3] Christos Boutsidis, Michael W Mahoney, and Petros Drineas. An improved approximation algorithm for the column subset selection problem. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 968\u2013977. Society for Industrial and Applied Mathematics, 2009.\n\n[4] Jian-Feng Cai, Emmanuel J Cand\u00e8s, and Zuowei Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956\u20131982, 2010.\n\n[5] Emmanuel J Cand\u00e8s and Yaniv Plan. Matrix completion with noise. Proceedings of the IEEE, 98(6):925\u2013936, 2010.\n\n[6] Emmanuel J Cand\u00e8s and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717\u2013772, 2009.\n\n[7] Emmanuel J Cand\u00e8s and Terence Tao. The power of convex relaxation: Near-optimal matrix completion. Information Theory, IEEE Transactions on, 56(5):2053\u20132080, 2010.\n\n[8] Yudong Chen, Srinadh Bhojanapalli, Sujay Sanghavi, and Rachel Ward. Coherent matrix completion. arXiv preprint arXiv:1306.2979, 2013.\n\n[9] Mark A Davenport and Ery Arias-Castro. Compressive binary search. In Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on, pages 1827\u20131831. IEEE, 2012.\n\n[10] Amit Deshpande, Luis Rademacher, Santosh Vempala, and Grant Wang. 
Matrix approximation and projective clustering via volume sampling. Theory of Computing, 2:225\u2013247, 2006.\n\n[11] Silvia Gandy, Benjamin Recht, and Isao Yamada. Tensor completion and low-n-rank tensor recovery via convex optimization. Inverse Problems, 27(2):025010, 2011.\n\n[12] Alex Gittens. The spectral norm error of the naive Nystrom extension. arXiv preprint arXiv:1110.5305, 2011.\n\n[13] David Gross. Recovering low-rank matrices from few coefficients in any basis. Information Theory, IEEE Transactions on, 57(3):1548\u20131566, 2011.\n\n[14] Venkatesan Guruswami and Ali Kemal Sinop. Optimal column-based low-rank matrix reconstruction. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1207\u20131214. SIAM, 2012.\n\n[15] Jarvis D Haupt, Richard G Baraniuk, Rui M Castro, and Robert D Nowak. Compressive distilled sensing: Sparse recovery using adaptivity in compressive measurements. In Signals, Systems and Computers, 2009 Conference Record of the Forty-Third Asilomar Conference on, pages 1551\u20131555. IEEE, 2009.\n\n[16] Jun He, Laura Balzano, and John Lui. Online robust subspace tracking from partial information. arXiv preprint arXiv:1109.3827, 2011.\n\n[17] Raghunandan H Keshavan, Andrea Montanari, and Sewoong Oh. Matrix completion from noisy entries. The Journal of Machine Learning Research, 11:2057\u20132078, 2010.\n\n[18] Sanjiv Kumar, Mehryar Mohri, and Ameet Talwalkar. Sampling methods for the Nystr\u00f6m method. The Journal of Machine Learning Research, 13:981\u20131006, 2012.\n\n[19] B\u00e9atrice Laurent and Pascal Massart. Adaptive estimation of a quadratic functional by model selection. The Annals of Statistics, 28(5):1302\u20131338, 2000.\n\n[20] Sahand Negahban and Martin J Wainwright. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. The Journal of Machine Learning Research, 2012.\n\n[21] Benjamin Recht. 
A simpler approach to matrix completion. The Journal of Machine Learning Research, 12:3413\u20133430, 2011.\n\n[22] Ryota Tomioka, Kohei Hayashi, and Hisashi Kashima. Estimation of low-rank tensors via convex optimization. arXiv preprint arXiv:1010.0789, 2010.\n\n[23] Ryota Tomioka, Taiji Suzuki, Kohei Hayashi, and Hisashi Kashima. Statistical performance of convex tensor decomposition. In Advances in Neural Information Processing Systems, pages 972\u2013980, 2011.\n\n[24] Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027, 2010.\n", "award": [], "sourceid": 473, "authors": [{"given_name": "Akshay", "family_name": "Krishnamurthy", "institution": "CMU"}, {"given_name": "Aarti", "family_name": "Singh", "institution": "CMU"}]}