{"title": "Matrix Completion from Power-Law Distributed Samples", "book": "Advances in Neural Information Processing Systems", "page_first": 1258, "page_last": 1266, "abstract": "The low-rank matrix completion problem is a fundamental problem with many important applications. Recently, Candes & Recht, Keshavan et al. and Candes & Tao obtained the first non-trivial theoretical results for the problem assuming that the observed entries are sampled uniformly at random. Unfortunately, most real-world datasets do not satisfy this assumption, but instead exhibit power-law distributed samples. In this paper, we propose a graph theoretic approach to matrix completion that solves the problem for more realistic sampling models. Our method is easier to analyze than previous methods with the analysis reducing to computing the threshold for complete cascades in random graphs, a problem of independent interest. By analyzing the graph theoretic problem, we show that our method achieves exact recovery when the observed entries are sampled from the Chung-Lu-Vu model, which can generate power-law distributed graphs. We also hypothesize that our algorithm solves the matrix completion problem from an optimal number of entries for the popular preferential attachment model and provide strong empirical evidence for the claim. Furthermore, our method is easier to implement and is substantially faster than existing methods. We demonstrate the effectiveness of our method on examples when the low-rank matrix is sampled according to the prevalent random graph models for complex networks and also on the Netflix challenge dataset.", "full_text": "Matrix Completion from Power-Law Distributed\n\nSamples\n\nRaghu Meka, Prateek Jain, and Inderjit S. Dhillon\n\nDepartment of Computer Sciences\n\nUniversity of Texas at Austin\n\nAustin, TX 78712\n\n{raghu,pjain,inderjit}@cs.utexas.edu\n\nAbstract\n\nThe low-rank matrix completion problem is a fundamental problem with many\nimportant applications. 
Recently, [4], [13] and [5] obtained the first non-trivial\ntheoretical results for the problem assuming that the observed entries are sampled\nuniformly at random. Unfortunately, most real-world datasets do not satisfy this\nassumption, but instead exhibit power-law distributed samples. In this paper, we\npropose a graph theoretic approach to matrix completion that solves the problem\nfor more realistic sampling models. Our method is simpler to analyze than previous\nmethods, with the analysis reducing to computing the threshold for complete\ncascades in random graphs, a problem of independent interest. By analyzing the\ngraph theoretic problem, we show that our method achieves exact recovery when\nthe observed entries are sampled from the Chung-Lu-Vu model, which can generate\npower-law distributed graphs. We also hypothesize that our algorithm solves\nthe matrix completion problem from an optimal number of entries for the popular\npreferential attachment model and provide strong empirical evidence for the\nclaim. Furthermore, our method is easy to implement and is substantially faster\nthan existing methods. We demonstrate the effectiveness of our method on random\ninstances where the low-rank matrix is sampled according to the prevalent\nrandom graph models for complex networks and present promising preliminary\nresults on the Netflix challenge dataset.\n\n1 Introduction\n\nCompleting a matrix from a few given entries is a fundamental problem with many applications in\nmachine learning, statistics, and compressed sensing. 
Since completion of arbitrary matrices is not\na well-posed problem, it is often assumed that the underlying matrix comes from a restricted class.\nHere we address the matrix completion problem under the natural assumption that the underlying\nmatrix is low-rank.\nFormally, for an unknown matrix M \u2208 R^{m\u00d7n} of rank at most k, given \u2126 \u2286 [m] \u00d7 [n], P\u2126(M)\u00b9 and\nk, the low-rank matrix completion problem is to find a matrix X \u2208 R^{m\u00d7n} such that\n\nrank(X) \u2264 k and P\u2126(X) = P\u2126(M). (1.1)\n\nRecently Candes and Recht [4], Keshavan et al. [13], and Candes and Tao [5] obtained the first non-trivial\nguarantees for the above problem under a few additional assumptions on the matrix M and the set of\nknown entries \u2126. At a high level, the assumptions made in the above papers can be stated as follows.\n\nA1 M is incoherent, in the sense that the singular vectors of M are not correlated with the\nstandard basis vectors.\n\nA2 The observed entries are sampled uniformly at random.\n\n\u00b9 Throughout this paper P\u2126 : R^{m\u00d7n} \u2192 R^{m\u00d7n} will denote the projection of a matrix onto the pairs of\nindices in \u2126: (P\u2126(X))_{ij} = X_{ij} for (i, j) \u2208 \u2126 and (P\u2126(X))_{ij} = 0 otherwise.\n\nIn this work we address some of the issues with assumption [A2]. For \u2126 \u2286 [m] \u00d7 [n], let the sampling\ngraph G\u2126 = (U, V, \u2126) be the bipartite graph with vertices U = {u_1, . . . , u_m}, V = {v_1, . . . , v_n}\nand edges given by the ordered pairs in \u2126.\u00b2 Then, assumption [A2] can be reformulated as follows:\n\nA3 The sampling graph G\u2126 is an Erd\u0151s-R\u00e9nyi random graph.\u00b3\n\nA prominent feature of Erd\u0151s-R\u00e9nyi graphs is that the degrees of vertices are Poisson distributed and\nare sharply concentrated about their mean. The techniques of [4, 5], [13], as will be explained later,\ncrucially rely on these properties of Erd\u0151s-R\u00e9nyi graphs. 
However, for most large real-world graphs\nsuch as the World Wide Web ([1]), the degree distribution deviates significantly from the Poisson\ndistribution and has high variance. In particular, most large matrix-completion datasets such as the\nmuch publicized Netflix prize dataset and the Yahoo Music dataset exhibit power-law distributed\ndegrees, i.e., the number of vertices of degree d is proportional to d^{\u2212\u03b2} for a constant \u03b2 (Figure 1).\nIn this paper, we overcome some of the shortcomings of assumption [A3] above by considering\nmore realistic random graph models for the sampling graph G\u2126. We propose a natural graph theoretic\napproach for matrix completion (referred to as ICMC for information cascading matrix completion)\nthat we prove can handle sampling graphs with power-law distributed degrees. Our approach\nis motivated by the models for information cascading in social networks proposed by Kempe et\nal. [11, 12]. Moreover, the analysis of ICMC reduces to the problem of finding density thresholds\nfor complete cascades in random graphs - a problem of independent interest.\n\nBy analyzing the threshold for complete cascades in the random graph model of Chung, Lu & Vu\n[6] (CLV model), we show that ICMC solves the matrix completion problem for sampling graphs\ndrawn from the CLV model. The bounds we obtain for matrix-completion on the CLV model are\nincomparable to the main results of [4, 5, 13]. The methods of the latter papers do not apply to\nmodels such as the CLV model that generate graphs with skewed degrees. On the other hand, for\nErd\u0151s-R\u00e9nyi graphs the density requirements for ICMC are stronger than those of the above papers.\n\nWe also empirically investigate the threshold for complete cascading in other popular random graph\nmodels such as the preferential attachment model [1], the forest-fire model [17] and the affiliation\nnetworks model [16]. 
The empirical estimates we obtain for the threshold for complete cascading\nin the preferential attachment model strongly suggest that ICMC solves the exact matrix-completion\nproblem from an optimal number of entries for sampling procedures with preferential attachment.\n\nOur experiments demonstrate that for sampling graphs drawn from more realistic models such as\nthe preferential attachment, forest-fire and affiliation network models, ICMC outperforms - both in\naccuracy and time - the methods of [4, 5, 3, 13] by an order of magnitude.\n\nIn summary, our main contributions are:\n\u2022 We formulate the sampling process in matrix completion as generating random graphs (G\u2126) and\ndemonstrate that the sampling assumption [A3] does not hold for real-world datasets.\n\u2022 We propose a novel graph theoretic approach to matrix completion (ICMC) that extensively uses\nthe link structure of the sampling graph. We emphasize that previously none of the methods\nexploited the structure of the sampling graph.\n\u2022 We prove that our method solves the matrix completion problem exactly for sampling graphs\ngenerated from the CLV model which can generate power-law distributed graphs.\n\u2022 We empirically evaluate our method on more complex random graph models and on the Netflix\nChallenge dataset demonstrating the effectiveness of our method over those of [4, 5, 3, 13].\n\n2 Previous Work and Preliminaries\n\nThe Netflix challenge has recently drawn much attention to the low-rank matrix completion problem. 
Most methods for matrix completion and the more general rank minimization problem with\naffine constraints are based on either relaxing the non-convex rank function to a convex function\nor assuming a factorization of the matrix and optimizing the resulting non-convex problem using\nalternating minimization and its variants [2, 15, 18].\n\n\u00b2 We will often abuse notation and identify edges (u_i, v_j) with ordered pairs (i, j).\n\u00b3 We consider the Erd\u0151s-R\u00e9nyi model, where each edge (u_i, v_j) is placed in E independently with probability p for\n(i, j) \u2208 [m] \u00d7 [n], and p is the density parameter.\n\nUntil recently, most methods for rank minimization subject to affine constraints were heuristic in\nnature with few known rigorous guarantees. In a recent breakthrough, Recht et al. [20] extended the\ntechniques of compressed sensing to rank minimization with affine constraints. However, the results\nof Recht et al. do not apply to the case of matrix completion as the constraints in matrix completion\ndo not satisfy the restricted isometry property they assume.\n\nBuilding on the work of Recht et al. [20], Candes and Recht [4] and Candes and Tao [5] showed that\nminimizing the trace-norm recovers the unknown low-rank matrix exactly under certain conditions.\nHowever, these approaches require the observed entries to be sampled uniformly at random and, as\nsuggested by our experiments, do not work well when the observed entries are not drawn uniformly.\n\nIndependent of [4, 5], Keshavan et al. [13] also obtained similar results for matrix completion using\ndifferent techniques that generalize the works of Friedman et al. [9] and Feige and Ofek [8] on the\nspectrum of random graphs. However, the results of [13] crucially rely on the regularity of\nErd\u0151s-R\u00e9nyi graphs and do not extend to sampling graphs with skewed degree distributions even for rank\none matrices. This is mainly because the results of Friedman et al. 
and Feige and Ofek on the\nspectral gap of Erd\u0151s-R\u00e9nyi graphs do not hold for graph models with skewed expected degrees (see\n[6, 19]).\n\nWe also remark that several natural variants of the trimming phase of [8] and [13] did not improve\nthe performance in our experiments. A similar observation was made in [19] and [10], which address the\nproblem of re-weighting the edges of graphs with skewed degrees in the context of LSA.\n\n2.1 Random Graph Models\nWe focus on four popular models of random graphs all of which can generate graphs with power-law\ndistributed degrees. In contrast to the common descriptions of the models, we need to work with\nbipartite graphs; however, the models we consider generalize naturally to bipartite graphs. Due to\nspace limitations we only give a brief description of the model of Chung et al. [6] (the CLV model), and refer to the original\npapers for the preferential attachment [1], forest-fire [17] and affiliation networks [16] models.\n\nThe CLV model [6] generates graphs with arbitrary expected degree sequences, p_1, . . . , p_m,\nq_1, . . . , q_n with p_1 + . . . + p_m = q_1 + . . . + q_n = w. In the model, a bipartite graph G = (U, V, E)\nwith U = {u_1, . . . , u_m}, V = {v_1, . . . , v_n} is generated by independently placing an edge between\nvertices u_i, v_j with probability p_i q_j / w for all i \u2208 [m], j \u2208 [n]. We define the density of an instance\nof the CLV model to be the expected average degree (p_1 + . . . + p_m)/(mn) = w/(mn).\nThe CLV model is more general than the standard Erd\u0151s-R\u00e9nyi model, with the case p_i = np, q_j =\nmp corresponding to the standard Erd\u0151s-R\u00e9nyi model with density p for bipartite random graphs.\nFurther, by choosing weights that are power-law distributed, the CLV model can generate graphs\nwith power-law distributed degrees, a prominent feature of real-world graphs.\n\n3 Matrix Completion from Information Cascading\nWe now present our algorithm ICMC. 
Consider the following standard formulation of the low-rank\nmatrix completion problem: Given k, \u2126, P\u2126(M) for a rank k matrix M, find X, Y such that\n\nP\u2126(XY^T) = P\u2126(M), X \u2208 R^{m\u00d7k}, Y \u2208 R^{n\u00d7k}. (3.1)\n\nNote that given X we can find Y and vice versa by solving a linear least squares regression problem.\nThis observation is the basis for the popular alternating minimization heuristic and its variants\nwhich outperform most methods in practice. However, analyzing the performance of alternating\nminimization is a notoriously hard problem. Our algorithm can be seen as a more refined version of the\nalternating minimization heuristic that is more amenable to analysis. We assume that the target matrix\nM is non-degenerate in the following sense.\nDefinition 3.1 A rank k matrix Z is non-degenerate if there exist X \u2208 R^{m\u00d7k}, Y \u2208 R^{n\u00d7k} with\nZ = XY^T such that any k rows of X are linearly independent and any k rows of Y are linearly\nindependent.\nThough reminiscent of the incoherence property used by Candes and Recht and by Keshavan et al.,\nnon-degeneracy appears to be incomparable to the incoherence property used in the above works.\nObserve that a random low-rank matrix is almost surely non-degenerate.\nOur method progressively computes rows of X and Y so that Equation (3.1) is satisfied. Call a\nvertex u_i \u2208 U infected if the i\u2019th row of X has been computed (the term infected is used to reflect\nthat infection spreads by contact as in an epidemic). Similarly, call a vertex v_j \u2208 V infected if\nthe j\u2019th row of Y has been computed. Suppose that at an intermediate iteration, vertices L \u2286 U and\nR \u2286 V are marked as infected. 
That is, the rows of X with indices in L and rows of Y with indices\nin R have been computed exactly.\nNow, for an uninfected j \u2208 [n], to compute the corresponding row of Y, y_j^T \u2208 R^k, we only need k\nindependent linear equations. Thus, if M is non-degenerate, to compute y_j^T we only need k entries\nof the j\u2019th column of M with row indices in L. Casting the condition in terms of the sampling graph\nG\u2126, y_j^T can be computed and vertex v_j \u2208 V be marked as infected if there are at least k edges from\nv_j to infected vertices in L. Analogously, x_i^T can be computed and the vertex u_i \u2208 U be marked as\ninfected if there are at least k edges from u_i to previously infected vertices R.\nObserve that M = XY^T = XW W^{\u22121}Y^T, for any invertible matrix W \u2208 R^{k\u00d7k}. Thus for\nnon-degenerate M, without loss of generality, a set of k rows of X can be fixed to be the k \u00d7 k identity\nmatrix I_k. This suggests the following cascading procedure for infecting vertices in G\u2126 and\nprogressively computing the rows of X, Y. Here L_0 \u2286 U with |L_0| = k.\n\nICMC(G\u2126, P\u2126(M), L_0):\n1 Start with initially infected sets L = L_0 \u2286 U, R = \u2205. 
Set the k \u00d7 k sub-matrix of X with rows\nin L_0 to be I_k.\n2 Repeat until convergence:\n  (a) Mark as infected all uninfected vertices in V that have at least k edges to previously infected\n      vertices L and add the newly infected vertices to R.\n  (b) For each newly infected vertex v_j \u2208 R, compute the j\u2019th row of Y using the observed\n      entries of M corresponding to edges from v_j to L.\n  (c) Mark as infected all uninfected vertices in U that have at least k edges to previously infected\n      vertices R and add the newly infected vertices to L.\n  (d) For each newly infected vertex u_i \u2208 L, compute the i\u2019th row of X using the observed\n      entries of M corresponding to edges from u_i to R.\n3 Output M\u2032 = XY^T.\n\nWe abstract the cascading procedure from above using the framework of Kempe et al. [11] for\ninformation cascades in social networks. Let G = (W, E) be an undirected graph and fix A \u2286 W,\nk > 0. Define \u03c3_{G,k}(A, 0) = A and for t \u2265 0 define \u03c3_{G,k}(A, t + 1) inductively by\n\n\u03c3_{G,k}(A, t + 1) = \u03c3_{G,k}(A, t) \u222a {u \u2208 W : u has at least k edges to \u03c3_{G,k}(A, t)}.\n\nDefinition 3.2 The influence of a set A \u2286 W, \u03c3_{G,k}(A), is the number of vertices infected by the\ncascading process upon termination when starting at A. That is, \u03c3_{G,k}(A) = |\u222a_t \u03c3_{G,k}(A, t)|. We\nsay A is completely cascading of order k if \u03c3_{G,k}(A) = |W|.\n\nWe remark that using a variant of the standard depth-first search algorithm, the cascading process\nabove can be computed in linear time for any set A. From the discussion preceding ICMC it follows\nthat ICMC recovers M exactly if the cascading process starting at L_0 infects all vertices of G\u2126 and\nwe get the following theorem.\n\nTheorem 3.1 Let M be a non-degenerate matrix of rank k. 
Then, given G\u2126 = (U, V, \u2126), P\u2126(M)\nand L_0 \u2286 U with |L_0| = k, ICMC(G\u2126, P\u2126(M), L_0) recovers the matrix M exactly if L_0 is a\ncompletely cascading set of order k in G\u2126.\n\nThus, we have reduced the matrix-completion problem to the graph-theoretic problem of finding\na completely cascading set (if it exists) in a graph. A more general case of the problem, finding\na set of vertices that maximizes influence, was studied by Kempe et al. [11] for more general\ncascading processes. They show the general problem of maximizing influence to be NP-hard and give\napproximation algorithms for several classes of instances.\n\nHowever, it appears that for most reasonable random graph models, the highest degree vertices have\nlarge influence with high probability. In the following we investigate completely cascading sets\nin random graphs and show that for CLV graphs, the k highest degree vertices form a completely\ncascading set with high probability.\n\n4 Information Cascading in Random Graphs\n\nWe now show that for sufficiently dense CLV graphs and fixed k, the k highest degree vertices form\na completely cascading set with high probability.\n\nTheorem 4.1 For every \u03b3 > 0, there exists a constant c(\u03b3) such that the following holds.\nConsider an instance of the CLV model given by weights p_1, . . . , p_m, q_1, . . . , q_n with density p and\nmin(p_i, q_j) \u2265 c(\u03b3) k log n / p^k. Then, for G = (U, V, E) generated from the model, the k highest\ndegree vertices of U form a completely cascading set of order k with probability at least 1 \u2212 n^{\u2212\u03b3}.\n\nProof sketch We will show that the highest weight vertices L_0 = {u_1, . . . 
, u_k} form a completely\ncascading set with high probability; the theorem follows from the above statement and the observation\nthat the highest degree vertices of G will almost surely correspond to vertices with large weights\nin the model; we omit these details for lack of space. Let w = \u2211_i p_i = \u2211_j q_j = mnp and m \u2264 n.\nFix a vertex u_i \u2209 L_0 and consider an arbitrary vertex v_j \u2208 V. Let P^i_j be the indicator variable\nthat is 1 if (u_i, v_j) \u2208 E and v_j is connected to all vertices of L_0. Note that vertex u_i will be\ninfected after two rounds by the cascading process starting at L_0 if \u2211_j P^i_j \u2265 k. Now,\nPr[P^i_j = 1] = (p_i q_j / w) \u220f_{1 \u2264 l \u2264 k} (p_l q_j / w) and\n\nE[P^i_1 + . . . + P^i_n] = \u2211_{j=1}^{n} (p_i q_j / w) \u220f_{l \u2264 k} (p_l q_j / w) = (p_i / w^{k+1}) \u00b7 (\u220f_{1 \u2264 l \u2264 k} p_l) \u00b7 \u2211_{j=1}^{n} q_j^{k+1}. (4.1)\n\nObserve that \u2211_i p_i = w \u2264 nk + p_k (m \u2212 k). Thus, p_k \u2265 (w \u2212 nk)/(m \u2212 k). Now, using the\npower-mean inequality we get,\n\nq_1^{k+1} + q_2^{k+1} + . . . + q_n^{k+1} \u2265 n ((q_1 + . . . + q_n)/n)^{k+1} = n \u00b7 (w/n)^{k+1}, (4.2)\n\nwith equality occurring only if q_j = w/n for all j. From Equations (4.1), (4.2) we have\n\nE[P^i_1 + . . . + P^i_n] \u2265 p_i \u00b7 ((w \u2212 nk)/(m \u2212 k))^k \u00b7 (1/n^k) = p_i \u00b7 (1 \u2212 nk/w)^k \u00b7 (1 \u2212 k/m)^{\u2212k} \u00b7 (w/(mn))^k. (4.3)\n\nIt is easy to check that under our assumptions, w \u2265 nk^2 and m \u2265 k^2. Thus, (1 \u2212 nk/w)^k \u2265 1/e\nand (1 \u2212 k/m)^{\u2212k} \u2265 1/2e. From Equation (4.3) and our assumption p_i \u2265 c(\u03b3) k log n / p^k, we get\nE[P^i_1 + . . . + P^i_n] \u2265 c(\u03b3) k log n / (4e^2).\nNow, since the indicator variables P^i_1, . . . , P^i_n are independent of each other, using the above lower\nbound for the expectation of their sum and Chernoff bounds we get Pr[P^i_1 + . . . + P^i_n \u2264 k] \u2264\nexp(\u2212\u2126(c(\u03b3) log n)). Thus, for a sufficiently large constant c(\u03b3), the probability that the vertex u_i\nis uninfected after two rounds satisfies Pr[P^i_1 + . . . + P^i_n \u2264 k] \u2264 1/(2m^{\u03b3+1}). By taking a union bound over\nall vertices u_{k+1}, . . . , u_m, the probability that there is an uninfected vertex in the left partition after\ntwo steps of cascading starting from L_0 is at most 1/(2m^\u03b3). The theorem now follows by observing\nthat if the left partition is completely infected, for a suitably large constant c(\u03b3), all vertices in the\nright will be infected with probability at least 1 \u2212 1/(2m^\u03b3) as q_j \u2265 c(\u03b3) k log n. \u25a1\n\nCombining the above with Theorem 3.1 we obtain exact matrix-completion for sampling graphs\ndrawn from the CLV model.\n\nTheorem 4.2 Let M be a non-degenerate matrix of rank k. Then, for sampling graphs G\u2126 generated\nfrom a CLV model satisfying the conditions of Theorem 4.1, ICMC recovers the matrix M\nexactly with high probability.\n\nRemark: The above results show exact-recovery for CLV graphs with densities up to n^{\u22121/k} = o(1).\nAs mentioned in the introduction, the above result is incomparable to the main results of [4, 5], [13].\n\nThe main bottleneck for the density requirements in the proof of Theorem 4.1 is Equation (4.2)\nrelating \u2211_j q_j^{k+1} to (\u2211_j q_j)^{k+1}, where we used the power-mean inequality. 
However, when the\nexpected degrees q_j are skewed, say with a power-law distribution, it should be possible to obtain\nmuch better bounds than those of Equation (4.2), hence also improving the density requirements.\nThus, in a sense the Erd\u0151s-R\u00e9nyi graphs are the worst-case examples for our analysis.\n\nOur empirical simulations also suggest that completely cascading sets are more likely to exist in\nrandom graph models with power-law distributed expected degrees as compared to Erd\u0151s-R\u00e9nyi\ngraphs. Intuitively, this is for the following reasons.\n\u2022 In graphs with power-law distributed degrees, the high degree vertices have much higher degrees\nthan the average degree of the graph. So, infecting the highest degree vertices is more likely to\ninfect more vertices in the first step.\n\u2022 More importantly, as observed in the seminal work of Kleinberg [14], in most real-world graphs\nthere are a small number of vertices (hubs) that have much higher connectivity than most vertices.\nThus, infecting the hubs is likely to infect a large fraction of vertices.\n\nThus, we expect ICMC to perform better on models that are closer to real-world graphs and have\npower-law distributed degrees. In particular, as strongly supported by experiments (see Figure 3),\nwe hypothesize that ICMC solves exact matrix completion from an almost optimal number of entries\nfor sampling graphs drawn from the preferential attachment model.\nConjecture 4.3 There exists a universal constant C such that for all k \u2265 1, k_1, k_2 \u2265 Ck the following\nholds. For G = (U, V, E) generated from the preferential attachment model with parameters\nm, n, k_1, k_2, the k highest degree vertices of U form a completely cascading set of order k with high\nprobability.\nIf true, the above combined with Theorem 3.1 would imply the following.\nConjecture 4.4 Let M be a non-degenerate matrix of rank k. 
Then, for sampling graphs G\u2126 generated\nfrom a PA model with parameters k_1, k_2 \u2265 Ck, ICMC recovers the matrix M exactly with\nhigh probability.\nRemark: To solve the matrix completion problem we need to sample at least (m + n)k entries.\nThus, the bounds above are optimal up to a constant factor. Moreover, the bounds above are stronger\nthan those obtainable - even information theoretically - for Erd\u0151s-R\u00e9nyi graphs, as for Erd\u0151s-R\u00e9nyi\ngraphs we need to sample \u2126(n log n) entries even for k = 1.\n\n5 Experimental Results\nWe first demonstrate that for many real-world matrix completion datasets, the observed entries are\nfar from being sampled uniformly, with the sampling graph having power-law distributed degrees.\nWe then use various random graph models to compare our method against the trace-norm based\nsingular value thresholding algorithm (SVT) of [3], the spectral matrix completion algorithm (SMC) of\n[13] and the regularized alternating least squares minimization (ALS) heuristic. Finally, we present\nempirical results on the Netflix challenge dataset. For comparing with SVT and SMC, we use the\ncode provided by the respective authors, while we use our own implementation for ALS. Below we\nprovide a few implementation details for our algorithm ICMC.\nImplementation Details\nConsider step 2(b) of our algorithm ICMC. Let L_j be the set of vertices in L that have an edge to\nv_j, let L_j^k be any size k subset of L_j, and let X(L_j^k, :) be the sub-matrix of X containing rows\ncorresponding to vertices in L_j^k. If the underlying matrix is indeed low-rank and there is no noise in the\nobserved entries, then for a newly infected vertex v_j, the corresponding row of Y, y_j^T, can be\ncomputed by solving the following linear system of equations: M(L_j^k, j) = X(L_j^k, :) y_j. To account for\nnoise in measurements, we compute y_j by solving the following regularized least squares problem:\ny_j = argmin_y \u2016M(L_j, j) \u2212 X(L_j, :) y\u2016_2^2 + \u03bb\u2016y\u2016_2^2, where \u03bb is a regularization parameter. Similarly,\nwe compute x_i^T by solving: x_i = argmin_x \u2016M(i, R_i)^T \u2212 Y(R_i, :) x\u2016_2^2 + \u03bb\u2016x\u2016_2^2.\nNote that if ICMC fails to infect all the vertices, i.e. L \u228a U or R \u228a V, then rows of X and Y\nwill not be computed for vertices in U \\ L and V \\ R. Let X = [X_L, X_{L\u0303}], where X_L is the set of\ncomputed rows of X (for vertices in L) and X_{L\u0303} denotes the remaining rows of X. Similarly, let\nY = [Y_R, Y_{R\u0303}]. We estimate X_{L\u0303} and Y_{R\u0303} using an alternating least squares based heuristic that solves\nthe following:\n\nmin_{X_{L\u0303}, Y_{R\u0303}} \u2016P\u2126(M \u2212 [X_L; X_{L\u0303}] [Y_R^T Y_{R\u0303}^T])\u2016_F^2 + \u00b5\u2016X_{L\u0303}\u2016_F^2 + \u00b5\u2016Y_{R\u0303}\u2016_F^2,\n\n[Figure 1 panels: Netflix Dataset (Movies), Netflix Dataset (Users), Yahoo Music Dataset (Artists), Yahoo Music Dataset (Users); x-axis: x (number of users, movies, or artists); y-axis: Pr(X \u2265 x); curves: Empirical Distribution, Poisson Distribution, Power-law Distribution.]\n\nFigure 1: Cumulative degree distribution of (a) movies, (b) users (Netflix dataset) and (c) artists,\n(d) users (Yahoo Music 
dataset). Note that the degree distributions in all four cases closely follow a\npower-law distribution and deviate heavily from the Poisson distribution, which is assumed by SVT [3]\nand SMC [13].\n\n[Figure 2 panels: Erdos-Renyi Model, Chung-Lu-Vu Model, PA Model, Forest-Fire Model; x-axis: n (Size of Matrix); y-axis: RMSE; methods: ICMC, ALS, SVT, SMC.]\n\nFigure 2: Results on synthetic datasets for fixed sampling density with sampling graph coming from\ndifferent graph models: (a) Erd\u0151s-R\u00e9nyi model, (b) Chung-Lu-Vu model, (c) Preferential attachment\nmodel, and (d) Forest-fire model. 
Note that for the three power-law distribution generating\nmodels our method (ICMC) achieves considerably lower RMSE than the existing method.\n\n(a) Erd\u02ddos-R\u00b4enyi Graphs\n\nn/Method\n500\n1000\n1500\n2000\n(c) Preferential Attachment Graphs\n\nSVT ALS\n1.09\n8.88\n17.07\n2.39\n4.85\n38.81\n59.88\n7.20\n\nSMC\n45.51\n93.85\n214.65\n343.76\n\nICMC\n1.28\n3.30\n6.28\n9.89\n\nn/Method\n500\n1000\n1500\n2000\n\n(b) Chung-Lu-Vu Graphs\nSVT ALS\n1.24\n14.69\n17.55\n2.24\n3.89\n30.99\n46.69\n5.67\n\nSMC\n35.32\n144.19\n443.48\n836.99\n(d)Forest-\ufb01re Graphs\n\nn/Method\n500\n1000\n1500\n2000\n\nSMC\n15.05\n67.96\n178.35\n417.54\n\nSVT\n14.40\n16.49\n24.48\n32.06\n\nALS\n3.97\n5.06\n9.83\n15.07\n\nICMC\n1.94\n2.01\n3.65\n7.46\n\nn/Method\n500\n1000\n1500\n2000\n\nSMC\n22.63\n85.26\n186.81\n350.98\n\nSVT ALS\n5.53\n0.57\n1.75\n11.32\n3.30\n21.39\n27.37\n4.84\n\nICMC\n0.49\n2.02\n3.91\n5.50\n\nICMC\n0.39\n1.23\n2.99\n5.06\n\nTable 1: Time required (in seconds) by various methods on synthetic datasets for \ufb01xed sampling\ndensity with sampling graph coming from different Graph Models: (a) Erd\u02ddos-R\u00b4enyi model, (b)\nChung-Lu-Vu model, (c) Preferential attachment model, and (d) Forest-\ufb01re model. Note that our\nmethod (ICMC) is signi\ufb01cantly faster than SVT and SMC, and has similar run-time to that of ALS.\n\nwhere \u00b5 \u2265 0 is the regularization parameter.\nSampling distribution in Net\ufb02ix and Yahoo Music Datasets\nThe Net\ufb02ix challenge dataset contains the incomplete user-movie ratings matrix while the Yahoo\nMusic dataset contains the incomplete user-artist ratings matrix. 
For both datasets we form the corresponding\nbipartite sampling graphs and plot the left (users) and right (movies/artists) cumulative\ndegree distributions of the bipartite sampling graphs.\n\nFigure 1 shows the cumulative degree distributions of the bipartite sampling graphs, the best power-law\nfit computed using the code provided by Clauset et al. [7] and the best Poisson distribution fit.\nThe figure clearly shows that the sampling graphs for the Netflix and Yahoo Music datasets are far\nfrom regular as assumed in [4], [5], [13] and have power-law distributed degrees.\nExperiments using Random Graph Models\nTo compare various methods, we first generate random low-rank matrices X \u2208 R^{n\u00d7n} for varying n,\nand sample from the generated matrices using Erd\u0151s-R\u00e9nyi, CLV, PA and forest-fire random graph\nmodels. We omit the results for the affiliation networks model from this paper due to lack of space;\nwe observed similar trends on the affiliation networks model.\n\n[Figure 3 panels. Left: Sampling Density Threshold; x-axis: p (Sampling Density); y-axis: Infected Rows/Columns; curves: Erdos-Renyi, Chung-Lu, Pref. Attachment. Middle: Preferential Attachment Model (m vs k); x-axis: k (Rank of the Matrix); y-axis: m (Number of edges); curves: COMBMC, m=Ck+C0. Right: table below.]\n\nk | Fraction of infected rows and columns | RMSE\n5 | 0.98 | 0.9603\n10 | 0.95 | 0.9544\n20 | 0.87 | 0.9437\n25 | 0.84 | 0.9416\n30 | 0.46 \u00d7 10^{\u22125} | 0.9602\n\nFigure 3: Left: Fraction of infected nodes as edge density increases. Note the existence of a clear\nthreshold. The threshold is quite small for CLV and PA, suggesting good performance of ICMC for\nthese models. Middle: Threshold for parameters k_1, k_2 (the number of edges per node) in PA as\nk increases. 
The threshold varies linearly with k, supporting Conjecture 4.3. Right: Fraction of\ninfected rows and columns using ICMC for the Netflix challenge dataset.\n\nFor each random graph model we compare the relative mean square error (RMSE) on the unknown\nentries achieved by our method ICMC against several existing methods. We also compare the total\ntime taken by each of the methods. All results represent the average over 20 runs.\nFigure 2 compares the RMSE achieved by ICMC to that of SVT, SMC and ALS when the rank k is fixed\nto 10, the sampling density is p = 0.1, and the sampling graphs are generated from the four random\ngraph models. Note that for the three more realistic models (CLV, PA and forest-fire), ICMC significantly\noutperforms both SVT and SMC and performs noticeably better than ALS. Table 1 compares the\ncomputational time taken by each of the methods. The table shows that for all three of these models, ICMC\nis faster than SVT and SMC by an order of magnitude and is also competitive with ALS. Note that\nthe performance of our method for Erd\u0151s-R\u00e9nyi graphs (Figure 2 (a)) is poor, with other methods\nachieving low RMSE. This is expected, as the Erd\u0151s-R\u00e9nyi graphs are in a sense the worst-case\nexamples for ICMC, as explained in Section 4.\nThreshold for Complete Cascading\nHere we investigate the threshold for complete cascading in the random graph models. Besides\nbeing interesting in its own right, the existence of completely cascading sets is closely tied to the success\nof ICMC by Theorem 3.1. Figure 3 shows the fraction of vertices infected by the cascading process\nstarting from the k highest degree vertices for graphs generated from the random graph models as\nthe edge density increases.\nThe left plot of Figure 3 shows the existence of a clear threshold for the density p, beyond which\nthe fraction of infected vertices is almost surely one. 
Note that the threshold is quite small for the\nCLV, PA and forest-fire models, suggesting good performance of ICMC on these models. As was\nexplained in Section 4, the threshold is bigger for the Erd\u0151s-R\u00e9nyi graph model.\n\nThe middle plot of Figure 3 shows the threshold value (the minimum value above which the infected\nfraction is almost surely one) for k1, k2 as a function of k in the PA model. The plot shows that the\nthreshold is of the form Ck for a universal constant C, strongly supporting Conjectures 4.3 and 4.4.\nNetflix Challenge Dataset\nFinally, we evaluate our method on the Netflix Challenge dataset, which contains an incomplete\nmatrix with about 100 million ratings given by 480,189 users for 17,770 movies. The rightmost\ntable in Figure 3 shows the fraction of rows and columns infected by ICMC on the dataset for\nseveral values of the rank parameter k. Note that even for a reasonably high rank of 25, ICMC\ninfects a high percentage (84%) of rows and columns. Also, for rank 30 the fraction of infected\nrows and columns drops to almost zero, suggesting that the sampling density of the matrix is below\nthe sampling threshold for rank 30.\nFor rank k = 20, the RMSE incurred over the probe set (provided by Netflix) is 0.9437, which is\ncomparable to the RMSE = 0.9404 achieved by the regularized Alternating Least Squares method.\nMore importantly, the time required by our method is 1.59 \u00d7 10^3 seconds compared to 6.15 \u00d7 10^4\nseconds required by ALS. We remark that noise (or a higher rank of the underlying matrix) can degrade\nour method, leading to somewhat inferior results. In such a case, our method can be used as a good\ninitialization for the ALS method and other state-of-the-art collaborative filtering methods to achieve\nbetter RMSE.\n\nReferences\n[1] Albert-Laszlo Barabasi and Reka Albert. Emergence of scaling in random networks. Science, 286:509, 1999.\n[2] Matthew Brand. 
Fast online SVD revisions for lightweight recommender systems. In SDM, 2003.\n[3] Jian-Feng Cai, Emmanuel J. Cand\u00e8s, and Zuowei Shen. A singular value thresholding algorithm for matrix completion, 2008.\n[4] Emmanuel J. Cand\u00e8s and Benjamin Recht. Exact matrix completion via convex optimization. CoRR, abs/0805.4471, 2008.\n[5] Emmanuel J. Cand\u00e8s and Terence Tao. The power of convex relaxation: Near-optimal matrix completion. CoRR, abs/0903.1476, 2009.\n[6] Fan R. K. Chung, Linyuan Lu, and Van H. Vu. The spectra of random graphs with given expected degrees. Internet Mathematics, 1(3), 2003.\n[7] A. Clauset, C.R. Shalizi, and M.E.J. Newman. Power-law distributions in empirical data. SIAM Review, to appear, 2009.\n[8] Uriel Feige and Eran Ofek. Spectral techniques applied to sparse random graphs. Random Struct. Algorithms, 27(2):251\u2013275, 2005.\n[9] Joel Friedman, Jeff Kahn, and Endre Szemer\u00e9di. On the second eigenvalue in random regular graphs. In STOC, pages 587\u2013598, 1989.\n[10] Christos Gkantsidis, Milena Mihail, and Ellen W. Zegura. Spectral analysis of internet topologies. In INFOCOM, 2003.\n[11] David Kempe, Jon M. Kleinberg, and \u00c9va Tardos. Maximizing the spread of influence through a social network. In KDD, pages 137\u2013146, 2003.\n[12] David Kempe, Jon M. Kleinberg, and \u00c9va Tardos. Influential nodes in a diffusion model for social networks. In ICALP, pages 1127\u20131138, 2005.\n[13] Raghunandan H. Keshavan, Sewoong Oh, and Andrea Montanari. Matrix completion from a few entries. CoRR, abs/0901.3150, 2009.\n[14] Jon M. Kleinberg. Hubs, authorities, and communities. ACM Comput. Surv., 31(4es):5, 1999.\n[15] Yehuda Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In KDD, pages 426\u2013434, 2008.\n[16] Silvio Lattanzi and D. Sivakumar. Affiliation networks. In STOC, 2009.\n[17] Jure Leskovec, Jon M. Kleinberg, and Christos Faloutsos. Graph evolution: Densification and shrinking diameters. TKDD, 1(1), 2007.\n[18] Robert M. Bell and Yehuda Koren. 
Scalable collaborative filtering with jointly derived neighborhood interpolation weights. In ICDM, pages 43\u201352, 2007.\n[19] Milena Mihail and Christos H. Papadimitriou. On the eigenvalue power law. In RANDOM, pages 254\u2013262, 2002.\n[20] Benjamin Recht, Maryam Fazel, and Pablo A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, 2007.\n", "award": [], "sourceid": 864, "authors": [{"given_name": "Raghu", "family_name": "Meka", "institution": null}, {"given_name": "Prateek", "family_name": "Jain", "institution": null}, {"given_name": "Inderjit", "family_name": "Dhillon", "institution": null}]}