{"title": "Thy Friend is My Friend: Iterative Collaborative Filtering for Sparse Matrix Estimation", "book": "Advances in Neural Information Processing Systems", "page_first": 4715, "page_last": 4726, "abstract": "The sparse matrix estimation problem consists of estimating the distribution of an $n\\times n$ matrix $Y$, from a sparsely observed single instance of this matrix where the entries of $Y$ are independent random variables. This captures a wide array of problems; special instances include matrix completion in the context of recommendation systems, graphon estimation, and community detection in (mixed membership) stochastic block models. Inspired by classical collaborative filtering for recommendation systems, we propose a novel iterative, collaborative filtering-style algorithm for matrix estimation in this generic setting. We show that the mean squared error (MSE) of our estimator converges to $0$ at the rate of $O(d^2 (pn)^{-2/5})$ as long as $\\omega(d^5 n)$ random entries from a total of $n^2$ entries of $Y$ are observed (uniformly sampled), $\\E[Y]$ has rank $d$, and the entries of $Y$ have bounded support. The maximum squared error across all entries converges to $0$ with high probability as long as we observe a little more, $\\Omega(d^5 n \\ln^5(n))$ entries. Our results are the best known sample complexity results in this generality.", "full_text": "Thy Friend is My Friend: Iterative Collaborative\n\nFiltering for Sparse Matrix Estimation\n\nChristian Borgs\nborgs@microsoft.com\n\nJennifer Chayes\n\njchayes@microsoft.com\nMicrosoft Research New England\n\nOne Memorial Drive, Cambridge MA, 02142\n\nChristina E. 
Lee\ncelee@mit.edu\n\nDevavrat Shah\n\ndevavrat@mit.edu\n\nMassachusetts Institute of Technology\n\n77 Massachusetts Ave, Cambridge, MA 02139\n\nAbstract\n\nThe sparse matrix estimation problem consists of estimating the distribution of\nan n \u00d7 n matrix Y , from a sparsely observed single instance of this matrix where\nthe entries of Y are independent random variables. This captures a wide array\nof problems; special instances include matrix completion in the context of rec-\nommendation systems, graphon estimation, and community detection in (mixed\nmembership) stochastic block models. Inspired by classical collaborative \ufb01ltering\nfor recommendation systems, we propose a novel iterative, collaborative \ufb01ltering-\nstyle algorithm for matrix estimation in this generic setting. We show that the mean\nsquared error (MSE) of our estimator converges to 0 at the rate of O(d2(pn)\u22122/5)\nas long as \u03c9(d5n) random entries from a total of n2 entries of Y are observed\n(uniformly sampled), E[Y ] has rank d, and the entries of Y have bounded support.\nThe maximum squared error across all entries converges to 0 with high probability\nas long as we observe a little more, \u2126(d5n ln5(n)) entries. Our results are the best\nknown sample complexity results in this generality.\n\n1\n\nIntroduction\n\nIn this work, we propose and analyze an iterative similarity-based collaborative \ufb01ltering algorithm\nfor the sparse matrix completion problem with noisily observed entries. As a prototype for such a\nproblem, consider a noisy observation of a social network where observed interactions are signals\nof true underlying connections. We might want to predict the probability that two users would\nchoose to connect if recommended by the platform, e.g. LinkedIn. 
As a second example, consider\na recommendation system where we observe movie ratings provided by users, and we may want\nto predict the probability distribution over ratings for speci\ufb01c movie-user pairs. The classical\ncollaborative \ufb01ltering approach is to compute similarities between pairs of users by comparing their\ncommonly rated movies. For a social network, similarities between users would be computed by\ncomparing their sets of friends. We will be particularly interested in the very sparse case where most\npairs of users have no common friends, or most pairs of users have no commonly rated movies; thus\nthere is insuf\ufb01cient data to compute the traditional similarity metrics.\nTo overcome this limitation, we propose a novel algorithm which computes similarities iteratively,\nincorporating information within a larger radius neighborhood. Whereas traditional collaborative\n\ufb01ltering learns the preferences of a user through the ratings of her/his \u201cfriends\u201d, i.e. users who share\nsimilar ratings on commonly rated movies, our algorithm learns about a user through the ratings of\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\n\fthe friends of her/his friends, i.e. users who may be connected through an indirect path in the data.\nFor a social network, this intuition translates to computing similarities of two users by comparing\nthe boundary of larger radius neighborhoods of their connections in the network. While an actual\nimplementation of our algorithm will bene\ufb01t from modi\ufb01cations to make it practical, we believe\nthat our approach is very practical; indeed, we plan to implement it in a corporate setting. 
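To make the classical collaborative filtering baseline concrete, here is a minimal sketch (Python/NumPy, on hypothetical toy ratings; this is an illustration of the traditional approach described above, not the paper's algorithm): user-user similarity is computed over commonly rated movies, and a missing rating is predicted as a similarity-weighted average. When two users share no rated movies, the similarity is simply unavailable, which is the sparse-regime failure mode the paper addresses.

```python
import numpy as np

def user_similarity(R, observed, u, v):
    """Cosine similarity between users u and v over commonly rated movies.
    Returns 0.0 when they share no rated movies -- the sparse-regime
    failure mode that motivates the iterative algorithm in this paper."""
    common = observed[u] & observed[v]
    if not common.any():
        return 0.0
    ru, rv = R[u, common], R[v, common]
    denom = np.linalg.norm(ru) * np.linalg.norm(rv)
    return float(ru @ rv / denom) if denom > 0 else 0.0

def predict(R, observed, u, movie):
    """Predict user u's rating of `movie` as a similarity-weighted
    average over the other users who rated it."""
    raters = [v for v in range(R.shape[0]) if v != u and observed[v, movie]]
    weights = np.array([user_similarity(R, observed, u, v) for v in raters])
    if len(raters) == 0 or weights.sum() == 0:
        return None
    return float(weights @ R[raters, movie] / weights.sum())
```

The names `R` (ratings matrix) and `observed` (boolean mask) are illustrative conventions, not notation from the paper.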
Like all\nsuch nearest-neighbor style algorithms, our algorithm can be accelerated and scaled to large datasets\nin practice by using a parallel implementation via an approximate nearest neighbor data structure.\nIn this paper, however, our goal is to describe the basic setting and concept of the algorithm, and\nprovide clear mathematical foundation and analysis. The theoretical results indicate that this method\nachieves consistency (i.e. guaranteed convergence to the correct solution) for very sparse datasets for\na reasonably general Latent Variable Model with bounded entries.\nThe problems discussed above can be mathematically formulated as a matrix estimation problem,\nwhere we observe a sparse subset of entries in an m \u00d7 n random matrix Y , and we wish to complete\nor de-noise the matrix by estimating the probability distribution of Yij for all (i, j). Suppose that Yij\nis categorical, taking values in [k] according to some unknown distribution. The task of estimating the\ndistribution of Yij can be reduced to k \u2212 1 smaller tasks of estimating the expectation of a binary data\nij] = P(Yij = t). If the matrix that we would like to\nmatrix, e.g. Y t where Y t\nlearn is asymmetric, we can transform it to an equivalent symmetric model by de\ufb01ning a new data\n\n(cid:3). Therefore, for the remainder of the paper, we will assume a n \u00d7 n symmetric\n\nij = I(Yij = t) and E[Y t\n\nmatrix Y (cid:48) =(cid:2) 0 Y\n\nY T 0\n\nmatrix which takes values in [0, 1] (real-valued or binary), but as argued above, our results apply\nmore broadly to categorical-valued asymmetric matrices. We assume that the data is generated from\na Latent Variable Model in which latent variables \u03b81, . . . , \u03b8n are sampled independently from U [0, 1],\nand the distribution of Yij is such that E[Yij|\u03b8i, \u03b8j] = f (\u03b8i, \u03b8j) \u2261 Fij for some latent function f. Our\ngoal is to estimate the matrix F . 
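The two reductions just described, categorical to binary via indicator matrices and asymmetric to symmetric via the block construction Y' = [[0, Y], [Y^T, 0]], are purely mechanical; a minimal NumPy sketch (toy matrices, illustrative only):

```python
import numpy as np

def symmetrize(Y):
    """Embed an m x n asymmetric matrix into an (m+n) x (m+n) symmetric
    one, Y' = [[0, Y], [Y^T, 0]], as in the reduction described above."""
    m, n = Y.shape
    top = np.hstack([np.zeros((m, m)), Y])
    bottom = np.hstack([Y.T, np.zeros((n, n))])
    return np.vstack([top, bottom])

def binarize(Y, t):
    """Reduce a categorical matrix to a 0/1 matrix for category t:
    entry (i, j) is I(Y_ij = t), so its expectation is P(Y_ij = t)."""
    return (Y == t).astype(float)
```

Estimating the distribution of a categorical matrix then reduces to running the (symmetric, [0, 1]-valued) estimator once per category on `binarize(Y, t)`.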
It is worth remarking that the Latent Variable Model is a canonical\nrepresentation for exchangeable arrays as shown by Aldous and Hoover [5, 25, 7].\nWe present a novel algorithm for estimating F = [Fij] from a sparsely sampled dataset {Yij}(i,j)\u2208E\nwhere E \u2282 [n] \u00d7 [n] is generated by assuming each entry is observed independently with probability\np. We require that the latent function f when regarded as an integral operator has \ufb01nite spectrum with\nrank d. We prove that the mean squared error (MSE) of our estimates converges to zero at a rate of\nO(d2(pn)\u22122/5) as long as the sparsity p = \u03c9(d5n\u22121) (i.e. \u03c9(d5n) total observations). In addition,\nwith high probability, the maximum squared error converges to zero at a rate of O(d2(pn)\u22122/5) as\nlong as the sparsity p = \u2126(d5n\u22121 ln5(n)). Our analysis applies to a generic noise setting as long as\nYij has bounded support.\nOur work takes inspiration from [1, 2, 3], which estimates clusters of the stochastic block model by\ncomputing distances from local neighborhoods around vertices. We improve upon their analysis to\nprovide MSE bounds for the general latent variable model with \ufb01nite spectrum, which includes a\nlarger class of generative models such as mixed membership stochastic block models, while they\nconsider the stochastic block model with non-overlapping communities. We show that our results\nhold even when the rank d increases with n, as long as d = o((pn)1/5). As compared to spectral\nmethods such as [28, 39, 20, 19, 18], our analysis handles the general bounded noise model and holds\nfor sparser regimes, only requiring p = \u03c9(n\u22121).\n\nRelated work. 
The matrix estimation problem introduced above includes as speci\ufb01c cases problems\nfrom different areas of literature: matrix completion popularized in the context of recommendation\nsystems, graphon estimation arising from the asymptotic theory of graphs, and community detection\nusing the stochastic block model or its generalization known as the mixed membership stochastic\nblock model. The key representative results for each of these are mentioned in Table 1. We discuss\nthe scaling of the sample complexity with respect to d (model complexity, usually rank) and n\nfor polynomial time algorithms, including results for both mean squared error convergence, exact\nrecovery in the noiseless setting, and convergence with high probability in the noisy setting. As\ncan be seen from Table 1, our result provides the best sample complexity with respect to n for the\ngeneral matrix estimation problem with bounded entries noise model and rank d, as the other models\neither require extra log(n) factors, or impose additional requirements on the noise model or the\nexpected matrix. 
Similarly, ours is the best known sample complexity for high probability max-error convergence to 0 in the general rank d, bounded entries setting, as other results assume either a block-constant expected matrix or a noiseless setting.

Table 1: Sample complexity of related literature, grouped according to the following areas: matrix completion, 1-bit matrix completion, stochastic block model, mixed membership stochastic block model, graphon estimation, and our results.

Paper | Sample Complexity | Data/Noise | Expected matrix | Guarantee
[27] | ω(dn) | noiseless | rank d | MSE → 0
[28] | Ω(dn max(log n, d)), ω(dn) | iid Gaussian | rank d | MSE → 0
[37] | ω(dn log n) | iid Gaussian | rank d | MSE → 0
[19] | Ω(n max(d, log^2 n)) | iid Gaussian | rank d | MSE → 0
[18] | ω(dn log^6 n) | indep bounded | rank d | MSE → 0
[32] | Ω(n^{3/2}) | iid bounded | Lipschitz | MSE → 0
[17] | Ω(dn log^2 n max(d, log^4 n)) | noiseless | rank d | exact recovery
[27] | Ω(dn max(d, log n)) | noiseless | rank d | exact recovery
[39] | Ω(dn log^2 n) | noiseless | rank d | exact recovery
[19] | Ω(n max(d log n, log^2 n, d^2)) | binary entries | rank d | MSE → 0
[20] | Ω(n max(d, log n)), ω(dn) | binary entries | rank d | MSE → 0
[1, 3] | ω(n)* | binary entries | d blocks | partial recovery
[1] | Ω(n log n)* | binary entries | d blocks (SBM) | exact recovery
[43] | Ω(n log n)* | binary entries | rank d | MSE → 0
[6] | Ω(d^2 n polylog n) | binary entries | rank d | whp error → 0
[40] | Ω(d^2 n) | binary entries | rank d | detection
[4] | Ω(n^2) | binary entries | monotone row sum | MSE → 0
[44] | Ω(n^2) | binary entries | piecewise Lipschitz | MSE → 0
[10] | ω(n) | binary entries | monotone row sum | MSE → 0
this work | ω(d^5 n) | indep bounded | rank d, Lipschitz | MSE → 0
this work | Ω(d^5 n log^5 n) | indep bounded | rank d, Lipschitz | whp error → 0

*Result does not indicate dependence on d.

It is worth comparing our results with the known lower bounds on the sample complexity. For the special case of matrix completion with an additive noise model, i.e. Yij = E[Yij] + ηij where the ηij are i.i.d. zero mean, [16, 20] showed that ω(dn) samples are needed for a consistent estimator, i.e. for MSE convergence to 0, and [17] showed that dn log n samples are needed for exact recovery. There is a conjectured computational lower bound of d^2 n for the mixed membership stochastic block model, even for detection, which is weaker than MSE going to 0. Recently, [40] showed a partial result that this computational lower bound holds for algorithms that rely on fitting low-degree polynomials to the observed data. Given that these lower bounds apply to special cases of our setting, it seems that our result is optimal in terms of its dependence on n for MSE convergence as well as high probability (near) exact recovery.

Next we provide a brief overview of the prior works reported in Table 1. In the context of matrix completion, there has been much progress under the low-rank assumption. Most theoretically founded methods are based on spectral decompositions or on minimizing a loss function with respect to spectral constraints [27, 28, 15, 17, 39, 37, 20, 19, 18]. A work that is closely related to ours is [32]. It proves that a similarity based collaborative filtering-style algorithm provides a consistent estimator for matrix completion under the generic model when the latent function is Lipschitz, not just low rank; however, it requires Õ(n^{3/2}) samples. In a sense, ours can be viewed as an algorithmic generalization of [32] that handles the sparse sampling regime and a generic noise model. Most of the results in matrix completion require additive noise models, which do not extend to the setting where the observations are binary or quantized.
The USVT estimator is able to handle general bounded\nnoise, although it requires a few log factors more in its sample complexity [18]. Our work removes\nthe extra log factors while still allowing for general bounded noise.\nThere is also a signi\ufb01cant amount of literature which looks at the estimation problem when the data\nmatrix is binary, also known as 1-bit matrix completion, stochastic block model (SBM) parameter\nestimation, or graphon estimation. The latter two terms are found within the context of community\n\n3\n\n\fdetection and network analysis, as the binary data matrix can alternatively be interpreted as the\nadjacency matrix of a graph \u2013 which are symmetric, by de\ufb01nition. Under the SBM, each vertex is\nassociated to one of d community types, and the probability of an edge is a function of the community\ntypes of both endpoints. Estimating the n \u00d7 n parameter matrix becomes an instance of matrix\nestimation. In SBM, the expected matrix is at most rank d due to its block structure. Precise thresholds\nfor cluster detection (better than random) and estimation have been established by [1, 2, 3]. Our\nwork, both algorithmically and technically, draws insight from this sequence of works, extending\nthe analysis to a broader class of generative models through the design of an iterative algorithm, and\nimproving the technical results with precise MSE bounds.\nThe mixed membership stochastic block model (MMSBM) allows each vertex to be associated to\na length d vector, which represents its weighted membership in each of the d communities. The\nprobability of an edge is a function of the weighted community memberships vectors of both endpoints,\nresulting in an expected matrix with rank at most d. Recent work by [40] provides an algorithm for\nweak detection for MMSBM with sample complexity d2n, when the community membership vectors\nare sparse and evenly weighted. 
They provide partial results to support a conjecture that d2n is a\ncomputational lower bound, separated by a gap of d from the information theoretic lower bound of\ndn. This gap was \ufb01rst shown in the simpler context of the stochastic block model [21]. [43] proposed\na spectral clustering method for inferring the edge label distribution for a network sampled from a\ngeneralized stochastic block model. When the expected function has a \ufb01nite spectrum decomposition,\ni.e. low rank, then they provide a consistent estimator for the sparse data regime, with \u2126(n log n)\nsamples.\nGraphon estimation extends SBM and MMSBM to the generic Latent Variable Model where the\nprobability of an edge can be any measurable function f of real-valued types (or latent variables)\nassociated to each endpoint. Graphons were \ufb01rst de\ufb01ned as the limiting object of a sequence of large\ndense graphs [14, 22, 34], with recent work extending the theory to sparse graphs [12, 13, 11, 41].\nIn the graphon estimation problem, we would like to estimate the function f given an instance of\na graph generated from the graphon associated to f. [23, 29] provide minimax optimal rates for\ngraphon estimation; however a majority of the proposed estimators are not computable in polynomial\ntime, since they require optimizing over an exponentially large space (e.g. least squares or maximum\nlikelihood) [42, 10, 9, 23, 29]. [10] provided a polynomial time method based on degree sorting in\nthe special case when the expected degree function is monotonic. To our knowledge, existing positive\nresults for sparse graphon estimation require either strong monotonicity assumptions [10], or rank\nconstraints as assumed in the SBM, the 1-bit matrix completion, and in this work.\nWe call special attention to the similarity based methods which are able to bypass the rank constraints,\nrelying instead on smoothness properties of the latent function f (e.g. Lipschitz) [44, 32]. 
They\nhinge upon computing similarities between rows or columns by comparing commonly observed\nentries. Similarity based methods, also known in the literature as collaborative \ufb01ltering, have been\nsuccessfully employed across many large scale industry applications (Net\ufb02ix, Amazon, Youtube) due\nto its simplicity and scalability [24, 33, 30, 38]; however the theoretical results have been relatively\nsparse. These recent results suggest that the practical success of these methods across a variety of\napplications may be due to its ability to capture local structure. A key limitation of this approach is\nthat it requires a dense dataset with suf\ufb01cient entries in order to compute similarity metrics, requiring\nthat each pair of rows or columns has a growing number of overlapped observed entries, which does\nnot hold when p = o(n\u22121/2). This work overcomes this limitation in an intuitive and simple way;\nrather than only considering directly overlapped entries, we consider longer \u201cpaths\u201d of data associated\nto each row, expanding the set of associated datapoints until there is suf\ufb01cient overlap. Although we\nmay initially be concerned that this would introduce bias and variance due to the sparse sampling,\nour analysis shows that in fact the estimate does converge to the true solution.\nThe idea of comparing vertices by looking at larger radius neighborhoods was introduced in [1], and\nhas connections to belief propagation [21, 3] and the non-backtracking operator [31, 26, 36, 35, 8].\nThe non-backtracking operator was introduced to overcome the issue of sparsity. For sparse graphs,\nvertices with high-degree dominate the spectrum, such that the informative components of the\nspectrum get hidden behind the high degree vertices. 
The non-backtracking operator avoids paths that immediately return to the previously visited vertex, in a similar manner as belief propagation, and its spectrum has been shown to be more well-behaved, perhaps adjusting for the high degree vertices, which get visited very often by paths in the graph. In our algorithm, the neighborhood paths are defined by first selecting a rooted tree at each vertex, thus enforcing that each vertex along a path in the tree is unique. This is important in our analysis, as it guarantees that the distribution of vertices at the boundary of each subsequent depth of the neighborhood is unbiased, since the sampled vertices are freshly visited.

2 Model

We shall use graph and matrix notations in an interchangeable manner. For each pair of vertices (i.e. row or column indices) u, v ∈ [n], let Yuv ∈ [0, 1] denote its random realization. Let E denote the edges. If (u, v) ∈ E, Yuv is observed; otherwise it is unknown.
• Each vertex u ∈ [n] is associated to a latent variable θu ~ U[0, 1], sampled i.i.d.
• For each (u, v) ∈ [n] × [n], Yuv = Yvu ∈ [0, 1] is a bounded random variable. Conditioned on {θi}i∈[n], the random variables {Yuv}1≤u<v≤n are independent, with E[Yuv | θu, θv] = f(θu, θv) ≡ Fuv. The latent function f, regarded as an integral operator, has finite spectrum of rank d: f(x, y) = Σ_{k=1}^d λk qk(x) qk(y), where the qk are orthonormal eigenfunctions with sup_{y∈[0,1]} |qk(y)| ≤ B for some B ≥ 1, and the eigenvalues are ordered so that |λ1| ≥ ... ≥ |λd| > 0.

Let Q denote the d × n matrix where Q(k, u) = qk(θu). Since Q is a random matrix depending on the sampled θ, it is not guaranteed to be an orthonormal matrix (even though the qk are orthonormal functions). By definition, it follows that F = Q^T Λ Q. Let d' be the number of distinct valued eigenvalues. Let Λ̃ denote the d × d' matrix where Λ̃(a, b) = λb^{a-1}.

Discussing Assumptions. The latent variable model imposes a natural and mild assumption, as Aldous and Hoover proved that if the network is exchangeable, i.e.
the distribution over edges is invariant under permutations of vertex labels, then the network can be equivalently represented by a latent variable model [5, 25, 7]. Exchangeability is reasonable for anonymized datasets in which the identities of entities can be easily renamed. Our model additionally requires that the function is L-Lipschitz and has finite spectrum when regarded as an integral operator, i.e. F is low rank; this includes interesting scenarios such as the mixed membership stochastic block model and finite degree polynomials. We can also relax the condition to piecewise Lipschitz, as we only need to ensure that for every vertex u there are sufficiently many vertices v which are similar in function value to u. We assume observations are sampled independently with probability p; however, we discuss a possible solution for dealing with non-uniform sampling in Section 5.

3 Algorithm

The algorithm that we propose uses the concept of local approximation, first determining which datapoints are similar in value, and then computing neighborhood averages for the final estimate. All similarity-based collaborative filtering methods have the following basic format:

1. Compute distances between pairs of vertices, e.g.,

   dist(u, a) ≈ ∫_0^1 (f(θu, t) - f(θa, t))^2 dt.   (1)

2. Form the estimate by averaging over "nearby" datapoints,

   F̂uv = (1/|Euv|) Σ_{(a,b)∈Euv} Mab,   (2)

   where Euv := {(a, b) ∈ E s.t. dist(u, a) < ηn, dist(v, b) < ηn}.

The choice of ηn = Θ(d (c1 p n)^{-2/5}) will be small enough to drive the bias to zero, ensuring the included datapoints are close in value, yet large enough to reduce the variance, ensuring |Euv| diverges.

Intuition. Various similarity-based algorithms differ in the distance computation (Step 1). For dense datasets, i.e.
p = ω(n^{-1/2}), previous works have proposed and analyzed algorithms which approximate the L2 distance of (1) by using variants of the finite sample approximation

   dist(u, a) = (1/|Xua|) Σ_{y∈Xua} (Fuy - Fay)^2,   (3)

where y ∈ Xua iff (u, y) ∈ E and (a, y) ∈ E [4, 44, 32]. For sparse datasets, with high probability, Xua = ∅ for almost all pairs (u, a), so that this distance cannot be computed.

In this paper we are interested in the sparse setting, where p is significantly smaller than n^{-1/2}, down to the lowest threshold of p = ω(n^{-1}). If we visualize the data via a graph with edge set E, then (3) corresponds to comparing common neighbors of vertices u and a. A natural extension when u and a have no common neighbors is to instead compare the r-hop neighbors of u and a, i.e. vertices y which are at distance exactly r from both u and a. We compare the products of weights along the edges in the paths from u to y and from a to y respectively, which in expectation approximates

   ∫_{[0,1]^{r-1}} f(θu, t1) (Π_{s=1}^{r-2} f(ts, ts+1)) f(t_{r-1}, θy) dt = Σ_k λk^r qk(θu) qk(θy) = eu^T Q^T Λ^r Q ey.   (4)

We choose a large enough r such that there are sufficiently many "common" vertices y which have paths to both u and a, guaranteeing that our distance can be computed from a sparse dataset.

Algorithm Details. We present and discuss the details of each step of the algorithm, which primarily involves computing pairwise distances (or similarities) between vertices.

Step 1: Sample Splitting. We partition the datapoints into disjoint sets, which are used in different steps of the computation to minimize correlation across steps for the analysis.
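A minimal sketch of the finite sample approximation (3) (Python/NumPy, on hypothetical toy data): it averages squared differences over the commonly observed columns, and returns no value when the overlap Xua is empty, which is exactly what happens for most pairs once p = o(n^{-1/2}).

```python
import numpy as np

def dist_overlap(M, observed, u, a):
    """Finite-sample approximation of the L2 distance between rows u and
    a of M, computed only over columns observed in both rows (eq. (3)).
    Returns None when the overlap is empty -- the failure mode in the
    sparse regime that the iterative algorithm is designed to overcome."""
    common = observed[u] & observed[a]
    if not common.any():
        return None
    diff = M[u, common] - M[a, common]
    return float(np.mean(diff ** 2))
```

Here `M` holds the noisy observations and `observed` is a boolean mask; both names are illustrative conventions, not notation from the paper.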
Each edge in E is independently placed into E1, E2, or E3, with probabilities c1, c2, and 1 - c1 - c2 respectively. Matrices M1, M2, and M3 contain the information from the subsets of the data in M associated to E1, E2, and E3 respectively. M1 is used to define local neighborhoods of each vertex, M2 is used to compute similarities of these neighborhoods, and M3 is used to average over datapoints for the final estimate in (2).

Step 2: Expanding the Neighborhood. We first expand local neighborhoods of radius r around each vertex. Let Su,s denote the set of vertices which are at distance s from vertex u in the graph defined by edge set E1. Specifically, i ∈ Su,s if the shortest path in G1 = ([n], E1) from u to i has length s. Let Tu denote a breadth-first tree in G1 rooted at vertex u. The breadth-first property ensures that the length of the path from u to i within Tu is equal to the length of the shortest path from u to i in G1. If there is more than one valid breadth-first tree rooted at u, choose one uniformly at random. Let Nu,r ∈ [0, 1]^n denote the following vector with support on the boundary of the r-radius neighborhood of vertex u (we also call Nu,r the neighborhood boundary):

   Nu,r(i) = Π_{(a,b) ∈ pathTu(u,i)} M1(a, b) if i ∈ Su,r, and Nu,r(i) = 0 if i ∉ Su,r,

where pathTu(u, i) denotes the set of edges along the path from u to i in the tree Tu. The sparsity of Nu,r is equal to |Su,r|, and the value of the coordinate Nu,r(i) is equal to the product of the weights along the path from u to i. Let Ñu,r denote the normalized neighborhood boundary, Ñu,r = Nu,r/|Su,r|. We will choose the radius r to be r = 6 ln(1/p) / (8 ln(c1 p n)).

Step 3: Computing the distances. For each vertex, we present two variants for estimating the distance.

1.
For each pair (u, v), compute dist1(u, v) according to

   dist1(u, v) = ((1 - c1 p)/(c2 p)) (Ñu,r - Ñv,r)^T M2 (Ñu,r+1 - Ñv,r+1).

2. For each pair (u, v), compute the distance according to

   dist2(u, v) = Σ_{i∈[d']} zi Δuv(r, i),

   where Δuv(r, i) is defined as

   Δuv(r, i) = ((1 - c1 p)/(c2 p)) (Ñu,r - Ñv,r)^T M2 (Ñu,r+i - Ñv,r+i),

   and z ∈ R^{d'} is a vector that satisfies Λ^{2r+2} Λ̃^T z = Λ^2 1. z always exists and is unique because Λ̃^T is a Vandermonde matrix, and Λ^{-2r} 1 lies within the span of its columns.

Computing dist1 does not require knowledge of the spectrum of f. In our analysis we prove that the expected squared error of the estimate computed in (2) using dist1 converges to zero with n for p = ω(n^{-1+ε}) for some ε > 0 and constant rank d, i.e. p must be polynomially larger than n^{-1}. Although computing dist2 requires knowledge of the spectrum of f to determine the vector z, the expected squared error of the estimate computed in (2) using dist2 converges to zero for p = ω(n^{-1}) and constant rank d, which includes the sparser settings where p is only larger than n^{-1} by polylogarithmic factors. We will also show the dependence on d, allowing it to grow slowly with pn. It seems plausible that the technique employed by [2] could be used to design a modified algorithm which does not need prior knowledge of the spectrum. They achieve this for the stochastic block model by bootstrapping the algorithm with a method which estimates the spectrum first and then computes pairwise distances with the estimated eigenvalues.

Step 4: Averaging datapoints to produce the final estimate.
The estimate F̂(u, v) is computed by averaging over nearby points defined by the distance estimates dist1 (or dist2). Recall that B ≥ 1 was assumed in the model definition to upper bound sup_{y∈[0,1]} |qk(y)|.

Let Euv1 denote the set of undirected edges (a, b) such that (a, b) ∈ E3 and both dist1(u, a) and dist1(v, b) are less than η1(n) = 33 B d |λ1|^{2r+1} (c1 p n)^{-2/5}. The final estimate F̂(u, v) produced by using dist1 is computed by averaging over the undirected edge set Euv1,

   F̂(u, v) = (1/|Euv1|) Σ_{(a,b)∈Euv1} M3(a, b).   (5)

Let Euv2 denote the set of undirected edges (a, b) such that (a, b) ∈ E3 and both dist2(u, a) and dist2(v, b) are less than ξ2(n) = 33 B d |λ1| (c1 p n)^{-2/5}. The final estimate F̂(u, v) produced by using dist2 is computed by averaging over the undirected edge set Euv2,

   F̂(u, v) = (1/|Euv2|) Σ_{(a,b)∈Euv2} M3(a, b).   (6)

4 Main Results

We prove bounds on the estimation error of our algorithm in terms of the mean squared error (MSE),

   MSE := E[ (1/(n(n-1))) Σ_{u≠v} (F̂uv - Fuv)^2 ],

which averages the squared error over all edges. It follows from the model that

   ∫_0^1 (f(θu, y) - f(θv, y))^2 dy = Σ_{k=1}^d λk^2 (qk(θu) - qk(θv))^2 = ||ΛQ(eu - ev)||_2^2.

The key part of the analysis is to show that the computed distances are in fact good estimates of ||ΛQ(eu - ev)||_2^2. The analysis essentially relies on showing that the neighborhood growth around a vertex behaves according to its expectation, in a properly defined sense. The radius r must be small enough to guarantee that the growth of the size of the neighborhood boundary is exponential, increasing by a factor of approximately c1 p n.
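Putting Steps 2-4 together, here is a compact, illustrative sketch (Python/NumPy; the helper names, toy inputs, and thresholds are hypothetical, and ties in the breadth-first tree are broken by visit order rather than uniformly at random, so this is a simplification of the algorithm as stated, not the paper's implementation):

```python
import numpy as np
from collections import deque

def boundary_vector(adj, M1, u, r, n):
    """Step 2 (sketch): normalized boundary vector of vertex u at radius r.
    Entry i is the product of M1-weights on the breadth-first tree path
    from u to i, for vertices i first reached at distance exactly r."""
    dist, prod = {u: 0}, {u: 1.0}
    queue = deque([u])
    while queue and dist[queue[0]] < r:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:          # first visit => tree edge
                dist[w], prod[w] = dist[v] + 1, prod[v] * M1[v, w]
                queue.append(w)
    N = np.zeros(n)
    boundary = [v for v, d in dist.items() if d == r]
    for v in boundary:
        N[v] = prod[v]
    return N / max(len(boundary), 1)   # normalized boundary vector

def dist1(Nu, Nv, Nu1, Nv1, M2, c1, c2, p):
    """Step 3, variant 1 (sketch): rescaled bilinear form of boundary
    differences at radii r and r+1 through the held-out data M2."""
    return float((1 - c1 * p) / (c2 * p) * (Nu - Nv) @ M2 @ (Nu1 - Nv1))

def estimate(M3, obs3, near_u, near_v):
    """Step 4 (sketch): average held-out entries M3(a, b) over pairs with
    a within the distance threshold of u and b within it of v."""
    pairs = obs3 & np.outer(near_u, near_v)
    return float(M3[pairs].mean()) if pairs.any() else None
```

In a full implementation the `near_u`, `near_v` masks would come from thresholding `dist1` at η1(n), and the three matrices would come from the sample-splitting step.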
However, if the radius is too small, then the boundaries of the respective neighborhoods of the two chosen vertices would have a small intersection, so that estimating the similarities based on the small intersection of datapoints would result in high variance. Therefore, the choice of r is critical to the algorithm and analysis. We are able to prove bounds on the squared error when r is chosen to satisfy the following conditions:

   r + 1/2 ≥ 6 ln(1/p) / (8 ln(9 c1 p n / 8)) = Θ(ln(1/p) / ln(c1 p n)),
   r + d' ≤ 7 ln(1/c1 p) / (8 ln(7 |λd|^2 c1 p n / (8 |λ1|))) = Θ(ln(1/c1 p) / ln(c1 p n)).   (7)

The parameter d' denotes the number of distinct valued eigenvalues in the spectrum (λ1, ..., λd) of f, and determines the number of different radius "measurements" involved in computing dist2(u, v). Computing dist1(u, v) only involves a single measurement, thus the left hand side of (7) can be reduced to r + 1 instead of r + d'. When p is above a threshold, we choose c1 to decrease with n to ensure that (7) can be satisfied, sparsifying the edge set E1 used for expanding the neighborhood around a vertex. When the sample probability is polynomially larger than n^{-1}, i.e. p = n^{-1+ε} for some ε > 0, these constraints imply that r is a constant with respect to n. However, if p = Õ(n^{-1}), we will need r to grow with n at a rate of 6 ln(1/p) / (8 ln(c1 p n)).

Theorem 4.1.
If $p = n^{-1+\epsilon}$ for some $\epsilon \in (0, \frac{1}{6})$, with a choice of $c_1$ such that $c_1pn = \Theta\big(\max(pn, (p^6n^7)^{1/19})\big)$, there exists a constant $r$ (with respect to $n$) which satisfies (7). If $p = \omega(n^{-1}d^5)$ and $|\lambda_d| = \omega((c_1pn)^{-1/4})$, then the estimate computed using dist1 with parameter $r$ achieves

$$\mathrm{MSE} = O\bigg(\Big(\frac{|\lambda_1|}{|\lambda_d|}\Big)^{2r} \frac{B^3d^2|\lambda_1|}{(c_1pn)^{2/5}}\bigg).$$

If $p = \omega(n^{-1}d^5\ln^5(n))$, with probability greater than $1 - O\big(d\exp\big(-\frac{(c_1pn)^{1/5}}{9B^2d}\big)\big)$, the estimate satisfies

$$\|\hat{F} - F\|_{\max} := \max_{i,j} |\hat{F}_{ij} - F_{ij}| = O\bigg(\Big(\frac{|\lambda_1|}{|\lambda_d|}\Big)^{r} \Big(\frac{B^3d^2|\lambda_1|}{(c_1pn)^{2/5}}\Big)^{1/2}\bigg).$$

Theorem 4.1 proves that the mean squared error (MSE) of the estimate computed with dist1 is bounded by $O((|\lambda_1|/|\lambda_d|)^{2r}d^2(c_1pn)^{-2/5})$. Therefore, our algorithm with dist1 provides a consistent estimate when $r$ is constant with respect to $n$, which occurs for $p = n^{-1+\epsilon}$ for some $\epsilon > 0$. In fact, the reason why the error blows up by a factor of $(|\lambda_1|/|\lambda_d|)^{2r}$ is that we compute the distance by summing products of weights over paths of length $2r$. From (4), we see that in expectation, when we take the product of edge weights over a path of length $r$ from $u$ to $y$, instead of computing $f(\theta_u, \theta_y) = e_u^T Q^T \Lambda Q e_y$, the expression concentrates around $e_u^T Q^T \Lambda^r Q e_y$, which contains extra factors of $\Lambda^{r-1}$. Therefore, by computing over a radius $r$, the calculation in dist1 will approximate $\|\Lambda^{r+1}Q(e_u - e_v)\|_2^2$ rather than our intended $\|\Lambda Q(e_u - e_v)\|_2^2$, thus leading to an error factor of $(|\lambda_1|/|\lambda_d|)^{2r}$.
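The source of this bias, that $r$-step path products pick up extra powers of the spectrum, can be checked numerically: for a symmetric matrix with spectral decomposition $Q^T\Lambda Q$ (orthonormal rows of $Q$), the $r$-fold matrix product replaces $\Lambda$ by $\Lambda^r$. A minimal numpy check, with a synthetic $Q$ and $\Lambda$ (not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
# Q: d x n matrix with orthonormal rows; lam: spectrum of a rank-d matrix F.
Qt, _ = np.linalg.qr(rng.standard_normal((8, 3)))   # n = 8, d = 3
Q = Qt.T
lam = np.diag([1.0, 0.5, 0.25])
F = Q.T @ lam @ Q                      # F = Q^T Lambda Q

# An r-step product of F carries Lambda^r in place of Lambda:
r = 3
Fr = np.linalg.matrix_power(F, r)
assert np.allclose(Fr, Q.T @ np.linalg.matrix_power(lam, r) @ Q)
```

This is exactly why dist1, computed from radius-$r$ neighborhoods, approximates $\|\Lambda^{r+1}Q(e_u - e_v)\|_2^2$ rather than $\|\Lambda Q(e_u - e_v)\|_2^2$.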
It turns out that dist2 adjusts for this bias, as the multiple measurements $\Delta_{uv}(r, i)$ with different length paths allow us to separate out $e_k^T\Lambda Q(e_u - e_v)$ for all $k$ with distinct values of $\lambda_k$.
Theorem 4.2. If $p = o(n^{-5/6})$, with a choice of $c_1$ such that $c_1pn = \Theta\big(\max(pn, (p^6n^7)^{\frac{1}{8d'+11}})\big)$, there exists a value for $r$ which satisfies (7). If $p = \omega(n^{-1}d^5)$, $|\lambda_d| = \omega((c_1pn)^{-1/4})$, and $d = o(r)$, then the estimate computed using dist2 with parameter $r$ achieves

$$\mathrm{MSE} = O\bigg(\frac{B^3d^2|\lambda_1|}{(c_1pn)^{2/5}}\bigg).$$

If $p = \omega(n^{-1}d^5\ln^5(n))$, with probability $1 - O\big(d\exp\big(-\frac{(c_1pn)^{1/5}}{9B^2d}\big)\big)$, the estimate satisfies

$$\|\hat{F} - F\|_{\max} := \max_{i,j}|\hat{F}_{ij} - F_{ij}| = O\bigg(\Big(\frac{B^3d^2|\lambda_1|}{(c_1pn)^{2/5}}\Big)^{1/2}\bigg).$$

Theorem 4.2 proves that the mean squared error (MSE) of the estimate computed using dist2 is bounded by $O(d^2(c_1pn)^{-2/5})$; thus the estimate is consistent in the ultra-sparse sampling regime of $p = \omega(d^5n^{-1})$.

5 Discussion

In this work we presented a similarity-based collaborative filtering algorithm which is provably consistent in sparse sampling regimes, as long as the sample probability satisfies $p = \omega(n^{-1})$. The algorithm computes the similarity between two users by comparing their local neighborhoods. Our model assumes that the data matrix is generated according to a latent variable model, in which the weight on an observed edge $(u, v)$ is equal in expectation to a function $f$ evaluated at the associated latent variables $\theta_u$ and $\theta_v$. We presented two variants for computing similarities (or distances) between vertices.
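For concreteness, this latent variable model can be simulated in a few lines; the rank-one choice of $f$ and all constants below are illustrative only, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 400, 0.1
f = lambda x, y: x * y                  # illustrative bounded latent function
theta = rng.uniform(size=n)             # latent variables theta_u ~ U[0, 1]
F = np.outer(theta, theta)              # F[u, v] = f(theta_u, theta_v)

mask = np.triu(rng.random((n, n)) < p, 1)
mask = mask | mask.T                    # each entry observed w.p. p, symmetric
Y = rng.binomial(1, F)                  # noisy 0/1 entries with mean F
Y = np.triu(Y, 1) + np.triu(Y, 1).T     # symmetrize the noise
M = np.where(mask, Y, 0)                # sparse observed data matrix
```

The estimation task is then to recover $F$ from the single sparse observation $M$.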
Computing dist1 does not require knowledge of the spectrum of $f$, but the estimate requires $p$ to be polynomially larger than $n^{-1}$ in order to guarantee that the expected squared error converges to zero. Computing dist2 uses knowledge of the spectrum of $f$, but it provides an estimate that is provably consistent in a significantly sparser regime, only requiring that $p = \omega(n^{-1})$. The mean squared error of both algorithms is bounded by $O((pn)^{-2/5})$. Since the computation is based on comparing local neighborhoods within the graph, the algorithm can easily be implemented for large-scale datasets where the data may be stored in a distributed fashion optimized for local graph computations.

Practical implementation. In practice, we do not know the model parameters, and we would use cross validation to tune the radius $r$ and the threshold $\eta_n$. If $r$ is either too small or too large, then the vector $N_{u,r}$ will be too sparse. The threshold $\eta_n$ trades off between the bias and the variance of the final estimate. Since we do not know the spectrum, dist1 may be easier to compute, and it still enjoys good properties as long as $r$ is not too large. When the sampled observations are not uniform across entries, the algorithm may require further modifications to properly normalize for high-degree hub vertices, as the optimal choice of $r$ may differ depending on the local sparsity. The key computational step of our algorithm involves comparing the expanded local neighborhoods of pairs of vertices to find the "nearest neighbors". The local neighborhoods can be computed in parallel, as they are independent computations.
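The per-vertex neighborhood expansion is a plain breadth-first search, which is why the computations are independent across vertices; a minimal sketch (a dense adjacency matrix is used purely for brevity):

```python
import numpy as np

def boundary(A, u, r):
    """Vertices at hop distance exactly r from u (the neighborhood
    boundary), grown one BFS layer at a time; A is an adjacency matrix."""
    n = A.shape[0]
    visited = np.zeros(n, dtype=bool)
    frontier = np.zeros(n, dtype=bool)
    visited[u] = frontier[u] = True
    for _ in range(r):
        reached = (A @ frontier) > 0          # one expansion step
        frontier = reached & ~visited          # new boundary layer
        visited |= frontier
    return frontier
```

In the algorithm these boundaries would be grown over the sparsified edge set $E_1$, and each vertex's expansion can run on a separate worker.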
Furthermore, the local neighborhood computations are suitable for systems in which the data is distributed across different machines in a way that optimizes local neighborhood queries. The most expensive part of our algorithm involves computing similarities for all pairs of vertices in order to determine the set of nearest neighbors. However, it would be possible to use approximate nearest neighbor techniques to greatly reduce the computation, so that approximate nearest neighbor sets could be computed with significantly fewer than $n^2$ pairwise comparisons.

Non-uniform sampling. In reality, the probability that entries are observed is not uniform across all pairs $(i, j)$. However, we believe that an extension of our result can also handle variations in the sample probability, as long as the sample probability is a function of the latent variables and scales in the same way with respect to $n$ across all entries. Suppose that the probability of observing $(i, j)$ is given by $pg(\theta_i, \theta_j)$, where $p$ is the scaling factor (containing the dependence on $n$) and $g$ allows for constant factor variations in the sample probability across entries as a function of the latent variables. If we let the matrix $X$ indicate the presence or absence of each observation, then we can apply our algorithm twice: first on the matrix $X$ to estimate the function $g$, and then on the data matrix $M$ to estimate $f$ times $g$. We can simply divide by the estimate for $g$ to obtain the estimate for $f$. The limitation is that if $g(\theta_i, \theta_j)$ is very small, then the estimate of the corresponding $f(\theta_i, \theta_j)$ will have higher variance. However, it is to be expected that the error increases for edge types with fewer samples.

Acknowledgments

This work is supported in part by NSF grants CMMI-1462158 and CMMI-1634259, by DARPA grant W911NF-16-1-0551, and additionally by an NSF Graduate Fellowship and a Claude E.
Shannon Research Assistantship.