{"title": "Binary Rating Estimation with Graph Side Information", "book": "Advances in Neural Information Processing Systems", "page_first": 4272, "page_last": 4283, "abstract": "Rich experimental evidences show that one can better estimate users' unknown ratings with the aid of graph side information such as social graphs. However, the gain is not theoretically quantified. In this work, we study the binary rating estimation problem to understand the fundamental value of graph side information. Considering a simple correlation model between a rating matrix and a graph, we characterize the sharp threshold on the number of observed entries required to recover the rating matrix (called the optimal sample complexity) as a function of the quality of graph side information (to be detailed). To the best of our knowledge, we are the first to reveal how much the graph side information reduces sample complexity. Further, we propose a computationally efficient algorithm that achieves the limit. Our experimental results demonstrate that the algorithm performs well even with real-world graphs.", "full_text": "Binary Rating Estimation\n\nwith Graph Side Information\n\nKwangjun Ahn\u2217\n\n142nd Military Police Company\n\nKorean Augmentation To the United States Army\n\nKangwook Lee\n\nSchool of Electrical Engineering\n\nKAIST\n\nkjahnkorea@kaist.ac.kr\n\nkw1jjang@kaist.ac.kr\n\nHyunseung Cha\n\nKakao Brain\n\ntony.cha@kakaobrain.com\n\nChangho Suh\n\nSchool of Electrical Engineering\n\nKAIST\n\nchsuh@kaist.ac.kr\n\nAbstract\n\nRich experimental evidences show that one can better estimate users\u2019 unknown\nratings with the aid of graph side information such as social graphs. However,\nthe gain is not theoretically quanti\ufb01ed. 
In this work, we study the binary rating estimation problem to understand the fundamental value of graph side information. Considering a simple correlation model between a rating matrix and a graph, we characterize the sharp threshold on the number of observed entries required to recover the rating matrix (called the optimal sample complexity) as a function of the quality of graph side information (to be detailed). To the best of our knowledge, we are the first to reveal how much the graph side information reduces sample complexity. Further, we propose a computationally efficient algorithm that achieves the limit. Our experimental results demonstrate that the algorithm performs well even with real-world graphs.\n\n1 Introduction\n\nRecommender systems provide users with appropriate items based on their revealed preferences such as ratings and likes/dislikes. Due to their wide applicability, recommender systems have received significant attention in the machine learning and data mining communities [46, 41, 50, 48, 5, 8, 22, 31]. In addition to the revealed preferences, modern recommender systems also make use of graph side information to further improve performance. For instance, Ma et al. [37] view social networks as user-to-user similarity graphs and propose an algorithm that makes use of both revealed ratings and graph side information. As a result, they show that the algorithm achieves superior performance over those that do not employ social networks. Jamali and Ester [26] demonstrate that an algorithm with graph information can make recommendations for cold-start users, whose lack of available rating information precludes the traditional approaches. Similarly, Kalofolias et al. 
[29] construct an item-to-item similarity graph whose edge weights are computed from the item features, and demonstrate the benefits of exploiting such information.\n\nApart from the aforementioned, many more works incorporate social graph information in recommender systems; however, few works have been devoted to the theoretical understanding of this problem. In particular, it remains wide open by how much one can improve the performance with the aid of graph side information. This precisely sets the goal of this paper: we aim to quantify the gain due to social network information. Specifically, we intend to characterize the optimal sample complexity needed for rating matrix recovery in the presence of graph side information.\n\nAs an initial effort, we focus on a simple setting in which the entries to be estimated are binary. We consider a scenario where one-sided graph information is available, i.e., either a user-to-user or an item-to-item graph is given. Without loss of generality, we assume that a user-to-user similarity graph (a so-called social graph) is available.\n\nConsider n users and m items, where each user rates each item either +1 (like) or -1 (dislike). The n users are divided into two clusters, and it is assumed that users from the same cluster share their ratings over items. Under this setting, two types of measurements are available. The first is a partial observation of noisy ratings, and the other is a social graph among the n users, generated as per a celebrated model for random clustered graphs called the stochastic block model (SBM) [21]. Given these, the task is to estimate the ground truth ratings (see Sec. 2 for details).\n\n*This work was done when Kwangjun Ahn was with KAIST as a student.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.
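To make the setting concrete, the data-generating model just described (two equal-sized clusters of users sharing rating vectors, noisy partial observations, and an SBM social graph) can be sketched in a few lines of numpy. This snippet is not from the paper; all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters, for illustration only.
n, m = 200, 100          # number of users and items
gamma = 0.5              # fraction of items on which the two rating vectors differ
p, theta = 0.3, 0.1      # observation probability and flip (noise) probability
alpha, beta = 0.5, 0.05  # SBM intra- and inter-cluster edge probabilities

# Rating matrix R: two equal-sized clusters of users; users within a cluster
# share a rating vector, and the two vectors differ on ceil(gamma * m) items.
d = int(np.ceil(gamma * m))
u = rng.choice([-1, 1], size=m)
v = u.copy()
v[:d] *= -1
R = np.vstack([np.tile(u, (n // 2, 1)), np.tile(v, (n // 2, 1))])

# Partial observation of noisy ratings N^Omega: each entry is observed
# independently with probability p and flipped with probability theta.
mask = rng.random((n, m)) < p
flip = np.where(rng.random((n, m)) < theta, -1, 1)
N = np.where(mask, R * flip, 0)

# Social graph G ~ SBM: edge probability alpha within a cluster, beta across.
labels = np.arange(n) < n // 2
same_cluster = np.equal.outer(labels, labels)
upper = np.triu(rng.random((n, n)) < np.where(same_cluster, alpha, beta), k=1)
A = (upper | upper.T).astype(int)  # symmetric adjacency matrix, no self-loops
```

Given `N` and `A`, the estimation task of the paper is to recover `R` exactly with high probability.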
We refer to the minimum number of observed ratings required for reliable recovery as the optimal sample complexity.\n\nMain contributions. The main contributions of this paper are two-fold. First, under the model of interest, we characterize the optimal sample complexity as a function of the quality of graph information; see Sec. 3 for the quantification of the quality. In particular, we quantify by how much the social network information can reduce sample complexity. Our result demonstrates that the social graph can be significant enough to yield an order-wise reduction in sample complexity.\n\nThe second contribution of this work is to develop an efficient algorithm. The algorithm operates in three stages, and the two types of measurements are used separately during the first two stages and then together in the last stage. We provide a theoretical performance guarantee for this algorithm under the model of interest: we prove that the algorithm reliably recovers the ratings as soon as the sample complexity exceeds the optimal sample complexity. We also test the empirical performance of the algorithm to show that it achieves better performance than state-of-the-art approaches [37, 19] even with real-world graphs, including a political blog network [4] and Facebook networks [51].\n\nRelated works. Graph side information has been widely used to improve the performance of recommender systems. To begin with, it has been used in the matrix factorization (MF) approach, which learns the rating matrix from data by assuming that the rating matrix is of low rank. Most works modify the training procedure by adding regularization terms inspired by the graph side information [26, 37, 34, 10, 29]. Beyond regularization techniques, several works modify existing rating matrix models using graph information [36, 35, 60, 19]. An online version of this problem has also been studied [16]. 
Another popular approach is the neighborhood-based approach, in which a user's ratings are predicted based on those of his/her neighbors. Several works [38, 18, 54, 24, 25, 55] improve the performance by properly defining neighborhoods using social graphs. Lastly, some recent works propose deep-learning-based approaches in which graph information is incorporated into a framework called the graph convolutional network [42, 52].\n\nRecently, a few works have provided theoretical guarantees for the usefulness of graph side information. Rao et al. [45] provide a consistency guarantee for regularization techniques, demonstrating a gain due to the side information. On the flip side, Chiang et al. [14] consider a model called dirty inductive matrix completion, which incorporates noisy feature-based side information on top of the usual low-rank matrix completion model. The theoretical guarantee therein shows the efficacy of side information unless it is too noisy. However, whether the gains characterized in these works are maximally achievable remains open. In this work, focusing on a special case, we are able to characterize the optimal sample complexity, and hence the maximum gain due to side information.\n\nMoreover, several recent studies explore the value of side information in the context of clustering. In [39], it is proved that similarity information between data points reduces the adaptive query complexity in exact clustering. Feature-based side information has also been considered on top of the stochastic block model [47, 59]. A different setting is considered in [6], which demonstrates that the k-means problem, NP-hard in general, can be solved efficiently with a few pairwise queries. In fact, by switching the goal to exactly recovering the clusters instead of the ratings, our model can also be seen as a clustering problem with side information. 
As a consequence of our main result (Theorem 1), we also address this setting; see Corollary 1.\n\nNotation. Let [n] := {1, 2, . . . , n}; let 1_{k,l} be the k × l all-one matrix; for a graph and two subsets X and Y, let e(X, Y) be the number of edges between X and Y.\n\n2 Model\n\nConsider n users and m items, where m can scale with n. Each user rates each item either +1 (like) or -1 (dislike). In real life, people from the same group tend to share their preferences, recommending commodities to each other.2 In an effort to capture this, we consider a simple setting in which there are two clusters of the same size, and the users from the same cluster share the ratings over items. More precisely, we consider a binary rating matrix R ∈ {+1, -1}^{n×m} such that half of its rows are u_R and the other half are v_R, for some u_R and v_R, which we call the rating vectors of R. We denote these two disjoint sets (or clusters) of row indices by A_R and B_R, respectively.\n\nObservation model. Given R, two types of measurements are available.\n\n1. Partial observation of noisy ratings (N^Ω): We observe each entry of R independently with probability p, where 0 ≤ p ≤ 1. We further assume that the observed entries are noisy: the value of an observed entry can be flipped with probability θ ∈ (0, 1/2). We denote the subset of observed entries by Ω ⊂ [n] × [m]. We represent this with an observation matrix N^Ω of size n × m, whose (i, j)-th entry is the noisy observation if (i, j) ∈ Ω and 0 otherwise. In other words, N^Ω_{ij} ∼ R_{ij} · Bern(p) · (1 - 2 Bern(θ)), i.i.d. across entries.\n\n2. Social graph information (G): The social graph on the n users is generated as per the stochastic block model (SBM) [21]. That is, given the two clusters of users A_R and B_R, an edge between each pair of users i, j is placed with probability α, independently of the others, if they are from the same cluster; if not, the probability of having an edge between them is β, where α ≥ β. Let G = ([n], E) denote the social graph.\n\nPerformance metric. Given N^Ω and G, the task of interest is to recover R. The performance of an estimator is measured by the probability that the output of the estimator does not coincide with R, namely the probability of error. Concretely, we assume that the worst-case matrix is chosen from a collection of rating matrices R' with ‖u_{R'} - v_{R'}‖_0 = ⌈γm⌉, where γ ∈ (0, 1) is a fixed constant.\n\nDefinition 1. For a fixed constant γ ∈ (0, 1) and an estimator ψ that outputs a binary rating matrix based on N^Ω and G, the worst-case probability of error P^(γ)_e(ψ) is defined as\n\nP^(γ)_e(ψ) := max_{R' : ‖u_{R'} - v_{R'}‖_0 = ⌈γm⌉} Pr( ψ(N^Ω, G) ≠ R' ).\n\nGoal. We aim to characterize p* such that (i) when p exceeds p*, P^(γ)_e(ψ) → 0 as n → ∞ for some estimator ψ; and (ii) when p is less than p*, P^(γ)_e(ψ) ↛ 0 for any ψ. In particular, we aim to characterize nmp*, which we call the optimal sample complexity. Given that nmp is the expected number of observed entries, the optimal sample complexity can be seen as the minimum number of observed entries required for rating recovery in the limit of n.\n\n3 Optimal sample complexity\n\nWe characterize the optimal sample complexity as a function of n, m, θ, γ, α, and β.\n\nTheorem 1. Let γ ∈ (0, 1) be fixed. 
Assume that m = ω(log n) and log m = o(n).3 Then, the following holds for any constant ε > 0: if\n\np ≥ (1/(√(1-θ) - √θ)²) · max{ [(1+ε) log n - (n/2)(√α - √β)²] / (γm), (1+ε) · 2 log m / n },\n\nthen P^(γ)_e(ψ) → 0 as n → ∞ for some ψ that outputs a binary rating matrix based on N^Ω and G; conversely, if\n\np ≤ (1/(√(1-θ) - √θ)²) · max{ [(1-ε) log n - (n/2)(√α - √β)²] / (γm), (1-ε) · 2 log m / n },\n\nthen P^(γ)_e(ψ) ↛ 0 as n → ∞ for any ψ.\n\n2This tendency, called homophily, has been extensively studied in sociology and psychology [40].\n\n3We employ these conditions to obtain the large deviation results in the proof (see the supplemental material). Intuitively, these conditions rule out tall and fat matrices, respectively.\n\nProof: See Sec. 5 for the proof sketch, and see the supplemental material for the full proof.\n\nLet us interpret Theorem 1; see Table 1 for a summary. In essence, Theorem 1 asserts that rating recovery is possible if and only if\n\np > p* := (1/(√(1-θ) - √θ)²) · max{ [log n - (n/2)(√α - √β)²] / (γm), 2 log m / n }.\n\nFor illustrative purposes, we introduce the notation I_s := (√α - √β)². One can interpret I_s as the quality of the social graph information: the two-cluster structure becomes more transparent as the gap between α and β gets larger. For instance, when α = β, i.e., I_s = 0, there is no way to distinguish the two clusters. 
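To make the role of I_s concrete, the threshold p* above can be transcribed directly into code. This is a rough sketch, not from the paper: the (1 ± ε) slack is dropped, and all parameter values below are hypothetical.

```python
import numpy as np

def p_star(n, m, gamma, theta, alpha, beta):
    """Threshold p* of Theorem 1 with the (1 +/- eps) slack dropped:
    p* = max{ (log n - (n/2) * Is) / (gamma * m), 2 log m / n }
         / (sqrt(1 - theta) - sqrt(theta))^2,
    where Is = (sqrt(alpha) - sqrt(beta))^2 measures graph quality."""
    Is = (np.sqrt(alpha) - np.sqrt(beta)) ** 2
    graph_term = (np.log(n) - 0.5 * n * Is) / (gamma * m)
    rating_term = 2.0 * np.log(m) / n
    return max(graph_term, rating_term) / (np.sqrt(1.0 - theta) - np.sqrt(theta)) ** 2

# Hypothetical example: n = 2m users, noiseless ratings (theta = 0), gamma = 1/2.
n, m = 1000, 500
p_no_graph = p_star(n, m, 0.5, 0.0, alpha=0.10, beta=0.10)  # Is = 0: graph useless
p_good     = p_star(n, m, 0.5, 0.0, alpha=0.10, beta=0.05)  # moderate-quality graph
p_best     = p_star(n, m, 0.5, 0.0, alpha=0.20, beta=0.01)  # high-quality graph

assert p_good < p_no_graph  # graph side information lowers the threshold ...
assert p_best == p_good     # ... but the gain saturates once 2m log m dominates
```

Consistent with the discussion in this section, increasing I_s lowers the threshold only until the 2m log m term takes over, at which point the gain saturates.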
On the other hand, when α = 1 and β = 0 (or α = 0 and β = 1), i.e., I_s = 1, the cluster structure is straightforward from G. Note that the notation I_s is also employed in the context of community recovery under the SBM, in which the fundamental limit for exact recovery is shown to be I_s > 2 log n / n [1].\n\nFirst, consider I_s = 0. In this case, the optimal sample complexity nmp* is\n\n(1/(√(1-θ) - √θ)²) · max{ (n log n)/γ, 2m log m }.   (1)\n\nRemark 1. Note that the fundamental limit decreases in γ. This tendency is intuitive. To see this, focus on the noiseless setting (θ = 0). Let us classify the entries of the rating matrix into two types: (i) the entries on the columns where the ratings of the two groups coincide; and (ii) the other entries. Unlike the first type, the second type of entries can reveal significant information on the clusters: when we observe two users' different ratings on the same item, we can immediately conclude that the two users belong to different clusters. Hence, the second type of entries is more informative. As a γ-fraction of the entries is of the second type, the chance of observing more informative entries increases in γ. Thus, the required sample complexity decreases in γ.\n\nWe now turn to the case I_s ≠ 0. In this case, the optimal sample complexity nmp* is\n\n(1/(√(1-θ) - √θ)²) · max{ (n log n)/γ - n²I_s/(2γ), 2m log m }.   (2)\n\nOn the one hand, this result suggests that the social graph information does not help rating recovery when the number of users is relatively small compared to that of items. More precisely, when 2n log n ≤ 4γm log m, both (1) and (2) are equal to 2m log m / (√(1-θ) - √θ)².\n\nOn the other hand, when 2n log n > 4γm log m, i.e., (1) is equal to γ⁻¹ n log n / (√(1-θ) - √θ)², the result implies that the social graph information does help rating recovery. Below, we examine the amount of reduction in sample complexity as a function of I_s. For simplicity, we focus on the setting in which θ = 0 and γ = 1/2, i.e., (1) is equal to 2n log n.\n\nWe first consider the case n²I_s < 2n log n - 2m log m, i.e., (2) being equal to 2n log n - n²I_s. In this case, the sample complexity is reduced by n²I_s. That said, there is no asymptotic gain unless n²I_s = Ω(n log n). When n²I_s = Ω(n log n), however, this reduction can be significant enough to yield an order-wise decrease in sample complexity. To see this clearly, consider two scenarios: n = 2m and n = m²; see also Fig. 1.\n\nExample 1 (n = 2m). Note that n²I_s < 2n log n - 2m log m if and only if I_s < log(2n)/n. When I_s = c log(2n)/n for 0 < c < 1, the optimal sample complexity is 2n log n - cn log 2n = (2 - c) n log n - cn log 2, which is (asymptotically) lower than (1) by a multiplicative factor of c/2.\n\nExample 2 (n = m²). Note that n²I_s < 2n log n - 2m log m if and only if I_s < 2 log n / n - log n / n^1.5. When I_s = 2 log n / n - log n / n^c for 1 < c < 1.5, the optimal sample complexity is n^{2-c} log n, which shows an order-wise reduction in sample complexity.\n\nRemark 2. Example 2 justifies the observation made by Jamali and Ester [26] that graph side information can help predict ratings for cold-start users. More specifically, note that when c = 1.4, most users are cold-start users, as the average number of observed ratings per user is n^{-0.4} log n.\n\nFor the case n²I_s ≥ 2n log n - 2m log m, (2) equals 2m log m no matter how large I_s is. This implies that the gain due to side information saturates.\n\nTable 1: Summary of the gain in sample complexity. Depending on the quality of graph information I_s := (√α - √β)², the gain in sample complexity can be summarized as follows. First, graph information does not help unless the number of users is relatively larger than that of items (2n log n ≥ 4γm log m). When it helps, the efficacy of the information depends on its quality: when n²I_s = o(n log n), the social network is too noisy and hence does not help rating recovery. When I_s = Ω(log n / n), the minimum sample complexity is a decreasing function of I_s; in other words, as the quality of the social network increases, the minimum sample complexity decreases. However, the gain of side information exhibits diminishing returns: when I_s is larger than a certain threshold, the minimum sample complexity stops decreasing. Note that this does not mean that graph information does not help: it helps, but its gain is saturated.\n\nValue of n²I_s | o(n log n) | < 2n log n - 4γm log m | ≥ 2n log n - 4γm log m\nGain | no asymptotic gain | gain increases in I_s | gain is saturated\n\nFigure 1: Significant reduction in sample complexity due to the social graph ((a) n = 2m; (b) n = m²). We illustrate Theorem 1 for the two cases n = 2m and n = m². For n = 2m, the optimal sample complexity is reduced by a multiplicative factor, and for n = m², there is an order-wise reduction in sample complexity. See Examples 1 and 2 for details.\n\nImplications on community recovery. Switching the goal of our model to exactly recovering A_R and B_R instead of R, our model can cover a community recovery problem with side information with respect to ratings. 
The proof of Theorem 1 suggests the fundamental limit on I_s for exact recovery:\n\nCorollary 1. Suppose we wish to exactly recover the clusters A_R and B_R (up to a flip) instead of R (the definition of P^(γ)_e(ψ) is modified accordingly). Then, the following holds for any constant ε > 0: P^(γ)_e(ψ) → 0 as n → ∞ for some ψ that outputs two equal-sized clusters based on N^Ω and G whenever\n\nI_s + 2γmp(√(1-θ) - √θ)² / n ≥ (1+ε) · 2 log n / n;\n\nconversely, P^(γ)_e(ψ) ↛ 0 as n → ∞ for any ψ if\n\nI_s + 2γmp(√(1-θ) - √θ)² / n ≤ (1-ε) · 2 log n / n.\n\nThis result implies that when p ≠ 0, the amount of graph information I_s required for cluster recovery reduces from 2 log n / n [1] to max{ 2 log n / n - 2γmp(√(1-θ) - √θ)² / n, 0 }.\n\n4 Proposed algorithm\n\nIn this section, we propose an efficient rating estimation algorithm that achieves the fundamental limit characterized in Theorem 1. We note that our algorithm can also be applied to a general setting (not limited to the simple two-cluster setting described earlier), although a theoretical guarantee for that setting is not provided. This will become clearer as we describe the algorithm.\n\nAlgorithm description. The algorithm works in three stages. The inputs of the algorithm are N^Ω, G, the number of clusters k, and hyperparameters c_1, c_2 > 0.4\n\n4One can always use validation data to tune c_1 and c_2. 
For the case of two equal-sized communities, we characterize the optimal choice of c_1 and c_2 in Theorem 2.\n\nStage 1 (Partial recovery of clusters): First, we run a spectral method [17] on G. Let A^(0)_1, A^(0)_2, . . . , A^(0)_k be the output of the clustering. Note that other clustering algorithms, such as other spectral methods [2, 15, 33], nonbacktracking-matrix-based methods [32], semidefinite programming (SDP) [27], and belief propagation (BP) variants [43], can be employed for this stage.\n\nStage 2 (Recovery of rating vectors): Next, for each j, we recover the rating vector of the cluster A^(0)_j using the observed ratings. For each item i, we collect the observed ratings of the item by the users in A^(0)_j. Among the collected ratings, we find the one that appears most frequently; let this rating be u^(j)_i. The rating vector is then defined as u^(j) = [u^(j)_i]^m_{i=1}.\n\nStage 3 (Local refinement of clusters): The last stage iteratively refines the clusters A^(0)_j using G, N^Ω, and the u^(j)'s. This stage consists of T refinement steps, and in each step the clusters are updated as follows. Let A^(t-1)_1, A^(t-1)_2, . . . , A^(t-1)_k be the outcome of the (t-1)-th refinement step for some t = 1, 2, . . . , T. Then, for each i = 1, 2, . . . , n, we put user i into A^(t)_{j*}, where j* is the index among the A^(t-1)_j's that gives the maximum value of\n\nc_1 · e({i}, A^(t-1)_j) - c_2 · ē({i}, A^(t-1)_j) + Π_i(u^(j)).\n\nHere, ē({i}, A^(t-1)_j) := |A^(t-1)_j| - e({i}, A^(t-1)_j) indicates the number of non-edges between i and A^(t-1)_j, and Π_i(u^(j)) is the number of user i's observed ratings that coincide with those of u^(j). Lastly, the algorithm outputs R̂ whose i-th row is u^(j) whenever i ∈ A^(T)_j.\n\nRemark 3 (Update rule in Stage 3). For a cluster A^(t-1)_j and its rating vector u^(j), the term c_1 · e({i}, A^(t-1)_j) + Π_i(u^(j)), the sum of the first and third terms of the update rule, can be seen as a measure of fitness of user i with respect to the cluster A^(t-1)_j. This is because i is more likely to belong to A^(t-1)_j when there are more edges between i and A^(t-1)_j and more observed ratings consistent with u^(j). The number of non-edges is subtracted from the term to minimize the effect of cluster size, as a large cluster tends to have more edges to users.\n\nRemark 4 (Non-agnostic refinement rule). When the algorithm knows the size of each cluster, we modify the definition of ē by replacing the terms |A^(t-1)_j| with the actual sizes |A_j|. In particular, when the algorithm knows that the clusters are of equal size, we obtain the following refinement rule: we put user i into A^(t)_{j*}, where j* = arg max_{j ∈ [k]} [ c'_1 · e({i}, A^(t-1)_j) + Π_i(u^(j)) ] for some hyperparameter c'_1 > 0.\n\nRemark 5. Note that this algorithm can be applied to a general setting where (i) the rating type is not limited to binary; (ii) the number of clusters can be larger than 2; and (iii) the clusters are of unequal sizes.\n\nRemark 6. 
The proposed algorithm is inspired by a general paradigm for solving non-convex problems: first obtain a decent initial estimate, and then iteratively refine the estimate to reach the global optimum. This paradigm has been employed in various contexts, including matrix completion [30, 23], phase retrieval [44, 11], robust PCA [56], community recovery [2, 17, 57, 12], the EM algorithm [7], and rank aggregation [13]. Moreover, we note that spectral algorithms have also been used for rating estimation in the context of crowdsourcing [49, 53].\n\nOne important aspect of this algorithm is its low computational complexity. Spectral methods can be run within O(|E| log n) time using the power method [9]. Stage 2 requires a single pass over all observed ratings, which amounts to O(|Ω|) time. As for Stage 3, a single update of user i entails reading the i-th row and the edges connected to user i; assuming a proper data structure, each iteration requires O(|E| + |Ω|) time. Hence, Stage 3 can be done within O(|E|T + |Ω|T) time. Overall, the proposed algorithm runs in O(|E|T + |E| log n + |Ω|T) time.\n\nTheoretical performance guarantee. To investigate theoretical guarantees of the proposed algorithm, we focus on the model in Sec. 2.\n\nTheorem 2. Let R be any binary rating matrix with ‖u_R - v_R‖_0 = γm for some γ ∈ (0, 1). In addition to the assumptions in Theorem 1, assume further that m = O(n) and I_s = ω(1/n).5 Suppose that there exists ε > 0 such that the sufficient condition in Theorem 1 holds. 
Then with\nprobability approaching 1 as n \u2192 \u221e, the proposed algorithm with the knowledge that the two clusters\nare of equal size (i.e., using the re\ufb01nement rule in Remark 4) exactly recovers R under the settings\nT = O(log n) and c(cid:48)\nand\n\u02c6\u03b2 := 4 e(A(0)\nare estimations of model parameters \u03b1 and \u03b2 after Stage 1, and \u02c6\u03b8 is an estimation\nof \u03b8 after Stage 2 de\ufb01ned by the fraction of observed ratings which are different from the corresponding\nentries of the rating matrix de\ufb01ned by clusters A(0)\n\n. Here, \u02c6\u03b1 := e(A1(0),A1(0))+e(A2(0),A2(0))\n\n(cid:16) \u02c6\u03b1(1\u2212 \u02c6\u03b2)\n\n2 and rating vectors u(1), u(2).\n\n(cid:16) 1\u2212\u02c6\u03b8\n\n1 ,A(0)\n2 )\nn2\n\n1 , A(0)\n\n1 = log\n\n\u02c6\u03b2(1\u2212 \u02c6\u03b1)\n\n2(n/2\n2 )\n\n(cid:17)\n\n(cid:17)\n\n/ log\n\n\u02c6\u03b8\n\nProof: Due to space limitation, we defer the proof to the supplemental material.\nTheorem 2 implies that the proposed algorithm with proper hyperparameter choices achieves the\nfundamental limit in Theorem 1 except for the case of scarce social graph information (Is = O(1/n)).\n\n5 Proof outline of Theorem 1\nWe sketch the proof while deferring the full proof to the supplemental material. Let Ir := p(\u221a1 \u2212 \u03b8\u2212\n\u221a\u03b8)2. Using the notations of Ir and Is, one can succinctly represent the suf\ufb01cient condition and the\nnecessary condition claimed in Theorem 1. For instance, the suf\ufb01cient condition reads\n\n1\n2\n\nnIs + \u03b3mIr \u2265 (1 + \u0001) log n\n\nand\n\n1\n2\n\nnIr \u2265 (1 + \u0001) log m .\n\n2 ], BR(\u03b3) = [n] \\ [ n\n\n(cid:20) +1n/2,(1\u2212\u03b3)m +1n/2,\u03b3m\n\nWe next introduce a few more notations that will be used throughout the proof. 
Let $\mathcal{C}(\gamma)$ be the collection of rating matrices $R$ such that $\|u_R - v_R\|_0 = \gamma m$ (here and below, we treat $\gamma m$ as an integer for notational simplicity); let $R^{(\gamma)} \in \mathcal{C}(\gamma)$ be the rating matrix with $u_{R^{(\gamma)}} = +1_{1,m}$ and $v_{R^{(\gamma)}} = [+1_{1,(1-\gamma)m} \mid -1_{1,\gamma m}]$, i.e.,
$$R^{(\gamma)} := \begin{bmatrix} +1_{n/2,\,m} \\ +1_{n/2,\,(1-\gamma)m} \mid -1_{n/2,\,\gamma m} \end{bmatrix},$$
and let $A_{R^{(\gamma)}} = [n/2]$. Lastly, let $\psi_{\mathrm{ML}}$ be the maximum likelihood estimator (whose output is not constrained to $\mathcal{C}(\gamma)$) and $L(\cdot)$ be the likelihood function.

Achievability: It is enough to show that $P_e^{(\gamma)}(\psi_{\mathrm{ML}}) \to 0$. By symmetry, we fix the ground truth rating matrix to be $R^{(\gamma)}$. Note that the event "$\psi_{\mathrm{ML}}(N^{\Omega}, G) \ne R^{(\gamma)}$" happens only if $L(X) \le L(R^{(\gamma)})$ for some binary rating matrix $X$. Hence, by the union bound,
$$P_e^{(\gamma)}(\psi_{\mathrm{ML}}) \le \sum_{X \ne R^{(\gamma)}} \Pr\left( L(X) \le L(R^{(\gamma)}) \right). \qquad (3)$$
To enumerate all rating matrices different from $R^{(\gamma)}$, we define $\mathcal{X}(k, a_1, a_2, b_1, b_2)$ to be the class of rating matrices $X$ such that (i) $|A_X \setminus A_{R^{(\gamma)}}| = |B_X \setminus B_{R^{(\gamma)}}| =: k$; (ii) $u_X$ differs from $u_{R^{(\gamma)}}$ at $a_1$ coordinates among the first $(1-\gamma)m$ coordinates and at $a_2$ coordinates among the next $\gamma m$ coordinates; and (iii) $v_X$ differs from $v_{R^{(\gamma)}}$ at $b_1$ coordinates among the first $(1-\gamma)m$ coordinates and at $b_2$ coordinates among the next $\gamma m$ coordinates. Note that if $X_1$ and $X_2$ belong to the same class, then
$$\Pr\left( L(X_1) \le L(R^{(\gamma)}) \right) = \Pr\left( L(X_2) \le L(R^{(\gamma)}) \right)$$
as the two events are statistically identical. Let $\mathcal{I}$ be the range of the index, i.e., the collection of tuples $(k, a_1, a_2, b_1, b_2) \ne (0,0,0,0,0)$ such that $0 \le k \le n/4$, $0 \le a_1, b_1 \le (1-\gamma)m$, and $0 \le a_2, b_2 \le \gamma m$. Note that $k \le n/4$ is sufficient as one can switch the roles of $u_X$ and $v_X$. For each 5-tuple $z \in \mathcal{I}$, let $X_z$ be a binary rating matrix in $\mathcal{X}(z)$. With this enumeration, the right-hand side of (3) becomes $\sum_{z \in \mathcal{I}} |\mathcal{X}(z)| \Pr\left( L(X_z) \le L(R^{(\gamma)}) \right)$.

To upper bound $\Pr\left( L(X_z) \le L(R^{(\gamma)}) \right)$, we developed a large deviation result building upon the techniques in [28, 58]: for $z = (k, a_1, a_2, b_1, b_2)$,
$$\Pr\left( L(X_z) \le L(R^{(\gamma)}) \right) \le e^{-2(\frac{n}{2}-k)k I_s - D_z I_r}, \qquad (4)$$
where $D_z := k \cdot \{a_1 + b_1 + (\gamma m - a_2) + (\gamma m - b_2)\} + \left(\frac{n}{2} - k\right) \cdot (a_1 + a_2 + b_1 + b_2)$; we refer readers to the supplemental material for details. Let $S(z) := |\mathcal{X}(z)|\, e^{-2(\frac{n}{2}-k)k I_s - D_z I_r}$. Then, the last upper bound is bounded by $\sum_{z \in \mathcal{I}} S(z)$.

⁵Note that the condition $m = O(n)$ is for reliable estimation of the parameters $\alpha$, $\beta$, $\theta$, and hence can be removed when the parameters are known.

In the supplemental material, we show that the $S(z)$'s for $z$ with at least one large coordinate are negligible. More precisely, for $\mathcal{L} := \{(k, a_1, a_2, b_1, b_2) : k < \delta n \text{ and } a_1, a_2, b_1, b_2 < \delta m\}$ (where $\delta$ is a sufficiently small quantity), $\sum_{\mathcal{I} \setminus \mathcal{L}} S(z)$ is negligible compared to $\sum_{\mathcal{I} \cap \mathcal{L}} S(z)$. The rationale behind this is that when $z \in \mathcal{I} \setminus \mathcal{L}$, $S(z)$ becomes a very small quantity as either $2(\frac{n}{2}-k)k$ or $D_z$ becomes very large. Hence, it suffices to focus on $\sum_{\mathcal{I} \cap \mathcal{L}} S(z)$. As $k \ll n$ and $a_1, a_2, b_1, b_2 \ll m$ when $z \in \mathcal{I} \cap \mathcal{L}$, one can approximate $2(\frac{n}{2}-k)k \approx nk$ and $D_z \approx 2\gamma km + \frac{n}{2}(a_1 + a_2 + b_1 + b_2)$. By definition,
$$|\mathcal{X}(k, a_1, a_2, b_1, b_2)| = \binom{n/2}{k}^2 \binom{(1-\gamma)m}{a_1} \binom{\gamma m}{a_2} \binom{(1-\gamma)m}{b_1} \binom{\gamma m}{b_2} \le n^{2k} m^{a_1+a_2+b_1+b_2}.$$
This together with the above approximation yields
$$S(z) \le e^{2k \log n + (a_1+a_2+b_1+b_2) \log m}\, e^{-2(\frac{n}{2}-k)k I_s - D_z I_r} \approx e^{-k \cdot (n I_s + 2\gamma m I_r - 2\log n) - (a_1+a_2+b_1+b_2) \cdot (\frac{n}{2} I_r - \log m)} \le (n^{-2\epsilon})^k (m^{-\epsilon})^{a_1+a_2+b_1+b_2},$$
where the last inequality is due to $\frac{1}{2} n I_s + \gamma m I_r \ge (1+\epsilon) \log n$ and $\frac{1}{2} n I_r \ge (1+\epsilon) \log m$. This justifies $\sum_{z \in \mathcal{I} \cap \mathcal{L}} S(z) \to 0$: the sum is dominated by a product of geometric series with vanishing ratios $n^{-2\epsilon}$ and $m^{-\epsilon}$, and the tuple $(0,0,0,0,0)$ is excluded.

Converse: Step 1 (ML as an optimal estimator): Consider the maximum likelihood estimator $\psi_{\mathrm{ML}}|_{\mathcal{C}(\gamma)}$ whose output is constrained to $\mathcal{C}(\gamma)$. It can be proven that
$$\inf_{\psi} P_e^{(\gamma)}(\psi) \ge P_e^{(\gamma)}\left( \psi_{\mathrm{ML}}|_{\mathcal{C}(\gamma)} \right).$$
See the supplemental material for details. Hence, it is enough to show $P_e^{(\gamma)}\left( \psi_{\mathrm{ML}}|_{\mathcal{C}(\gamma)} \right) \not\to 0$. By symmetry, we fix the ground truth rating matrix to be $R^{(\gamma)}$.

Step 2 (Genie-aided ML estimators): We consider genie-aided ML estimators, in which the genie helps the estimator by revealing that the answer is one of a few candidates within $\mathcal{C}(\gamma)$.
Using the notation $\mathcal{X}(\cdot,\cdot,\cdot,\cdot,\cdot)$ from the achievability proof, two different kinds of genie-aided estimators are examined: $\psi^{(1)}_{\mathrm{ML}}$ is given the information that the ground truth belongs to $\{R^{(\gamma)}\} \cup \mathcal{X}(0,0,0,1,1)$; $\psi^{(2)}_{\mathrm{ML}}$ is given the information that the ground truth belongs to $\{R^{(\gamma)}\} \cup \mathcal{X}(1,0,0,0,0)$.

Step 3 (Analysis of genie-aided estimators): We prove that (i) $\psi^{(1)}_{\mathrm{ML}}$ fails if $\frac{1}{2} n I_r \le (1-\epsilon) \log m$ and (ii) $\psi^{(2)}_{\mathrm{ML}}$ fails if $\frac{1}{2} n I_s + \gamma m I_r \le (1-\epsilon) \log n$. Here, we provide the proof sketch of (i); we remark that the proof of (ii) is trickier as it requires some combinatorial properties of random graphs. Note that if the likelihood $L(X)$ for some $X \in \mathcal{X}(0,0,0,1,1)$ is less than or equal to $L(R^{(\gamma)})$, then $\psi^{(1)}_{\mathrm{ML}}$ fails with probability at least $1/2$. Hence, it is enough to show that with probability approaching $1$, there exists $X \in \mathcal{X}(0,0,0,1,1)$ such that $L(X) \le L(R^{(\gamma)})$, or equivalently (by taking the complement),
$$\Pr\Big( \bigcap_{X \in \mathcal{X}(0,0,0,1,1)} \big[ L(X) > L(R^{(\gamma)}) \big] \Big) \to 0.$$
On the other hand, some difficulties arise while analysing the last probability, as the events $\big[ L(X) > L(R^{(\gamma)}) \big]_{X \in \mathcal{X}(0,0,0,1,1)}$ are not mutually independent. A trick avoiding this issue is to show that the last probability is bounded by
$$\Pr\Big( \bigcap_{X \in \mathcal{X}(0,0,0,1,0)} \big[ L(X) > L(R^{(\gamma)}) \big] \Big) + \Pr\Big( \bigcap_{X \in \mathcal{X}(0,0,0,0,1)} \big[ L(X) > L(R^{(\gamma)}) \big] \Big).$$
The analysis is now tractable as the collections of events $\big[ L(X) > L(R^{(\gamma)}) \big]_{X \in \mathcal{X}(0,0,0,1,0)}$ and $\big[ L(X) > L(R^{(\gamma)}) \big]_{X \in \mathcal{X}(0,0,0,0,1)}$ are both mutually independent. Now, we conclude the proof by using the reverse direction of the bound (4) (the bound (4) is indeed tight, and the reverse direction also holds up to a constant factor; see the supplemental material). For instance,
$$\Pr\Big( \bigcap_{X \in \mathcal{X}(0,0,0,1,0)} \big[ L(X) > L(R^{(\gamma)}) \big] \Big) \le \big( 1 - e^{-\frac{n}{2} I_r} \big)^{|\mathcal{X}(0,0,0,1,0)|} \le e^{-(1-\gamma)m\, e^{-\frac{n}{2} I_r}},$$
where the last inequality is due to $1 - x \le e^{-x}$ and $|\mathcal{X}(0,0,0,1,0)| = (1-\gamma)m$; the last term goes to zero when $\frac{1}{2} n I_r \le (1-\epsilon) \log m$.

Figure 2: (a) Synthetic data; (b) Poliblog (2 clusters); (c) Facebook (6 clusters). In (a), the level of darkness depicts the empirical success rate, and the orange line reflects the optimal sample complexity due to Theorem 1; a sharp transition in darkness near the line corroborates Theorem 1. In (b) and (c), performance of algorithms is compared on real datasets; our algorithm shows better performance than the other algorithms on every data set, demonstrating the practicality of our approach.

6 Experiments

We first conduct an experiment to corroborate Theorem 1. We consider a setting with n = 2000 users and m = 1000 items. The synthetic data is generated as per the model in Sec. 2.⁶ For each triple (p, α, β), the empirical success rate of the proposed algorithm is measured over 100 random trials.
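To make the setup concrete, here is a minimal sketch of how one such synthetic trial can be generated. The sampling details (two equal user groups whose rating vectors differ in γm coordinates, a stochastic-block-model graph with within/across-group edge probabilities α and β, each rating observed with probability p and flipped with probability θ) follow our reading of the model in Sec. 2; the small dimensions and the specific parameter values are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small-scale illustration (the paper's experiment uses n = 2000, m = 1000).
n, m = 200, 100                      # users, items
gamma, theta, p = 0.5, 0.1, 0.2      # disagreement ratio, flip noise, sample rate
alpha, beta = 0.5, np.log(n) / n     # within-/across-group edge probabilities

# Two groups of n/2 users; the group rating vectors u, v differ in gamma*m coords.
u = np.ones(m, dtype=int)
v = np.concatenate([np.ones(m - int(gamma * m), dtype=int),
                    -np.ones(int(gamma * m), dtype=int)])
R = np.vstack([np.tile(u, (n // 2, 1)), np.tile(v, (n // 2, 1))])

# Graph side information: stochastic block model on the two user groups.
same = np.zeros((n, n), dtype=bool)
same[: n // 2, : n // 2] = same[n // 2:, n // 2:] = True
G = np.triu(rng.random((n, n)) < np.where(same, alpha, beta), k=1)
G = G | G.T  # symmetric adjacency matrix, no self-loops

# Each rating observed independently w.p. p and flipped w.p. theta.
observed = rng.random((n, m)) < p
flips = rng.random((n, m)) < theta
N = np.where(observed, R * np.where(flips, -1, 1), 0)  # 0 marks "unobserved"

print(G.sum() // 2, "edges;", round(observed.mean(), 2), "fraction observed")
```

A full trial would then feed the partial observations N and the graph G to the estimator and count the run as a success only when the rating matrix R is recovered exactly.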
Figure 2a shows the result, where the empirical success rate is depicted by the level of darkness. The orange (solid) line reflects the optimal sample complexity due to Theorem 1. The phase transition occurs near the orange line, corroborating Theorem 1.

We next show that our proposed algorithm performs well even with real-world graphs. On top of real graphs (the political blog network [4] and Facebook networks [51]), we synthesize ratings as per our model. For the performance metric, we use the mean absolute error (MAE): $\mathbb{E}[|\hat{R}_{ij} - R_{ij}|]$, where the expectation is taken over the observed ratings. We then compare the performance of our algorithm with 7 well-known recommendation algorithms.⁷ Reported in Figures 2b and 2c are the performances of the rating estimation algorithms on the real graph data. Our algorithm outperforms the other algorithms on every data set, demonstrating the practicality of our approach.

7 Conclusion

Motivated by the lack of studies quantifying the value of social graph information in recommender systems, this work characterized the optimal sample complexity of the rating recovery problem with social graph information. We also proposed an efficient rating estimation algorithm that provably achieves the optimal sample complexity.

This paper comes with some limitations in characterizing the sample complexity of more general models. We hope the restrictive assumptions considered in this paper, such as binary ratings and ratings being shared across the same group, will be relaxed in future endeavors. In particular, it would be interesting to characterize the optimal sample complexity for feature-based side information models [14, 45].
Moreover, as in the case of community detection, the sample complexity for partial recovery [3] might be more desirable in practice than our exact recovery setting.

Acknowledgments

The work was jointly supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2018R1A1A1A05022889) and Kakao Brain Corp.

⁶We set $\theta = 0.1$ and $\gamma = 0.5$. We vary $\alpha$ and $p$ while fixing $\beta = \frac{\log n}{n}$.
⁷The recommendation algorithms include item average, user average, item k-Nearest Neighbor (NN), user k-NN, biased matrix factorization [31], matrix factorization with social regularization (SoReg) [37], and TrustSVD [19]. We adopt the implementations from LibRec, an open-sourced Java library for recommendation systems [20].

References

[1] Emmanuel Abbe, Afonso S Bandeira, and Georgina Hall. Exact recovery in the stochastic block model. IEEE Transactions on Information Theory, 62(1):471–487, 2016.

[2] Emmanuel Abbe and Colin Sandon. Community detection in general stochastic block models: Fundamental limits and efficient algorithms for recovery. In FOCS, pages 670–688. IEEE, 2015.

[3] Emmanuel Abbe and Colin Sandon. Proof of the achievability conjectures in the general stochastic block model. To appear in Communications on Pure and Applied Mathematics, 2017.

[4] Lada A Adamic and Natalie Glance. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery, pages 36–43. ACM, 2005.

[5] Deepak Agarwal and Bee-Chung Chen. fLDA: matrix factorization through latent Dirichlet allocation. In WSDM, pages 91–100. ACM, 2010.

[6] Hassan Ashtiani, Shrinu Kushagra, and Shai Ben-David. Clustering with same-cluster queries. In Advances in Neural Information Processing Systems, pages 3216–3224, 2016.

[7] Sivaraman Balakrishnan, Martin J. Wainwright, and Bin Yu. Statistical guarantees for the EM algorithm: From population to sample-based analysis. The Annals of Statistics, 2017.

[8] Robert Bell, Yehuda Koren, and Chris Volinsky. Modeling relationships at multiple scales to improve accuracy of large recommender systems. In SIGKDD, pages 95–104. ACM, 2007.

[9] Christos Boutsidis, Prabhanjan Kambadur, and Alex Gittens. Spectral clustering via the power method, provably. In ICML, pages 40–48, 2015.

[10] Deng Cai, Xiaofei He, Jiawei Han, and Thomas S Huang. Graph regularized nonnegative matrix factorization for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8):1548–1560, 2011.

[11] Emmanuel J Candes, Xiaodong Li, and Mahdi Soltanolkotabi. Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Transactions on Information Theory, 61(4):1985–2007, 2015.

[12] Yuxin Chen, Govinda Kamath, Changho Suh, and David Tse. Community recovery in graphs with locality. In ICML, 2016.

[13] Yuxin Chen and Changho Suh. Spectral MLE: Top-k rank aggregation from pairwise comparisons. In ICML, pages 371–380, 2015.

[14] Kai-Yang Chiang, Cho-Jui Hsieh, and Inderjit S Dhillon. Matrix completion with noisy side information. In Advances in Neural Information Processing Systems, pages 3447–3455, 2015.

[15] Peter Chin, Anup Rao, and Van Vu. Stochastic block model and community detection in sparse graphs: A spectral algorithm with optimal rate of recovery. In COLT, pages 391–423, 2015.

[16] Symeon Chouvardas, Mohammed Amin Abdullah, Lucas Claude, and Moez Draief. Robust online matrix completion on graphs. In ICASSP, pages 4019–4023. IEEE, 2017.

[17] Chao Gao, Zongming Ma, Anderson Y Zhang, and Harrison H Zhou. Achieving optimal misclassification proportion in stochastic block model. JMLR, 2017.

[18] Jennifer Golbeck, James Hendler, et al. FilmTrust: Movie recommendations using trust in web-based social networks. In IEEE CCNC, pages 282–286, 2006.

[19] G. Guo, J. Zhang, and N. Yorke-Smith. TrustSVD: Collaborative filtering with both the explicit and implicit influence of user trust and of item ratings. In AAAI, 2015.

[20] Guibing Guo, Jie Zhang, Zhu Sun, and Neil Yorke-Smith. LibRec: A Java library for recommender systems. In UMAP Workshops, volume 4, 2015.

[21] Paul W Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109–137, 1983.

[22] Michael Jahrer, Andreas Töscher, and Robert Legenstein. Combining predictions for accurate recommender systems. In SIGKDD, pages 693–702. ACM, 2010.

[23] Prateek Jain, Praneeth Netrapalli, and Sujay Sanghavi. Low-rank matrix completion using alternating minimization. In STOC, pages 665–674. ACM, 2013.

[24] Mohsen Jamali and Martin Ester. TrustWalker: a random walk model for combining trust-based and item-based recommendation. In SIGKDD, pages 397–406. ACM, 2009.

[25] Mohsen Jamali and Martin Ester. Using a trust network to improve top-N recommendation. In RecSys, pages 181–188. ACM, 2009.

[26] Mohsen Jamali and Martin Ester. A matrix factorization technique with trust propagation for recommendation in social networks. In RecSys, pages 135–142. ACM, 2010.

[27] Adel Javanmard, Andrea Montanari, and Federico Ricci-Tersenghi. Phase transitions in semidefinite relaxations. PNAS, 113(16):E2218–E2223, 2016.

[28] Varun Jog and Po-Ling Loh. Information-theoretic bounds for exact recovery in weighted stochastic block models using the Rényi divergence. arXiv preprint arXiv:1509.06418, 2015.

[29] Vassilis Kalofolias, Xavier Bresson, Michael Bronstein, and Pierre Vandergheynst. Matrix completion on graphs. arXiv preprint arXiv:1408.1717, 2014.

[30] Raghunandan H Keshavan, Andrea Montanari, and Sewoong Oh. Matrix completion from a few entries. IEEE Transactions on Information Theory, 2010.

[31] Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In SIGKDD, 2008.

[32] F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborová, and P. Zhang. Spectral redemption in clustering sparse networks. PNAS, 2013.

[33] Jing Lei and Alessandro Rinaldo. Consistency of spectral clustering in stochastic block models. The Annals of Statistics, 43(1):215–237, 2015.

[34] Wu-Jun Li and Dit Yan Yeung. Relation regularized matrix factorization. In IJCAI, page 1126, 2009.

[35] Hao Ma, Irwin King, and Michael R Lyu. Learning to recommend with social trust ensemble. In SIGIR, pages 203–210. ACM, 2009.

[36] Hao Ma, Haixuan Yang, Michael R Lyu, and Irwin King. SoRec: social recommendation using probabilistic matrix factorization. In CIKM, pages 931–940. ACM, 2008.

[37] Hao Ma, Dengyong Zhou, Chao Liu, Michael R Lyu, and Irwin King. Recommender systems with social regularization. In WSDM, pages 287–296. ACM, 2011.

[38] Paolo Massa and Paolo Avesani. Controversial users demand local trust metrics: An experimental study on epinions.com community. In AAAI, volume 5, pages 121–126, 2005.

[39] Arya Mazumdar and Barna Saha. Query complexity of clustering with side information. In Advances in Neural Information Processing Systems, pages 4685–4696, 2017.

[40] Miller McPherson, Lynn Smith-Lovin, and James M Cook. Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27(1):415–444, 2001.

[41] Andriy Mnih and Ruslan R Salakhutdinov. Probabilistic matrix factorization. In NIPS, pages 1257–1264, 2008.

[42] Federico Monti, Michael Bronstein, and Xavier Bresson. Geometric matrix completion with recurrent multi-graph neural networks. In NIPS, pages 3700–3710, 2017.

[43] Elchanan Mossel and Jiaming Xu. Density evolution in the degree-correlated stochastic block model. arXiv preprint arXiv:1509.03281, 2015.

[44] Praneeth Netrapalli, Prateek Jain, and Sujay Sanghavi. Phase retrieval using alternating minimization. In NIPS, pages 2796–2804, 2013.

[45] Nikhil Rao, Hsiang-Fu Yu, Pradeep K Ravikumar, and Inderjit S Dhillon. Collaborative filtering with graph information: Consistency and scalable methods. In NIPS, pages 2107–2115, 2015.

[46] Jasson D. M. Rennie and Nathan Srebro. Fast maximum margin matrix factorization for collaborative prediction. In ICML, pages 713–719, 2005.

[47] Hussein Saad and Aria Nosratinia. Community detection with side information: Exact recovery under the stochastic block model. IEEE Journal of Selected Topics in Signal Processing, 2018.

[48] Ruslan Salakhutdinov and Andriy Mnih. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In ICML, pages 880–887, 2008.

[49] Devavrat Shah and Christina Lee Yu. Reducing crowdsourcing to graphon estimation, statistically. arXiv preprint arXiv:1703.08085, 2017.

[50] Luo Si and Rong Jin. Flexible mixture model for collaborative filtering. In ICML, pages 704–711, 2003.

[51] Amanda L Traud, Peter J Mucha, and Mason A Porter. Social structure of Facebook networks. Physica A: Statistical Mechanics and its Applications, 391(16):4165–4180, 2012.

[52] Rianne van den Berg, Thomas N Kipf, and Max Welling. Graph convolutional matrix completion. stat, 1050:7, 2017.

[53] Rui Wu, Jiaming Xu, Rayadurgam Srikant, Laurent Massoulié, Marc Lelarge, and Bruce Hajek. Clustering and inference from pairwise comparisons. In ACM SIGMETRICS Performance Evaluation Review, volume 43, pages 449–450. ACM, 2015.

[54] Xiwang Yang, Yang Guo, and Yong Liu. Bayesian-inference-based recommendation in online social networks. IEEE Transactions on Parallel and Distributed Systems, 24(4):642–651, 2013.

[55] Xiwang Yang, Harald Steck, Yang Guo, and Yong Liu. On top-k recommendation using social networks. In RecSys, pages 67–74. ACM, 2012.

[56] Xinyang Yi, Dohyung Park, Yudong Chen, and Constantine Caramanis. Fast algorithms for robust PCA via gradient descent. In NIPS, pages 4152–4160, 2016.

[57] Se-Young Yun and Alexandre Proutiere. Accurate community detection in the stochastic block model via spectral algorithms. arXiv preprint arXiv:1412.7335, 2014.

[58] Anderson Y Zhang, Harrison H Zhou, et al. Minimax rates of community detection in stochastic block models. The Annals of Statistics, 44(5):2252–2280, 2016.

[59] Yuan Zhang, Elizaveta Levina, and Ji Zhu. Community detection in networks with node features. arXiv preprint arXiv:1509.01173, 2015.

[60] Huan Zhao, Quanming Yao, James T Kwok, and Dik Lun Lee. Collaborative filtering with social local models. In ICDM, pages 645–654, 2017.