{"title": "Optimizing Generalized PageRank Methods for Seed-Expansion Community Detection", "book": "Advances in Neural Information Processing Systems", "page_first": 11710, "page_last": 11721, "abstract": "Landing probabilities (LP) of random walks (RW) over graphs encode rich information regarding graph topology. Generalized PageRanks (GPR), which represent weighted sums of LPs of RWs, utilize the discriminative power of LP features to enable many graph-based learning studies. Previous work in the area has mostly focused on evaluating suitable weights for GPRs, and only a few studies so far have attempted to derive the optimal weights of GPRs for a given application. We take a fundamental step forward in this direction by using random graph models to better our understanding of the behavior of GPRs. In this context, we provide a rigorous non-asymptotic analysis for the convergence of LPs and GPRs to their mean-field values on edge-independent random graphs. Although our theoretical results apply to many problem settings, we focus on the task of seed-expansion community detection over stochastic block models. There, we find that the predictive power of LPs decreases significantly slower than previously reported based on asymptotic findings. Given this result, we propose a new GPR, termed Inverse PR (IPR), with LP weights that increase for the initial few steps of the walks. Extensive experiments on both synthetic and real, large-scale networks illustrate the superiority of IPR compared to other GPRs for seeded community detection.", "full_text": "Optimizing Generalized PageRank Methods for\n\nSeed-Expansion Community Detection\n\nPan Li\nUIUC\n\npanli2@illinois.edu\n\nEli Chien\n\nUIUC\n\nichien3@illinois.edu\n\nOlgica Milenkovic\n\nUIUC\n\nmilenkov@illinois.edu\n\nAbstract\n\nLanding probabilities (LP) of random walks (RW) over graphs encode rich infor-\nmation regarding graph topology. 
Generalized PageRanks (GPR), which represent weighted sums of LPs of RWs, utilize the discriminative power of LP features to enable many graph-based learning studies. Previous work in the area has mostly focused on evaluating suitable weights for GPRs, and only a few studies so far have attempted to derive the optimal weights of GPRs for a given application. We take a fundamental step forward in this direction by using random graph models to better our understanding of the behavior of GPRs. In this context, we provide a rigorous non-asymptotic analysis for the convergence of LPs and GPRs to their mean-field values on edge-independent random graphs. Although our theoretical results apply to many problem settings, we focus on the task of seed-expansion community detection over stochastic block models. There, we find that the predictive power of LPs decreases significantly slower than previously reported based on asymptotic findings. Given this result, we propose a new GPR, termed Inverse PR (IPR), with LP weights that increase for the initial few steps of the walks. Extensive experiments on both synthetic and real, large-scale networks illustrate the superiority of IPR compared to other GPRs for seeded community detection.¹

1 Introduction

PageRank (PR), an algorithm originally proposed by Page et al. for ranking web-pages [1], has found many successful applications, including community detection [2, 3], link prediction [4] and recommender system design [5, 6]. The PR algorithm involves computing the stationary distribution of a Markov process obtained by starting from a seed vertex and then either performing a one-step random walk (RW) to the neighbors of the current vertex or jumping to another vertex according to a predetermined probability distribution. The RW aids in capturing topological information about the graph, while the jump probabilities incorporate modeling preferences [7].
A proper selection of the RW probabilities ensures that the stationary distribution induces an ordering of the vertices that may be used to determine the "relevance" of vertices or the structure of their neighborhoods.

Despite the wide utility of PR [7, 8], recent work in the field has shifted towards investigating various generalizations of PR. Generalized PR (GPR) values enable more accurate characterizations of vertex distances and similarities, and hence lead to improved performance of various graph learning techniques [9]. GPR methods make use of arbitrarily weighted linear combinations of landing probabilities (LP) of RWs of different lengths, defined as follows. Given a seed vertex and another arbitrary vertex v in the graph, the k-step LP of v, x^(k)_v, equals the probability that a RW starting from the seed lands at v after k steps; the GPR value for vertex v is defined as ∑_{k=0}^∞ γ_k x^(k)_v, for some weight sequence {γ_k}_{k≥0}. Certain GPR representations, such as personalized PR (PPR) [10] or heat-kernel PR (HPR) [11], are associated with weight sequences chosen in a heuristic manner: PPR uses traditional PR weights, γ_k = (1 − α)α^k for some α ∈ (0, 1), and a seed set that captures locality constraints. On the other hand, HPR uses weights of the form γ_k = (h^k/k!) e^{−h}, for some h > 0. A question that naturally arises is what are the provably near-optimal or optimal weights for a particular graph-based learning task.

Clearly, there is no universal approach for addressing this issue, and prior work has mostly reported comparative analytic or empirical studies for selected GPRs.

¹Pan Li and Eli Chien contribute equally to this work.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.
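The two heuristic weight sequences above are easy to instantiate. The following sketch (plain Python; the function names and parameter values are ours, chosen for illustration) generates truncated PPR and HPR weight sequences. Note that the PPR weights are maximal at k = 0 and decay geometrically, while the HPR weights are Poisson-shaped and peak near k ≈ h.

```python
import math

def ppr_weights(alpha, K):
    """Truncated PPR weights: gamma_k = (1 - alpha) * alpha^k, for alpha in (0, 1)."""
    return [(1 - alpha) * alpha ** k for k in range(K)]

def hpr_weights(h, K):
    """Truncated HPR weights: gamma_k = (h^k / k!) * e^{-h}, for h > 0."""
    return [(h ** k / math.factorial(k)) * math.exp(-h) for k in range(K)]

ppr = ppr_weights(0.9, 50)   # geometric decay, largest weight at k = 0
hpr = hpr_weights(5.0, 50)   # rises first, peaks near k = h, then decays
```

Both sequences sum to (approximately) one when truncated far enough, so they can be compared on the same scale; the qualitative difference is only in how the mass is distributed over walk lengths.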
As an example, for community detection based on seed-expansion (SE), where the goal is to identify a densely linked component of the graph that contains a set of a priori defined seed vertices, Chung [12] proved that the HPR method produces communities with better conductance values than PPR [13]. Kloster and Gleich [14] confirmed this finding via extensive experiments over real world networks. Avron and Horesh [15] leveraged time-dependent PRs, a convolutional form of HPR and PPR [16], and showed that this new PR can outperform HPR on a number of real network datasets. Another line of work considered adaptively learning the GPR weights given access to sufficiently many within-community and out-of-community vertex labels [17, 18]. Related studies were also conducted in other application domains, such as web-ranking [8] and recommender system design [19].

Recently, Kloumann et al. [20] took a fresh look at the GPR-based seed-expansion community detection problem. They viewed LPs of different steps as features relevant to membership in the community of interest, and the GPRs as scores produced by a linear classifier that digests these features. A key observation in this setting is that the GPR weights have to be chosen with respect to the informativeness of these features. Based on the characterization of the mean-field values of the LPs over a modified stochastic block model (SBM) [21], Kloumann et al. [20] determined that PPR with a proper choice of the parameter α corresponds to the optimal classifier if only the first-order moments are available. Unfortunately, as the variance of the LPs was ignored, the performance of PPR was shown to be sub-optimal even for synthetic graphs obeying the generative modeling assumptions used in [20].

We report substantial improvements on the described line of work by characterizing the non-asymptotic behavior of the LPs over random graphs.
More precisely, we derive non-asymptotic conditions for the LPs to converge to their mean-field values. Our findings indicate that in the non-asymptotic setting, the discriminative power of k-step LPs does not necessarily deteriorate as k increases; this follows since our bounds on the variance decay even faster than the distance between the means of LPs within the same and across two different communities. We leverage this finding and propose new weights that suitably increase with the length of RWs for small values of k. This choice differs significantly from the geometrically decaying weights used in PPR, as suggested by [20].

The reported results may also provide useful means for improving graph neural networks (GNN) [22, 23, 24] and their variants [25, 26] for vertex classification tasks. Currently, the typical number of layers in graph neural networks is 2–3, as such a choice offers the best empirical performance [24, 25]. More layers may over-smooth vertex features and thus provide worse results. However, in this setting, long paths in the graphs may not be properly utilized, as our work demonstrates that these paths may have strong discriminative power for community detection. Hence, a natural research direction regarding GNNs is to investigate how to leverage long paths over graphs without over-smoothing the vertex features. Concurrent with this work, several empirical studies were performed to address the same problem. The work in [27, 28] used a decoupling of non-linear feature transformations and PR propagation over graphs, while [29] used GNNs over graphs that are transformed based on GPRs.

Our contributions are multifold. We derive the first non-asymptotic bound on the distance between LP vectors and their mean-field values over random graphs. This bound allows us to better our understanding of a class of GPR-based community detection approaches.
For example, it explains why PPR with a parameter α ≈ 1 often achieves good community detection performance [30] and why HPR statistically outperforms PPR for community detection, which matches the combinatorial demonstration proposed previously [12]. Second, we describe the first non-asymptotic characterization of GPRs with respect to their mean-field values over edge-independent random graphs. The obtained results improve upon previous analyses of standard PR methods [31, 32], as one needs fewer modeling assumptions and arrives at more general conclusions. Third, we introduce a new PR-type classifier for SE community detection, termed inverse PR (IPR). IPR carefully selects the weights for the first several steps of the RW by taking into account the variance of the LPs, and offers significantly improved SE community detection performance compared to canonical PR diffusions (such as HPR and PPR) over SBMs. Fourth, we present extensive experiments for detecting seeded communities in real large-scale networks using IPR. Although real world networks do not share the properties of the SBMs used in our analysis, IPR still significantly outperforms both HPR and PPR for networks with non-overlapping communities and offers performance improvements over two examined networks with overlapping community structures.

2 Preliminaries

We start by formally introducing LPs, GPR methods, random graphs and other relevant notions.

Generalized PageRank. Consider an undirected graph G = (V, E) with |V| = n. Let A be the adjacency matrix, and let D be the diagonal degree matrix of G. The RW matrix of G equals W = AD^{−1}. Let {λ_i}_{i∈[n]} be the eigenvalues of W, ordered as 1 = λ_1 ≥ λ_2 ≥ ... ≥ λ_n ≥ −1. Furthermore, let d_min and d_max stand for the minimum and maximum degrees of vertices in V, respectively.
A distribution over the vertex set V is a mapping x : V → [0, 1] such that ∑_{v∈V} x_v = 1, with x_v denoting the probability of vertex v. Given an initial distribution x^(0), the k-step LPs equal x^(k) = W^k x^(0). The GPRs are parameterized by a sequence of nonnegative weights γ = {γ_k}_{k≥0} and an initial potential x^(0): pr(γ, x^(0)) = ∑_{k=0}^∞ γ_k x^(k) = ∑_{k=0}^∞ γ_k W^k x^(0). For an in-depth discussion of PageRank methods, the interested reader is referred to the review [7]. In some practical GPR settings, the bias caused by varying degrees is compensated for through degree normalization [33]. The k-step degree-normalized LPs (DNLP) are defined as z^(k) = (∑_{v∈V} d_v) D^{−1} x^(k).

Random graphs. Throughout the paper, we assume that the graph G is sampled according to a probability distribution P. The mean-field of G with respect to P is an undirected graph Ḡ with adjacency matrix Ā = E[A], where the expectation is taken with respect to P. Similarly, the mean-field degree matrix is defined as D̄ = E[D] and the mean-field random walk matrix as W̄ = Ā D̄^{−1}. The mean-field GPR reads as p̄r(γ, x^(0)) = ∑_{k=0}^∞ γ_k x̄^(k) = ∑_{k=0}^∞ γ_k W̄^k x^(0). We also use the notation z̄^(k), d̄_min, and d̄_max for the mean-field counterparts of z^(k), d_min, and d_max, respectively.

For the convergence analysis, we consider a sequence of random graphs {G^(n)}_{n≥0} of increasing size n, sampled using a corresponding sequence of distributions {P^(n)}_{n≥0}. For a given initial distribution {x^(0,n)}_{n≥0} and weights {γ^(n)}_{n≥0}, we aim to analyze the conditions under which the LPs x^(k,n) and GPRs pr(γ^(n), x^(0,n)) converge to their corresponding mean-field counterparts x̄^(k,n) and p̄r(γ^(n), x^(0,n)), respectively.
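For intuition, the LP and GPR definitions above translate directly into a short computation. The sketch below (plain Python over an adjacency-list toy graph; all names are ours, not from the paper's code) propagates x^(k) = W^k x^(0) step by step and accumulates a truncated GPR ∑_k γ_k x^(k).

```python
def landing_probs(adj, x0, K):
    """k-step landing probabilities x^(k) = W^k x^(0) for k = 0..K,
    where W = A D^{-1}; adj maps each vertex to its neighbor list."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    lps = [dict(x0)]
    for _ in range(K):
        nxt = {v: 0.0 for v in adj}
        for v, mass in lps[-1].items():
            share = mass / deg[v]      # the mass at v is split evenly over its neighbors
            for u in adj[v]:
                nxt[u] += share
        lps.append(nxt)
    return lps

def gpr(adj, x0, gammas):
    """Truncated GPR score: sum_k gamma_k * x^(k)_v for each vertex v."""
    lps = landing_probs(adj, x0, len(gammas) - 1)
    return {v: sum(g * lp.get(v, 0.0) for g, lp in zip(gammas, lps)) for v in adj}
```

For example, on a triangle graph with the seed distribution concentrated on vertex 0, one step of the walk places probability 1/2 on each neighbor, and a two-term GPR with weights (0.5, 0.5) averages the seed distribution with that one-step LP.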
We say that an event occurs with high probability if it has probability at least 1 − n^{−c}, for some constant c. If no confusion arises, we omit n from the subscript. We also use ‖x‖_p = (∑_{v∈V} |x_v|^p)^{1/p} to measure the distance between LPs.

Edge-independent random graphs and SBMs. Edge-independent models include a wide range of random graphs, such as Erdős–Rényi graphs [34], Chung–Lu models [35], stochastic block models (SBM) [21] and degree-corrected SBMs [36]. In an edge-independent model, for each pair of vertices u, v ∈ V, an edge uv is drawn according to the Bernoulli distribution with parameter p_uv ∈ [0, 1], and the draws for different edges are performed independently. Hence, E[A_uv] = p_uv, and A_uv, A_u′v′ are independent if uv, u′v′ are different unordered pairs.

Some of our subsequent discussion focuses on two-block SBMs. In this setting, we let C_1, C_0 ⊂ V denote the two blocks, such that |C_1| = n_1 and |C_0| = n_0. For any pair of vertices from the same block, u, v ∈ C_i, we set p_uv = p_i, for some p_i ∈ (0, 1), i ∈ {0, 1}. Note that we allow self-loops, i.e., we allow u = v, which makes for simpler notation without changing our conclusions. For pairs uv such that u ∈ C_1 and v ∈ C_0, we set p_uv = q, for some q ∈ (0, 1). A two-block SBM in this setting is parameterized by (n_1, p_1, n_0, p_0, q).

3 Mean-field Convergence Analysis of LPs and GPRs

In what follows, we characterize the conditions under which x^(k) and pr(γ, x^(0)) converge to their mean-field counterparts x̄^(k) and p̄r(γ, x^(0)), respectively. The derived results enable a subsequent analysis of the variance of LPs over SBMs, as outlined in the sections to follow (all proofs are postponed to Section B of the Supplement).
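A two-block SBM as parameterized above can be sampled directly from its definition. The sketch below (plain Python, our own illustrative code) draws each unordered pair of vertices independently, self-loops included, and returns an adjacency list; vertices 0..n_1−1 form C_1 and the rest form C_0.

```python
import random

def sample_sbm(n1, p1, n0, p0, q, seed=0):
    """Sample a two-block SBM with parameters (n1, p1, n0, p0, q).
    Each unordered pair (u, v), including u == v, is an edge independently."""
    rng = random.Random(seed)
    n = n1 + n0
    adj = {v: [] for v in range(n)}
    block = lambda v: 1 if v < n1 else 0
    for u in range(n):
        for v in range(u, n):                      # u == v allowed: self-loops
            if block(u) != block(v):
                p = q                              # cross-block probability
            else:
                p = p1 if block(u) == 1 else p0    # within-block probability
            if rng.random() < p:
                adj[u].append(v)
                if u != v:
                    adj[v].append(u)
    return adj
```

In an assortative instance (p_1, p_0 > q), within-block edges dominate cross-block edges, which is the regime in which seed-expansion detection of C_1 is meaningful.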
Note that since GPRs are linear combinations of LPs, the convergence properties of x^(k) may be used to analyze the convergence properties of pr(γ, x^(0)). More specifically, given a sequence of graphs of increasing sizes, and G^(n) ∼ P^(n), the first question of interest is to derive non-asymptotic bounds for ‖x^(k) − x̄^(k)‖_1, as both x^(k) and x̄^(k) have unit ℓ1-norms². The following lemma shows that under certain conditions, one cannot expect convergence in the ℓ1-norm for arbitrary values of k.

²In some cases, both ‖x^(k)‖_2 and ‖x̄^(k)‖_2 naturally equal o(1), which leads to the obvious, yet loose bound ‖x^(k) − x̄^(k)‖_2 ≤ ‖x^(k)‖_2 + ‖x̄^(k)‖_2 = o(1).

Lemma 3.1. If there exists a vertex v that may depend on n such that d̄_v = ω(1) and d̄_v ≤ (1 − ε)n, for some ε > 0, setting x^(0) = 1_v gives lim_{n→∞} P[‖x^(1) − x̄^(1)‖_1 ≥ ε] = 1.

Consequently, we start with an upper bound for ‖x^(k) − x̄^(k)‖_2. Then, we provide conditions that ensure that ‖x^(k) − x̄^(k)‖_2 = o(√(1/n)). As ‖x^(k) − x̄^(k)‖_1 ≤ √n ‖x^(k) − x̄^(k)‖_2, we subsequently arrive at sufficient conditions for convergence in the ℓ1-norm. The novelty of our proof technique is to use mixing results for RWs to characterize the upper bound for the convergence of landing probabilities for each k. The results establish uniform convergence of GPRs as long as ∑_k γ_k < ∞. This finding improves the results in [31, 32, 37] for GPRs with weights γ_k that scale as O(c^k), where c ∈ (0, 1) denotes the damping factor.

Our first relevant results are non-asymptotic bounds for the ℓ2-distance between LPs and their mean-field values. The obtained bounds lead to non-asymptotic bounds for the ℓ2-distance between GPRs and their mean-field values, described in Lemma 3.2. Lemma 3.2 is then used to derive conditions for convergence of LPs and GPRs in the ℓ1-distance, summarized in Theorems 3.3 and 3.4, respectively.

Lemma 3.2. Let λ̄ = max{|λ̄_2|, |λ̄_n|}. Suppose that d̄_min = ω(log n). Then, with high probability, and for some constants C_1, C_2, C_3 that do not depend on n or k, one has

  ‖x^(k) − x̄^(k)‖_2 / ‖x^(0)‖_2 ≤ C_1 √(log n / (n d̄_min)) · (1/‖x^(0)‖_2) + C_2 k (λ̄ + C_3 √(log n / d̄_min))^{k−1} √(d̄_max log n / d̄_min²).   (1)

Moreover, let g(γ, λ̄, d̄_min) = ∑_{k≥1} γ_k k (λ̄ + C_3 √(log n / d̄_min))^{k−1}. Then,

  ‖pr(γ, x^(0)) − p̄r(γ, x^(0))‖_2 / ‖x^(0)‖_2 ≤ C_1 √(log n / (n d̄_min)) · (1/‖x^(0)‖_2) + C_2 g(γ, λ̄, d̄_min) √(d̄_max log n / d̄_min²).   (2)

Lemma 3.2 allows us to establish the following conditions for ℓ1-convergence of the LPs.

Theorem 3.3. 1) If ‖x^(0)‖_2 = O(1/√n) and d̄_max log n / d̄_min² = o(1), then for any sequence {k(n)}_{n≥0}, ‖x^(k(n)) − x̄^(k(n))‖_1 = o(1), w.h.p.; 2) If d̄_min = ω(log n) and λ̄ < 1 − c, for some c > 0 and n ≥ n_0 such that c³ > C_4 √(log n / d̄_min), where n_0, C_4 are constants, then for any x^(0) and sequence {k(n)}_{n≥n_0} that satisfies k(n) ≥ (log n + log(d̄_max/d̄_min))/c, we have ‖x^(k(n)) − x̄^(k(n))‖_1 = o(1), w.h.p.

Theorem 3.3 asserts that either broadly spreading the seeds, i.e., ‖x^(0)‖_2 = O(1/√n), or allowing the RW to progress until the mixing time, i.e., k(n) ≥ (log n + log(d̄_max/d̄_min))/c, ensures that the LPs converge in the ℓ1-distance. One also has the following corresponding convergence result for GPRs.

Theorem 3.4. 1) If ‖x^(0)‖_2 = O(1/√n), d̄_max log n / d̄_min² = o(1), and λ̄ < 1 − c for some c > 0, then for any weight sequence {γ^(n)}_{n≥0} such that ∑_k γ_k^(n) < ∞, one has ‖pr(γ^(n), x^(0)) − p̄r(γ^(n), x^(0))‖_1 = o(1), w.h.p. 2) If γ_0^(n) / ∑_k γ_k^(n) ≥ C_5 > 0 for some constant C_5, λ̄ < 1 − c for some c > 0, d̄_min = ω(log n), and d̄_max log n / d̄_min² = o(1), then for any x^(0) one has ‖pr(γ^(n), x^(0)) − p̄r(γ^(n), x^(0))‖_2 / ‖p̄r(γ^(n), x^(0))‖_2 = o(1), w.h.p. 3) If d̄_min = ω(log n) and g(γ^(n), λ̄^(n)) = ∑_{k≥1} γ_k^(n) k (λ̄^(n) + C_6)^{k−1} = O(√(d̄_min / (n d̄_max))) for some constant C_6 > 0, then for any x^(0) one has ‖pr(γ^(n), x^(0)) − p̄r(γ^(n), x^(0))‖_1 = o(1), w.h.p.

Remarks pertaining to Theorem 3.4: The result in 1) requires weaker conditions than Proposition 1 in [31] for the standard PR: we disposed of the constraint λ̄ = o(1) and bounded d̄_max/d̄_min. As a result, GPR converges in the ℓ1-norm as long as the initial seeds are sufficiently spread and d̄_max log n / d̄_min² = o(1). The result in 2) implies that for fixed weights that do not depend on n, both the standard PR and HPR have guaranteed convergence in the relative ℓ2-distance. This generalizes Theorem 1 in [32], stated for the standard PR on SBMs. The result in 3) implies that as long as the weights γ_k^(n) appropriately depend on n, convergence in the ℓ1-norm is guaranteed (e.g., for HPR with h > (ln n + ln(d̄_max/d̄_min))/(2 − 2λ̄)).

The following lemma uses the same proof techniques as Lemma 3.2 to provide an upper bound on the distance between the DNLPs z^(k) and z̄^(k), which we find useful in what follows. The result essentially removes the dependence on the degrees in the first term of the right-hand side of (1).

Lemma 3.5. Suppose that the conditions of Lemma 3.2 are satisfied. Then, one has

  ‖z^(k) − z̄^(k)‖_2 / ‖z^(0)‖_2 ≤ C_1 k (λ̄ + C_2 √(log n / d̄_min))^{k−1} √(d̄_max log n / d̄_min²), w.h.p.

4 GPR-Based SE Community Detection

One important application of PRs is in SE community detection: For each vertex v, the LPs {x^(k)_v}_{k≥0} may be viewed as features and the GPR as a score used to predict the community membership of v by comparing it with some threshold [20]. Kloumann et al. [20] investigated mean-field LPs, i.e., {x̄^(k)_v}_{k≥0}, and showed that under certain symmetry conditions, PPR with α = λ̄_2 corresponds to an optimal classifier for one block in an SBM, given only the first-order moment information. However, accompanying simulations revealed that PPR underperforms with respect to classification accuracy. As a result, Fisher's linear discriminant [38] was used instead [20] by empirically leveraging information about the second-order moments of the LPs, and was shown to have performance almost matching that of belief propagation, a statistically optimal method for SBMs [39, 40, 41].

In what follows, we rigorously derive an explicit formula for a variant of Fisher's linear discriminant by taking into account the individual variances of the features while neglecting their correlations. This explicit formula provides new insight into the behavior of GPR methods for SE community detection in SBMs and will later be generalized to handle real world networks (see Section 5).

4.1 Pseudo Fisher's Linear Discriminant

Suppose that the mean vectors and covariance matrices of the features from the two classes C_0, C_1 are equal to (μ_0, Σ_0) and (μ_1, Σ_1), respectively. For simplicity, assume that the covariance matrices are identical, i.e., Σ_0 = Σ_1 = Σ.
The Fisher's linear discriminant depends on the first two moments (mean and variance) of the features [38], and may be written as F(x) = [Σ^{−1}(μ_1 − μ_0)]^T x. The label of a data point x is determined by comparing F(x) with a threshold.

Neglecting the differences in the second-order moments by assuming that Σ = σ²I, Fisher's linear discriminant reduces to G(x) = (μ_1 − μ_0)^T x, which induces a decision boundary that is orthogonal to the difference between the means of the two classes; G(x) is optimal under the assumption that only the first-order moments μ_1 and μ_0 are available.

The two linear discriminants have different advantages and disadvantages in practice. On the one hand, Σ can differ significantly from σ²I, in which case G(x) performs much worse than F(x). On the other hand, estimating the covariance matrix Σ is nontrivial, and hence F(x) may not be available in closed form. One possible choice to mitigate the above drawbacks is to use what we call the pseudo Fisher's linear discriminant,

  SF(x) = [diag(Σ)^{−1}(μ_1 − μ_0)]^T x,   (3)

where diag(Σ) is the diagonal matrix of Σ; diag(Σ) preserves the information about variances, but neglects the correlations between the terms in x. This discriminant essentially allows each feature to contribute equally to the final score. More precisely, given a feature of a vertex v, say x^(k)_v, its corresponding weight according to SF(·) equals (μ^(k)_1 − μ^(k)_0)/(σ^(k))², where (σ^(k))² denotes the variance of the feature (i.e., the k-th component in the diagonal of Σ). Note that this weight may be rewritten as (μ^(k)_1 − μ^(k)_0)/σ^(k) × 1/σ^(k); the first term is a frequently-used metric for characterizing the predictiveness of a feature, called the effect size [42], while the second term is a normalization term that positions all features on the same scale.

Next, we derive an expression for SF(x) pertinent to SE community detection, following the setting proposed for Fisher's linear discriminant in [20]. To model the community to be detected with seeds and the out-of-community portion of a graph, respectively, we focus on two-block SBMs with parameters (n_1, p_1, n_0, p_0, q), and characterize both the means μ_1, μ_0 and the variances diag(Σ). Note that for notational simplicity, we first work with DNLPs {z^(k)_v}_{k≥0} as the features of choice, as they can remove degree-induced noise; the results for LPs {x^(k)_v}_{k≥0} are only stated briefly.

4.2 SF(·) Weights and the Inverse PageRank

Characterization of the means. Consider a two-block SBM with parameters (n_1, p_1, n_0, p_0, q). Without loss of generality, assume that the seed lies in block C_1. Due to the block-wise symmetry of Ā, for a fixed k ≥ 1, the DNLP z̄^(k)_v is constant for all v ∈ C_i within the same community C_i, i ∈ {0, 1}. Consequently, the mean of the k-th DNLP (feature) of block C_i is set to μ^(k)_i = z̄^(k)_v, v ∈ C_i, i ∈ {0, 1}. Note that z̄^(k)_v does not match the traditional definition of the expectation E(z^(k)_v), although the two definitions are consistent when n_1, n_0 → ∞ due to Lemma 3.5.

Choosing the initial seed set to lie within one single community, e.g.
∑_{v∈C_1} x^(0)_v = 1, and using some algebraic manipulations (see Section C of the Supplement), we obtain

  μ^(k)_1 − μ^(k)_0 = c λ̄_2^k,  with c = (∑_{v∈V} d̄_v) / (n_1(n_1 p_1 + n_0 q)).   (4)

Recall that λ̄_2 stands for the second largest eigenvalue of the mean-field random walk matrix W̄. The result in (4) shows that the distance between the means of the DNLPs of the two classes decays with k at a rate λ̄_2. This result is similar to its counterpart in [20] for LPs {x^(k)_v}_{k≥0}, but the results in [20] additionally require d̄_v = d̄_u for vertices u and v belonging to different blocks. By only using the difference μ^(k)_1 − μ^(k)_0 without the variance, the authors of [20] proposed to use the discriminant G(x) = (μ_1 − μ_0)^T x, which corresponds to PPR with α = λ̄_2.

Characterization of the variances. Characterizing the variance of each feature is significantly harder than characterizing the means. Nevertheless, the results reported in Lemma 3.5 and Lemma 3.2 allow us to determine both E(z^(k)_v − z̄^(k)_v)² and E(x^(k)_v − x̄^(k)_v)². Let us consider z^(k)_v first. Lemma 3.5 implies that with high probability, ‖z^(k) − z̄^(k)‖_2 ≤ k(λ̄ + o(1))^{k−1} for all k. Figure 1 (Left) depicts the empirical value of E[‖z^(k) − z̄^(k)‖_2²] for a given set of parameter choices. As may be seen, the expectation decays at a rate roughly equal to λ_2^{2k}, where λ_2 is the second largest eigenvalue of the RW matrix W. With regards to x^(k)_v, Lemma 3.2 establishes that ‖x^(k) − x̄^(k)‖_2 is upper bounded by k(λ̄ + o(1))^{k−1}; for large k, the norm is dominated by the first term in (1), induced by the variance of the degrees. Figure 1 (Left) plots the empirical values of E[‖x^(k) − x̄^(k)‖_2²] to support this finding.

Combining the characterizations of the means and variances, we arrive at the following conclusions.

Normalized degree case. Although the expression established in (4) reveals that the distance between the means of the landing probabilities decays as λ̄_2^k, the corresponding standard deviation σ^(k) ∝ E[‖z^(k) − z̄^(k)‖_2] also roughly decays as λ_2^k. Hence, for the classifier SF(·), the appropriate weights are γ_k = (μ^(k)_1 − μ^(k)_0)/(σ^(k))² ∼ λ̄_2^k/λ_2^{2k} = (λ̄_2/λ_2)^k λ_2^{−k}. The first term (λ̄_2/λ_2)^k in the product weighs different DNLPs according to their effect sizes [42]. Since λ_2 → λ̄_2 as n → ∞, this ratio may decay very slowly as k increases. As shown in Figure 1 (Right), the classification error rate based on a single k-step DNLP remains largely unchanged as k increases to some value exceeding the mixing time. The second term in the product, λ_2^{−k}, may be viewed as a factor that balances the scale of all DNLPs. Due to the observed variance, DNLPs with large k should be assigned weights much larger than those used in G(x), i.e., γ_k = μ^(k)_1 − μ^(k)_0 = λ̄_2^k, as suggested in [20].

Figure 1: Left: Empirical results illustrating the decay rate of the variances ‖z^(k) − z̄^(k)‖_2² and ‖x^(k) − x̄^(k)‖_2², for an SBM with parameters (500, 0.2, 500, 0.2, 0.05), averaged over 1000 tests. With high probability, λ_2 slightly exceeds the corresponding mean-field value λ̄_2 [43]; Right: Classification errors based on single-step DNLPs or LPs for SBMs with parameters (500, 0.05, 500, 0.05, q), q ∈ {0.01, 0.02, 0.03}.

The unnormalized degree case. The standard deviation σ^(k) ∝ E[‖x^(k) − x̄^(k)‖_2] roughly scales as φ + λ_2^k, where φ captures the noise introduced by the degrees. Typically, for a small number of steps k, the noise introduced by degree variation is small compared to the total noise (see Figure 1 (Left)). Hence, we may assume that φ < λ_2⁰ = 1. The classifier SF(·) suggests using the weights γ_k = (μ^(k)_1 − μ^(k)_0)/(σ^(k))² ∼ λ̄_2^k/(φ + λ_2^k)² = λ̄_2^k/(φ + λ_2^k) × (φ + λ_2^k)^{−1}, where λ̄_2^k/(φ + λ_2^k) represents the effect size of the k-th LP. This result is confirmed by simulations: in Figure 1 (Right), the classification error rate based on a single-step LP decreases for small k and increases after k exceeds the mixing time. Moreover, by recalling that x^(k)_v → d_v/∑_v d_v as k → ∞, one can confirm that the degree-based noise deteriorates the classification accuracy.

Inverse PR. As already observed, for finite n and with high probability, λ_2 only slightly exceeds λ̄_2. Moreover, for SBMs with unknown parameters or for real world networks, λ̄_2 may not be well-defined, or it may be hard to compute numerically. Hence, in practice, one may need to use the heuristic value λ̄_2 = λ_2 = θ, where θ is a parameter to be tuned. In this case, SF(·) with degree normalization is associated with the weights γ_k = θ^{−k}, while SF(·) without degree normalization is associated with the weights γ_k = θ^k/(φ + θ^k)².
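Under the heuristic λ̄_2 = λ_2 = θ just described, the two IPR weight sequences can be sketched as follows (plain Python; the function names, θ, φ, and the truncation level K are our own illustrative choices, to be tuned in practice).

```python
def ipr_weights_dn(theta, K):
    """IPR with degree normalization: gamma_k = theta^{-k},
    which grows with k whenever theta < 1."""
    return [theta ** -k for k in range(K)]

def ipr_weights(theta, phi, K):
    """IPR without degree normalization: gamma_k = theta^k / (phi + theta^k)^2."""
    return [theta ** k / (phi + theta ** k) ** 2 for k in range(K)]
```

For example, with θ = 0.9 and φ = 0.5, the unnormalized weights rise over the first several steps of the walk and only then decay, in contrast to the geometrically decaying PPR weights.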
When $k$ is small, $\gamma_k$ roughly increases as $\theta^{-k}$; we term a PR with this choice of weights the Inverse PR (IPR). Note that IPR with degree normalization may not converge in practice, and LP information may be estimated only for a limited number of $k$ steps. Our experiments on real world networks reveal that a good choice for the maximum value of $k$ is 4-5 times the maximal length of the shortest paths from all unlabeled vertices to the set of seeds.

Other insights. Note that IPR resembles HPR when $k$ is small and $\gamma_k$ increases, as it dampens the contributions of the first several steps of the RW. This result also agrees with the combinatorial analysis in [12] that advocates the use of HPR for community detection. Note that IPR with degree normalization has monotonically increasing weights, which reflects the fact that community information is preserved even for large-step LPs. To some extent, this result can be viewed as a theoretical justification for the empirical fact that PPR is often used with $\alpha \simeq 1$ to achieve good community detection performance [30].

5 Experiments

We evaluate the performance of the IPR method over synthetic and large-scale real world networks.

Datasets. The network data used for evaluation may be classified into three categories. The first category contains networks sampled from two-block SBMs that satisfy the assumptions used to derive our theoretical results. The second category includes three real world networks, Citeseer [44], Cora [45] and PubMed [46], all frequently used to evaluate community detection algorithms [47, 48]. These networks comprise several non-overlapping communities and may be roughly modeled as SBMs. The third category includes the Amazon (product) network and the DBLP (collaboration) network from the Stanford Network Analysis Project [49].
These networks contain thousands of overlapping communities, and their topologies differ significantly from SBMs (see Table 2 in the Supplement for more details). For synthetic graphs, we use single-vertex seed-sets; for real world graphs, we select 20 seeds uniformly at random from the community of interest.

Comparison of the methods. We compare the proposed IPRs with PPR and HPR methods, both widely used for SE community detection [14, 50]. Methods that rely on training the weights were not considered, as they require outside-community vertex labels. For all three approaches, the default choice is degree normalization, indicated by the suffix "-d". For synthetic networks, the parameter $\theta$ in IPR is set to $\bar{\lambda}_2 = \frac{0.05-q}{0.05+q}$, following the recommendations of Section 4.2. For real world networks, we avoid computing $\lambda_2$ exactly and set $\theta \in \{0.99, 0.95, 0.90\}$. The parameters of the PPR and HPR are chosen to satisfy $\alpha \in \{0.9, 0.95\}$ and $h \in \{5, 10\}$ and to offer the best performance, as suggested in [50, 51, 14].

Figure 2: (Left): Recalls (mean ± std) for different PRs over SBMs with parameters (500, 0.05, 500, 0.05, q), $q \in \{0.02, 0.03\}$; (Right): Results over the Citeseer, Cora and PubMed networks (from left to right). First line: Recalls (mean ± std) of different PRs vs. steps. Second line: Averaged recalls of different PRs for the top-Q vertices, obtained by accumulating the LPs with $k \leq 50$.
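All three PR variants share the same scoring loop: propagate a seed distribution through the RW transition matrix and accumulate weighted (optionally degree-normalized) landing probabilities. The following is our own minimal sketch of that loop, not the authors' implementation; the toy graph, seed set, and weight choices are hypothetical, and the graph is assumed to have no isolated vertices.

```python
import numpy as np

def gpr_scores(A, seeds, gammas, normalize_degree=True):
    """Sketch of seed-expansion GPR scoring: accumulate weighted k-step
    landing probabilities of a RW started uniformly from the seed set."""
    n = A.shape[0]
    deg = A.sum(axis=1)
    W = A / deg[:, None]               # row-stochastic RW transition matrix
    x = np.zeros(n)
    x[list(seeds)] = 1.0 / len(seeds)  # seed distribution (k = 0)
    score = np.zeros(n)
    for gamma in gammas:               # gammas[k] weights the k-step LP
        lp = x / deg if normalize_degree else x  # DNLP vs. LP features
        score += gamma * lp
        x = x @ W                      # one RW step: x^(k+1) = x^(k) W
    return score

# Toy example: two triangles joined by one edge; seed in the left triangle.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
theta = 0.9
gammas = theta ** -np.arange(10.0)     # IPR-style increasing weights
s = gpr_scores(A, seeds=[0], gammas=gammas)
assert min(s[:3]) > max(s[3:])         # the seed's triangle ranks on top
```

Ranking vertices by `s` and taking the top-$Q$ then yields the predicted community used in the evaluation below.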
The results for all PRs are obtained by accumulating the values over the first $k$ steps; the choice of $k$ is specified for each network individually.

Evaluation metric. We adopt a metric similar to the one used in [50]. There, one is given a graph, a hidden community $C$ to detect, and a vertex budget $Q$. For a potential ordering of the vertices, obtained via some GPR method, the top-$Q$ set of vertices represents the predicted community $P$. The evaluation metric used is $|P \cap C|/|C|$. By default, $Q = |C|$, if not specified otherwise. Other metrics, such as the Normalized Mutual Information and the F-score, may be used instead, but since they require additional parameters to determine the GPR classification threshold, the results may not allow for simple and fair comparisons. For SBMs, we independently generated 1000 networks for every set of parameters. For each network, the results are summarized based on 1000 independently chosen seed sets for each community-network pair and then averaged over all communities.

5.1 Performance Evaluation

Synthetic graphs. In synthetic networks, all three PRs with degree normalization perform significantly better than their unnormalized counterparts. Thus, we only present results for the first class of methods in Figure 2 (Left). As predicted in Section 4.2, IPR-d offers substantially better detection performance than either PPR-d or HPR-d, and is close in quality to belief propagation (BP). Note that the recall of IPR-d keeps increasing with the number of steps. This means that even for large values of $k$, the landing probabilities remain predictive of the community structures, and decreasing the weights with $k$ as in HPR and PPR is not appropriate for these synthetic graphs. The classifier $G(x)$, i.e., a PPR with parameter $\frac{p-q}{p+q}$ suggested by [20], has worse performance than the PPR method with parameter 0.95 and is hence not depicted.

Citeseer, Cora and PubMed.
Here as well, PRs with degree normalization perform better than PRs without degree normalization. Hence, we only display the results obtained with degree normalization. The first line of Figure 2 (Right) shows that IPR-d 0.99 significantly outperforms both PPR-d and HPR-d for all three networks. Moreover, the performance of IPR-d 0.99 improves with increasing $k$, once again establishing that LPs for large $k$ are still predictive. The results for IPR-d 0.90, 0.95 and a related discussion are postponed to Section A.1 in the Supplement.

The second line of Figure 2 (Right) illustrates the rankings of vertices within the predicted community given the first 50 steps of the RW. Note that only for the Citeseer network does PPR provide a better ranking of vertices in the community for small $Q$; for the other two networks, IPR outperforms PPR and HPR on the whole ranking of vertices.

Table 1: Recalls (mean ± std) for different PRs over the Amazon and the DBLP networks. The boldfaced values are those within one std away from the optimal values for a given fixed $k$.

                 Amazon (std: ±0.12)            DBLP (std: ±0.09)
Steps k        5      10     15     20        5      10     15     20
PPR          46.63  48.03  48.43  48.53     27.58  28.78  29.18  29.27
HPR          46.64  48.04  48.44  48.53     27.60  28.94  29.20  29.28
IPR0.99      46.67  48.08  48.45  48.53     27.64  29.14  29.26  29.32
IPR0.95      46.57  47.92  48.30  48.43     27.46  28.49  28.90  29.06
IPR0.90      47.20  48.36  48.54  48.55     28.24  28.80  28.85  28.85

Figure 3: Recalls based on one-step LPs and one-step DNLPs.

Amazon, DBLP. We first preprocess these networks by following a standard approach described in Section A.2 of the Supplement. As opposed to the networks in the previous two categories, the information in the vertex degrees is extremely predictive of the community membership for this category. Figure 3 shows the predictiveness based on one-step LPs and DNLPs for these two networks. As may be seen, degree normalization may actually hurt the predictive performance of LPs for these two networks. This observation coincides with the finding in [50]. Hence, for this case, we do not perform degree normalization. As recommended in Section 4.2, the weights are chosen as $\gamma_k = \theta^k/(\theta^k + \phi)^2$, where $\theta$ and $\phi$ are parameters to be tuned. The value of $\phi$ typically depends on how informative the degree of a vertex is. Here, we simply set $\phi = \theta^{10}$, which makes $\gamma_k$ achieve its maximal value for $k = 10$. We also find that for both networks, $\alpha = 0.95$ is a good choice for PPR, while for HPR, $h = 10$ and $h = 5$ are adequate for the Amazon and the DBLP network, respectively. Further results are listed in Table 1, indicating that HPR outperforms other PR methods when $k = 5$; HPR is used with parameter $\geq 5$, and the weights for the first 5 steps in HPR increase. This yet again confirms our findings regarding the predictiveness of large-step LPs. For larger $k$, IPR matches the performance of HPR and even outperforms HPR on the DBLP network.
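The evaluation metric used throughout this section — rank vertices by their GPR scores, take the top-$Q$ as the predicted community $P$, and report $|P \cap C|/|C|$ with $Q = |C|$ by default — can be sketched as follows. The function name and the toy scores are our own illustration.

```python
import numpy as np

def topq_recall(scores, community, Q=None):
    """Top-Q recall: |P intersect C| / |C|, where P holds the Q
    highest-scoring vertices. By default Q = |C|."""
    C = set(community)
    if Q is None:
        Q = len(C)
    order = np.argsort(scores)[::-1]   # highest scores first
    P = set(order[:Q].tolist())
    return len(P & C) / len(C)

# Toy example: 6 vertices, hidden community C = {0, 1, 2}.
scores = np.array([0.9, 0.7, 0.2, 0.4, 0.1, 0.05])
print(topq_recall(scores, community=[0, 1, 2]))  # 2 of 3 recovered -> 0.666...
```

Sweeping `Q` over a range of percentiles of $|C|$ reproduces the kind of ranking curves shown in the second line of Figure 2 (Right).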
Vertex rankings within the communities are available in Section A.2 of the Supplement.

6 Discussion and Future Directions

There are many directions that may be pursued in future studies, including:
(1) Our non-asymptotic analysis works for relatively dense graphs for which the minimum degree satisfies $\bar{d}_{\min} = \omega(\log n)$. A relevant problem is to investigate the behavior of GPR over sparse graphs.
(2) The derived weights ignore the correlations between LPs corresponding to different step-lengths. Characterizing the correlations is a particularly challenging and interesting problem.
(3) Recently, research on network analysis has focused on networks with higher-order structures. PPR and HPR-based methods have been generalized to the higher-order setting [52, 53]. Analysis has shown that these higher-order GPR methods may be used to detect communities of networks that approximate higher-order network (motif/hypergraph) conductance [54, 53]. Related works also showed that PR-based approaches are powerful for practical community detection with higher-order structures [55]. Hence, generalizing our analysis to higher-order structure clustering is another topic for future consideration. A follow-up work on the mean-field analysis of higher-order GPR methods may be found in [56].
(4) Our work provides new insights regarding SE community detection. Re-deriving the non-asymptotic results for other GPR-based applications, including recommender system design and link prediction, is another class of problems of interest. For example, GPR/RW-based approaches are frequently used on commodity-user bipartite graphs of recommender systems. There, one may model the network as a random graph with independent edges that correspond to one-time purchases governed by preference scores of the users. Similarities of vertices can also be characterized by GPRs and used to predict emerging links in networks [4].
In this setting, it is reasonable to assume that the graph is edge-independent but with different edge probabilities. Analyzing how the GPR weights influence the similarity scores to infer edge probabilities may improve the performance of current link prediction methods.

7 Acknowledgements

This work was supported by the NSF STC Center for Science of Information at Purdue University. The authors also gratefully acknowledge useful discussions with Prof. David Gleich from Purdue University.

References

[1] L. Page, S. Brin, R. Motwani, and T. Winograd, "The PageRank citation ranking: Bringing order to the web," Stanford InfoLab, Tech. Rep., 1999.

[2] S. Fortunato, "Community detection in graphs," Physics Reports, vol. 486, no. 3-5, pp. 75–174, 2010.

[3] J. J. Whang, D. F. Gleich, and I. S. Dhillon, "Overlapping community detection using seed set expansion," in Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. ACM, 2013, pp. 2099–2108.

[4] D. Liben-Nowell and J. Kleinberg, "The link-prediction problem for social networks," Journal of the American Society for Information Science and Technology, vol. 58, no. 7, pp. 1019–1031, 2007.

[5] P. Massa and P. Avesani, "Trust-aware recommender systems," in Proceedings of the 2007 ACM Conference on Recommender Systems. ACM, 2007, pp. 17–24.

[6] Z. Abbassi and V. S. Mirrokni, "A recommender system based on local random walks and spectral methods," in Proceedings of the 9th WebKDD and 1st SNA-KDD Workshop. ACM, 2007, pp. 102–108.

[7] D. F. Gleich, "PageRank beyond the web," SIAM Review, vol. 57, no. 3, pp. 321–363, 2015.

[8] R. Baeza-Yates, P. Boldi, and C. 
Castillo, "Generalizing PageRank: Damping functions for link-based ranking algorithms," in Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2006, pp. 308–315.

[9] J. Kleinberg, "Link prediction with combinatorial structure: Block models and simplicial complexes," in Companion of The Web Conference 2018 (BigNet), 2018.

[10] G. Jeh and J. Widom, "Scaling personalized web search," in Proceedings of the 12th International Conference on World Wide Web. ACM, 2003, pp. 271–279.

[11] F. Chung, "The heat kernel as the PageRank of a graph," Proceedings of the National Academy of Sciences, vol. 104, no. 50, pp. 19735–19740, 2007.

[12] ——, "A local graph partitioning algorithm using heat kernel PageRank," Internet Mathematics, vol. 6, no. 3, pp. 315–330, 2009.

[13] F. R. K. Chung, Spectral Graph Theory. American Mathematical Society, 1997, no. 92.

[14] K. Kloster and D. F. Gleich, "Heat kernel based community detection," in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2014, pp. 1386–1395.

[15] H. Avron and L. Horesh, "Community detection using time-dependent personalized PageRank," in Proceedings of the International Conference on Machine Learning, 2015, pp. 1795–1803.

[16] D. F. Gleich and R. A. Rossi, "A dynamical system for PageRank with time-dependent teleportation," Internet Mathematics, vol. 10, no. 1-2, pp. 188–217, 2014.

[17] B. Jiang, K. Kloster, D. F. Gleich, and M. Gribskov, "AptRank: An adaptive PageRank model for protein function prediction on bi-relational graphs," Bioinformatics, vol. 33, no. 12, pp. 1829–1836, 2017.

[18] D. Berberidis, A. Nikolakopoulos, and G. 
Giannakis, "Adaptive diffusions for scalable learning over graphs," in Mining and Learning with Graphs Workshop @ ACM KDD 2018, Aug. 2018, p. 1.

[19] Z. Yin, M. Gupta, T. Weninger, and J. Han, "A unified framework for link recommendation using random walks," in 2010 International Conference on Advances in Social Networks Analysis and Mining (ASONAM). IEEE, 2010, pp. 152–159.

[20] I. M. Kloumann, J. Ugander, and J. Kleinberg, "Block models and personalized PageRank," Proceedings of the National Academy of Sciences, vol. 114, no. 1, pp. 33–38, 2017.

[21] P. W. Holland, K. B. Laskey, and S. Leinhardt, "Stochastic blockmodels: First steps," Social Networks, vol. 5, no. 2, pp. 109–137, 1983.

[22] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, "Spectral networks and locally connected networks on graphs," arXiv preprint arXiv:1312.6203, 2013.

[23] M. Defferrard, X. Bresson, and P. Vandergheynst, "Convolutional neural networks on graphs with fast localized spectral filtering," in Advances in Neural Information Processing Systems, 2016, pp. 3844–3852.

[24] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," arXiv preprint arXiv:1609.02907, 2016.

[25] W. Hamilton, Z. Ying, and J. Leskovec, "Inductive representation learning on large graphs," in Advances in Neural Information Processing Systems, 2017, pp. 1024–1034.

[26] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, "Graph attention networks," arXiv preprint arXiv:1710.10903, 2017.

[27] J. Klicpera, A. Bojchevski, and S. Günnemann, "Predict then propagate: Graph neural networks meet personalized PageRank," in International Conference on Learning Representations, 2019.

[28] A. Bojchevski, J. Klicpera, B. Perozzi, M. Blais, A. Kapoor, M. 
Lukasik, and S. Günnemann, "Is PageRank all you need for scalable graph neural networks?" in ACM KDD, MLG Workshop, 2019.

[29] J. Klicpera, S. Weißenberger, and S. Günnemann, "Diffusion improves graph learning," in Advances in Neural Information Processing Systems, 2019, pp. 13333–13345.

[30] J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney, "Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters," Internet Mathematics, vol. 6, no. 1, pp. 29–123, 2009.

[31] K. Avrachenkov, A. Kadavankandy, L. O. Prokhorenkova, and A. Raigorodskii, "PageRank in undirected random graphs," in International Workshop on Algorithms and Models for the Web-Graph. Springer, 2015, pp. 151–163.

[32] K. Avrachenkov, A. Kadavankandy, and N. Litvak, "Mean field analysis of personalized PageRank with implications for local graph clustering," arXiv preprint arXiv:1806.07640, 2018.

[33] R. Andersen, F. Chung, and K. Lang, "Local graph partitioning using PageRank vectors," in Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science. IEEE Computer Society, 2006, pp. 475–486.

[34] P. Erdős and A. Rényi, "On random graphs I," Publ. Math. Debrecen, vol. 6, pp. 290–297, 1959.

[35] W. Aiello, F. Chung, and L. Lu, "A random graph model for power law graphs," Experimental Mathematics, vol. 10, no. 1, pp. 53–66, 2001.

[36] B. Karrer and M. E. Newman, "Stochastic blockmodels and community structure in networks," Physical Review E, vol. 83, no. 1, p. 016107, 2011.

[37] N. Chen, N. Litvak, and M. Olvera-Cravioto, "Generalized PageRank on directed configuration networks," Random Structures & Algorithms, vol. 51, no. 2, pp. 237–274, 2017.

[38] R. A. 
Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, vol. 7, no. 2, pp. 179–188, 1936.

[39] E. Mossel, J. Neeman, and A. Sly, "Belief propagation, robust reconstruction and optimal recovery of block models," in Conference on Learning Theory, 2014, pp. 356–370.

[40] P. Zhang and C. Moore, "Scalable detection of statistically significant communities and hierarchies, using message passing for modularity," Proceedings of the National Academy of Sciences, vol. 111, no. 51, pp. 18144–18149, 2014.

[41] E. Abbe and C. Sandon, "Detection in the stochastic block model with multiple clusters: Proof of the achievability conjectures, acyclic BP, and the information-computation gap," arXiv preprint arXiv:1512.09080, 2015.

[42] K. Kelley and K. J. Preacher, "On effect size," Psychological Methods, vol. 17, no. 2, p. 137, 2012.

[43] T. Tao, Topics in Random Matrix Theory. American Mathematical Society, 2012, vol. 132.

[44] C. L. Giles, K. D. Bollacker, and S. Lawrence, "CiteSeer: An automatic citation indexing system," in Proceedings of the Third ACM Conference on Digital Libraries. ACM, 1998, pp. 89–98.

[45] A. K. McCallum, K. Nigam, J. Rennie, and K. Seymore, "Automating the construction of internet portals with machine learning," Information Retrieval, vol. 3, no. 2, pp. 127–163, 2000.

[46] G. Namata, B. London, L. Getoor, and B. Huang, "Query-driven active surveying for collective classification," in 10th International Workshop on Mining and Learning with Graphs, 2012.

[47] T. Yang, R. Jin, Y. Chi, and S. Zhu, "Combining link and content for community detection: A discriminative approach," in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2009, pp. 927–936.

[48] P. Sen, G. Namata, M. Bilgic, L. 
Getoor, B. Galligher, and T. Eliassi-Rad, "Collective classification in network data," AI Magazine, vol. 29, no. 3, p. 93, 2008.

[49] J. Leskovec and R. Sosič, "SNAP: A general-purpose network analysis and graph-mining library," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 1, p. 1, 2016.

[50] I. M. Kloumann and J. M. Kleinberg, "Community membership identification from small seed sets," in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2014, pp. 1366–1375.

[51] Y. Li, K. He, D. Bindel, and J. E. Hopcroft, "Uncovering the small community structure in large networks: A local spectral approach," in Proceedings of the 24th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2015, pp. 658–668.

[52] P. Li, N. He, and O. Milenkovic, "Quadratic decomposable submodular function minimization," in Advances in Neural Information Processing Systems, 2018, pp. 1062–1072.

[53] M. Ikeda, A. Miyauchi, Y. Takai, and Y. Yoshida, "Finding Cheeger cuts in hypergraphs via heat equation," arXiv preprint arXiv:1809.04396, 2018.

[54] P. Li, N. He, and O. Milenkovic, "Quadratic decomposable submodular function minimization: Theory and practice," arXiv preprint arXiv:1902.10132, 2019.

[55] H. Yin, A. R. Benson, J. Leskovec, and D. F. Gleich, "Local higher-order graph clustering," in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2017, pp. 555–564.

[56] E. Chien, P. Li, and O. Milenkovic, "Landing probabilities of random walks for seed-set expansion in hypergraphs," 2019.

[57] K. He, Y. Sun, D. Bindel, J. Hopcroft, and Y. 
Li, "Detecting overlapping communities from local spectral subspaces," in 2015 IEEE International Conference on Data Mining (ICDM). IEEE, 2015, pp. 769–774.

[58] F. Chung and M. Radcliffe, "On the spectra of general random graphs," The Electronic Journal of Combinatorics, vol. 18, no. 1, p. 215, 2011.

[59] H. Weyl, "Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen (mit einer Anwendung auf die Theorie der Hohlraumstrahlung)," Mathematische Annalen, vol. 71, no. 4, pp. 441–479, 1912.