{"title": "Community Detection via Measure Space Embedding", "book": "Advances in Neural Information Processing Systems", "page_first": 2890, "page_last": 2898, "abstract": "We present a new algorithm for community detection. The algorithm uses random walks to embed the graph in a space of measures, after which a modification of $k$-means in that space is applied. The algorithm is therefore fast and easily parallelizable. We evaluate the algorithm on standard random graph benchmarks, including some overlapping community benchmarks, and find its performance to be better or at least as good as previously known algorithms. We also prove a linear time (in number of edges) guarantee for the algorithm on a $p,q$-stochastic block model where $p \\geq c\\cdot N^{-1/2 + \\epsilon}$ and $p-q \\geq c' \\sqrt{p N^{-1/2 + \\epsilon} \\log N}$.", "full_text": "Community Detection via Measure Space Embedding\n\nMark Kozdoba\nThe Technion, Haifa, Israel\nmarkk@tx.technion.ac.il\n\nShie Mannor\nThe Technion, Haifa, Israel\nshie@ee.technion.ac.il\n\nAbstract\n\nWe present a new algorithm for community detection. The algorithm uses random walks to embed the graph in a space of measures, after which a modification of k-means in that space is applied. The algorithm is therefore fast and easily parallelizable. We evaluate the algorithm on standard random graph benchmarks, including some overlapping community benchmarks, and find its performance to be better or at least as good as previously known algorithms.
We also prove a linear time (in number of edges) guarantee for the algorithm on a p, q-stochastic block model where p ≥ c · N^{-1/2+ε} and p − q ≥ c' √(p N^{-1/2+ε} log N).\n\n1 Introduction\n\nCommunity detection in graphs, also known as graph clustering, is a problem where one wishes to identify subsets of the vertices of a graph such that the connectivity inside the subset is in some way denser than the connectivity of the subset with the rest of the graph. Such subsets are referred to as communities, and it often happens in applications that if two vertices belong to the same community, they have similar application-related qualities. This in turn may allow for a higher-level analysis of the graph, in terms of communities instead of individual nodes. Community detection finds applications in a diversity of fields, such as social network analysis, communication and traffic design, biological networks, and, generally, most fields where meaningful graphs arise (see, for instance, [1] for a survey). In addition to direct applications to graphs, community detection can also be applied to general Euclidean space clustering problems, by transforming the metric to a weighted graph structure (see [2] for a survey).\n\nCommunity detection problems come in different flavours, depending on whether the graph in question is simple, weighted, or/and directed. Another important distinction is whether the communities are allowed to overlap or not. In the overlapping communities case, each vertex can belong to several subsets.\n\nA difficulty with community detection is that the notion of community is not well defined. Different algorithms may employ different formal notions of a community, and can sometimes produce different results.
Nevertheless, there exist several widely adopted benchmarks – synthetic models and real-life graphs – where the ground truth communities are known, and algorithms are evaluated based on the similarity of the produced output to the ground truth, and on the amount of required computation. On the theoretical side, most of the effort is concentrated on developing algorithms with guaranteed recovery of clusters for graphs generated from variants of the Stochastic Block Model (referred to as SBM in what follows, [1]).\n\nIn this paper we present a new algorithm, DER (Diffusion Entropy Reducer, for reasons to be clarified later), for non-overlapping community detection. The algorithm is an adaptation of the k-means algorithm to a space of measures which are generated by short random walks from the nodes of the graph. The adaptation is done by introducing a certain natural cost on the space of the measures. As detailed below, we evaluate DER on several benchmarks and find its performance to be as good as or better than the best alternative methods. In addition, we establish some theoretical guarantees on its performance. While the main purpose of the theoretical analysis in this paper is to provide some insight into why DER works, our result is also one of the few results in the literature that show reconstruction in linear time.\n\nOn the empirical side, we first evaluate our algorithm on a set of random graph benchmarks known as the LFR models, [3]. In [4], 12 other algorithms were evaluated on these benchmarks, and three algorithms, described in [5], [6] and [7], were identified that exhibited significantly better performance than the others, and similar performance among themselves. We evaluate our algorithm on random graphs with the same parameters as those used in [4] and find its performance to be as good as these three best methods.
Several well known methods, including spectral clustering [8], exhaustive modularity optimization (see [4] for details), and clique percolation [9], have worse performance on the above benchmarks.\n\nNext, while our algorithm is designed for non-overlapping communities, we introduce a simple modification that enables it to detect overlapping communities in some cases. Using this modification, we compare the performance of our algorithm to the performance of 4 overlapping community algorithms on a set of benchmarks that were considered in [10]. We find that in all cases DER performs better than all 4 algorithms. None of the algorithms evaluated in [4] and [3] has theoretical guarantees.\n\nOn the theoretical side, we show that DER reconstructs with high probability the partition of the p, q-stochastic block model such that, roughly, p ≥ N^{-1/2} where N is the number of vertices, and p − q ≥ c √(p N^{-1/2+ε} log N) (this holds in particular when p/q ≥ c' > 1) for some constant c > 0. We show that for this reconstruction only one iteration of the k-means is sufficient. In fact, three passes over the set of edges suffice. While the cost function we introduce for DER will appear at first to have a purely probabilistic motivation, for the purposes of the proof we provide an alternative interpretation of this cost in terms of the graph, and the arguments show which properties of the graph are useful for the convergence of the algorithm.\n\nFinally, although this is not the emphasis of the present paper, it is worth noting here that, as will be evident later, our algorithm can be trivially parallelized. This seems to be a particularly nice feature since most other algorithms, including spectral clustering, are not easy to parallelize and do not seem to have parallel implementations at present.\n\nThe rest of the paper is organized as follows: Section 2 overviews related work and discusses relations to our results. In Section 3 we provide the motivation for the definition of the algorithm, derive the cost function and establish some basic properties. In Section 4 we present the results of the empirical evaluation of the algorithm, and Section 5 describes the theoretical guarantees and the general proof scheme. Some proofs and additional material are provided in the supplementary material.\n\n2 Literature review\n\nCommunity detection in graphs has been an active research topic for the last two decades and has generated a huge literature. We refer to [1] for an extensive survey. Throughout the paper, let G = (V, E) be a graph, and let P = P1, . . . , Pk be a partition of V. Loosely speaking, a partition P is a good community structure on G if for each Pi ∈ P, more edges stay within Pi than leave Pi. This is usually quantified via some cost function that assigns larger scalars to partitions P that are in some sense better separated. Perhaps the most well known cost function is the modularity, which was introduced in [11] and served as a basis for a large number of community detection algorithms ([1]). The popular spectral clustering methods, [8]; [2], can also be viewed as a (relaxed) optimization of a certain cost (see [2]).\n\nYet another group of algorithms is based on fitting a generative model of a graph with communities to a given graph. References [12]; [10] are two among the many examples. Perhaps the simplest generative model for non-overlapping communities is the stochastic block model, see [13]; [1], which we now define: Let P = P1, . . . , Pk be a partition of V into k subsets. The p, q-SBM is a distribution over the graphs on vertex set V, such that all edges are independent and for i, j ∈ V, the edge (i, j) exists with probability p if i, j belong to the same Ps, and it exists with probability q otherwise.
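The p, q-SBM just defined is straightforward to simulate. Below is a minimal sketch in plain Python (the function name and the edge-set representation are our own, not from the paper): each of the N(N−1)/2 potential edges is kept independently with probability p inside a block and q across blocks.

```python
import random

def sample_sbm(partition, p, q, seed=0):
    """Sample an undirected p,q-SBM graph.

    partition: list mapping vertex index -> block label.
    Returns the edge set as a set of (i, j) pairs with i < j.
    """
    rng = random.Random(seed)
    n = len(partition)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # within-block edges appear with probability p, others with q
            prob = p if partition[i] == partition[j] else q
            if rng.random() < prob:
                edges.add((i, j))
    return edges
```

With p = 1 and q = 0 the sampler degenerates to two disjoint cliques, which is a convenient sanity check.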
If q << p, the components Pi will be well separated in this model. We denote the number of nodes by N = |V| throughout the paper.\n\nGraphs generated from SBMs can serve as a benchmark for community detection algorithms. However, such graphs lack certain desirable properties, such as power-law degree and community size distributions. Some of these issues were fixed in the benchmark models in [3]; [14], and these models are referred to as LFR models in the literature. More details on these models are given in Section 4.\n\nWe now turn to the discussion of the theoretical guarantees. Typically, results in this direction provide algorithms that can reconstruct, with high probability, the ground partition of a graph drawn from a variant of a p, q-SBM model, with some, possibly large, number of components k. Recent results include the works [15] and [16]. In this paper, however, we only analytically analyse the k = 2 case, and such that, in addition, |P1| = |P2|.\n\nFor this case, the best known reconstruction result was obtained already in [17] and has only been improved in terms of runtime since then. Namely, Boppana's result states that if p ≥ c1 log N / N and p − q ≥ c2 √(p log N / N), then with high probability the partition is reconstructible. A similar bound can be obtained, for instance, from the approaches in [15]; [16], to name a few. The methods in this group are generally based on the spectral properties of adjacency (or related) matrices. The run time of these algorithms is non-linear in the size of the graph and it is not known how these algorithms behave on graphs not generated by the probabilistic models that they assume.\n\nIt is generally known that when the graphs are dense (p of the order of a constant), simple linear time reconstruction algorithms exist (see [18]). The first, and to the best of our knowledge, the only previous linear time algorithm for non-dense graphs was proposed in [18]. This algorithm works for p ≥ c3(ε) N^{-1/2+ε}, for any fixed ε > 0. The approach of [18] was further extended in [19], to handle more general cluster sizes. These approaches differ significantly from the spectrum-based methods, and provide equally important theoretical insight. However, their empirical behaviour was never studied, and it is likely that even for graphs generated from the SBM, extremely high values of N would be required for the algorithms to work, due to large constants in the concentration inequalities (see the concluding remarks in [19]).\n\n3 Algorithm\n\nLet G be a finite undirected graph with a vertex set V = {1, . . . , n}. Denote by A = {aij} the symmetric adjacency matrix of G, where aij ≥ 0 are edge weights, and for a vertex i ∈ V, set di = Σj aij to be the degree of i. Let D be an n × n diagonal matrix such that Dii = di, and set T = D^{-1}A to be the transition matrix of the random walk on G. Set also pij = Tij. Finally, denote by π, π(i) = di / Σj dj, the stationary measure of the random walk.\n\nA number of community detection algorithms are based on the intuition that distinct communities should be relatively closed under the random walk (see [1]), and employ different notions of closedness. Our approach also takes this point of view.\n\nFor a fixed L ∈ N, consider the following sampling process on the graph: Choose vertex v0 randomly from π, and perform L steps of a random walk on G, starting from v0. This results in a length L + 1 sequence of vertices, x1. Repeat the process N times independently, to obtain x1, . . . , xN.\n\nSuppose now that we would like to model the sequences x^s as a multinomial mixture model with a single component. Since each coordinate x^s_t is distributed according to π, the single component of the mixture should be π itself, when N grows.
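The walk distributions used throughout (the i-th rows of T, T^2, . . . , T^L, averaged over t, i.e. the measures w_i of the next section) can be computed by repeated application of the transition matrix. A minimal pure-Python sketch, with our own function names and the adjacency given as a dict of neighbor weights:

```python
def walk_measures(adj, L):
    """Return w[i][j] = (1/L) * sum_{t=1..L} T^t[i][j], where
    T = D^{-1} A is the row-normalized transition matrix.

    adj: {i: {j: weight}} symmetric adjacency (our own representation).
    """
    deg = {i: sum(nbrs.values()) for i, nbrs in adj.items()}

    def step(mu):
        # push the distribution mu one step along the random walk
        out = {}
        for i, mass in mu.items():
            for j, a in adj[i].items():
                out[j] = out.get(j, 0.0) + mass * a / deg[i]
        return out

    w = {}
    for i in adj:
        mu = {i: 1.0}   # walk started deterministically at i
        acc = {}
        for _ in range(L):
            mu = step(mu)
            for j, m in mu.items():
                acc[j] = acc.get(j, 0.0) + m / L
        w[i] = acc
    return w
```

Each w[i] is a probability measure on V, so its entries sum to 1; on a triangle with L = 1 it is simply uniform over the two neighbors.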
Now suppose that we would like to model the same sequences with a mixture of two components. Because the sequences are sampled from a random walk rather than independently from each other, the components need no longer be π itself, as in any mixture where some elements appear more often together than others. The mixture as above can be found using the EM algorithm, and this in principle summarizes our approach. The only additional step, as discussed above, is to replace the sampled random walks with their true distributions, which simplifies the analysis and also leads to somewhat improved empirical performance.\n\nWe now present the DER algorithm for detecting non-overlapping communities. Its input is the number of components to detect, k, the length of the walks L, and an initialization partition P = {P1, . . . , Pk} of V into disjoint subsets. P would usually be taken to be a random partition of V into equally sized subsets.\n\nAlgorithm 1 DER\n1: Input: Graph G, walk length L, number of components k.\n2: Compute the measures wi.\n3: Initialize P1, . . . , Pk to be a random partition such that |Pi| = |V|/k for all i.\n4: repeat\n5:   (1) For all s ≤ k, construct µs = µPs.\n6:   (2) For all s ≤ k, set Ps = { i ∈ V | s = argmax_l D(wi, µl) }.\n7: until the sets Ps do not change\n\nFor t = 0, 1, . . . and a vertex i ∈ V, denote by w^t_i the i-th row of the matrix T^t. Then w^t_i is the distribution of the random walk on G, started at i, after t steps. Set wi = (1/L)(w^1_i + . . . + w^L_i), which is the distribution corresponding to the average of the empirical measures of sequences x that start at i.\n\nFor two probability measures ν, µ on V, set\n\nD(ν, µ) = Σ_{i∈V} ν(i) log µ(i).\n\nAlthough D is not a metric, it will act as a distance function in our algorithm. Note that if ν were an empirical measure, then, up to a constant, D would be just the log-likelihood of observing ν from independent samples of µ.\n\nFor a subset S ⊂ V, set πS to be the restriction of the measure π to S, and also set dS = Σ_{i∈S} di to be the full degree of S. Let\n\nµS = (1/dS) Σ_{i∈S} di wi      (1)\n\ndenote the distribution of the random walk started from πS.\n\nThe complete DER algorithm is described in Algorithm 1. The algorithm is essentially a k-means algorithm in a non-Euclidean space, where the points are the measures wi, each occurring with multiplicity di. Step (1) is the “means” step, and (2) is the maximization step. Let\n\nC = Σ_{l=1}^{k} Σ_{i∈Pl} di · D(wi, µl)      (2)\n\nbe the associated cost. As with the usual k-means, we have the following\n\nLemma 3.1. Either P is unchanged by steps (1) and (2) or both steps (1) and (2) strictly increase the value of C.\n\nThe proof is by direct computation and is deferred to the supplementary material. Since the number of configurations P is finite, it follows that DER always terminates and provides a “local maximum” of the cost C.\n\nThe cost C can be rewritten in a somewhat more informative form. To do so, we introduce some notation first. Let X be a random variable on V, distributed according to the measure π. Let Y be a step of a random walk started at X, so that the distribution of Y given X = i is wi. Finally, for a partition P, let Z be the indicator variable of the partition, Z = s iff X ∈ Ps. With this notation, one can write\n\nC = −dV · H(Y|Z) = dV (−H(Y) + H(Z) − H(Z|Y)),      (3)\n\n[Figure 1: (a) Karate Club. (b) Political Blogs.]\n\nwhere H are the full and conditional Shannon entropies. Therefore, the DER algorithm can be interpreted as seeking a partition that maximizes the information between the current known state (Z), and the next step from it (Y).
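One full (means, assignment) iteration of Algorithm 1 can be sketched concretely. The code below is a plain-Python illustration under our own naming conventions; the measures w_i and degrees d_i are assumed given as dicts, and the small eps added inside the logarithm is our own smoothing to keep D finite when µ_l vanishes on some vertex (the paper does not specify this detail):

```python
import math

def D(nu, mu, eps=1e-12):
    # D(nu, mu) = sum_i nu(i) * log mu(i); eps guards log(0) (our addition)
    return sum(m * math.log(mu.get(i, 0.0) + eps) for i, m in nu.items())

def der_assign(w, deg, parts):
    """One DER iteration: parts is a list of disjoint vertex sets."""
    # step (1): mu_s = (1/d_S) * sum_{i in S} d_i * w_i   (eq. (1))
    mus = []
    for S in parts:
        dS = sum(deg[i] for i in S)
        mu = {}
        for i in S:
            for j, m in w[i].items():
                mu[j] = mu.get(j, 0.0) + deg[i] * m / dS
        mus.append(mu)
    # step (2): move every vertex to the component maximizing D(w_i, mu_l)
    new_parts = [set() for _ in parts]
    for i in w:
        s = max(range(len(mus)), key=lambda l: D(w[i], mus[l]))
        new_parts[s].add(i)
    return new_parts
```

On two disjoint triangles with the correct initial partition, one iteration is a fixed point, as Lemma 3.1 predicts for a local maximum of C.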
This interpretation gives rise to the name of the algorithm, DER, since every iteration reduces the entropy H(Y|Z) of the random walk, or diffusion, with respect to the partition. The second equality in (3) has another interesting interpretation. Suppose, for simplicity, that k = 2, with partition P1, P2. In general, a clustering algorithm aims to minimize the cut, the number of edges between P1 and P2. However, minimizing the number of edges directly will lead to situations where P1 is a single node, connected by one edge to the rest of the graph in P2. To avoid such situations, a relative, normalized version of the cut needs to be introduced, which takes into account the sizes of P1, P2. Every clustering algorithm has a way to resolve this issue, implicitly or explicitly. For DER, this is shown in the second equality of (3). H(Z) is maximized when the components are of equal sizes (with respect to π), while H(Z|Y) is minimized when the measures µPs are as disjointly supported as possible.\n\nAs with any k-means algorithm, DER's results depend somewhat on its random initialization. All k-means-like schemes are usually restarted several times and the solution with the best cost is chosen. In all cases which we evaluated we observed empirically that the dependence of DER on the initial parameters is rather weak. After two or three restarts it usually found a partition nearly as good as after 100 restarts. For clustering problems, however, there is another simple way to aggregate the results of multiple runs into a single partition, which slightly improves the quality of the final results. We use this technique in all our experiments and we provide the details in the Supplementary Material, Section A.\n\nWe conclude by mentioning two algorithms that use some of the concepts that we use. The Walktrap, [20], similarly to DER constructs the random walks (the measures wi, possibly for L > 1) as part of its computation.
However, Walktrap uses the wi's in a completely different way. Both the optimization procedure and the cost function are different from ours. The Infomap, [5]; [21], has a cost that is related to the notion of information. It aims to minimize the information required to transmit a random walk on G through a channel; the source coding is constructed using the clusters, and the best clusters are those that yield the best compression. This does not seem to be directly connected to the maximum likelihood motivated approach that we use. As with Walktrap, the optimization procedure of Infomap also completely differs from ours.\n\n4 Evaluation\n\nIn this section the results of the evaluation of the DER algorithm are presented. In Section 4.1 we illustrate DER on two classical graphs. Sections 4.2 and 4.3 contain the evaluation on the LFR benchmarks.\n\n4.1 Basic examples\n\nWhen a new clustering algorithm is introduced, it is useful to get a general feel of it with some simple examples. Figure 1a shows the classical Zachary's Karate Club, [22]. This graph has a ground partition into two subsets. The partition shown in Figure 1a is a partition obtained from a typical run of the DER algorithm, with k = 2, and a wide range of L's (L ∈ [1, 10] were tested). As is the case with many other clustering algorithms, the shown partition differs from the ground partition in one element, node 8 (see [1]).\n\nFigure 1b shows the political blogs graph, [23]. The nodes are political blogs, and the graph has an (undirected) edge if one of the blogs had a link to the other. There are 1222 nodes in the graph. The ground truth partition of this graph has two components – the right wing and the left wing blogs. The labeling of the ground truth was partially automatic and partially manual, and both processes could introduce some errors.
The run of DER reconstructs the ground truth partition with only 57 nodes misclassified. The NMI (see the next section, Eq. (4)) to the ground truth partition is .74.\n\nThe political blogs graph is particularly interesting since it is an example of a graph for which fitting an SBM model to reconstruct the clusters produces results very different from the ground truth. To overcome the problem with SBM fitting on this graph, a degree sensitive version of SBM, DCBM, was introduced in [24]. That algorithm produces a partition with NMI .75. Another approach to DCBM can be found in [25].\n\n4.2 LFR benchmarks\n\nThe LFR benchmark model, [14], is a widely used extension of the stochastic block model, where node degrees and community sizes have a power law distribution, as often observed in real graphs. An important parameter of this model is the mixing parameter µ ∈ [0, 1] that controls the fraction of the edges of a node that go outside the node's community (or outside all of the node's communities, in the overlapping case). For small µ, there will be a small number of edges going outside the communities, leading to disjoint, easily separable graphs, and the boundaries between communities will become less pronounced as µ grows.\n\nGiven a set of communities P on a graph, and the ground truth set of communities Q, there are several ways to measure how close P is to Q. One standard measure is the normalized mutual information (NMI), given by:\n\nNMI(P, Q) = 2 I(P, Q) / (H(P) + H(Q)),      (4)\n\nwhere H is the Shannon entropy of a partition and I is the mutual information (see [1] for details). NMI is equal to 1 if and only if the partitions P and Q coincide, and it takes values between 0 and 1 otherwise.\n\nWhen computed with NMI, the sets inside P, Q can not overlap. To deal with overlapping communities, an extension of NMI was proposed in [26].
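Equation (4) is simple to compute directly from the partition overlap counts. A minimal sketch (plain Python; the function name and the list-of-sets representation are our own), using the standard identities I(P, Q) = Σ (|S∩T|/n) log(n|S∩T| / (|S||T|)) and H(P) = −Σ (|S|/n) log(|S|/n):

```python
import math

def nmi(P, Q, n):
    """Normalized mutual information between two partitions, eq. (4).

    P, Q: lists of disjoint vertex sets covering n vertices.
    """
    def H(parts):
        return -sum((len(S) / n) * math.log(len(S) / n) for S in parts if S)

    I = 0.0
    for S in P:
        for T in Q:
            o = len(S & T)
            if o:
                I += (o / n) * math.log(o * n / (len(S) * len(T)))
    denom = H(P) + H(Q)
    return 2 * I / denom if denom else 1.0
```

Identical partitions give NMI 1, while two partitions whose blocks overlap uniformly give NMI 0.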
We refer to the original paper for the definition, as the definition is somewhat lengthy. This extension, which we denote here as ENMI, was subsequently used in the literature as a measure of closeness of two sets of communities, even in the case of disjoint communities. Note that most papers use the notation NMI while the metric that they really use is ENMI.\n\nFigure 2a shows the results of the evaluation of DER for four cases: the size of the graph was either N = 1000 or N = 5000 nodes, and the size of the communities was restricted to be either between 10 and 50 (denoted S in the figures) or between 20 and 100 (denoted B). For each combination of these parameters, µ varied between 0.1 and 0.8. For each combination of graph size, community size restrictions as above and µ value, we generated 20 graphs from that model and ran DER. To provide some basic intuition about these graphs, we note that the number of communities in the 1000S graphs is strongly concentrated around 40, and in the 1000B, 5000S, and 5000B graphs it is around 25, 200 and 100 respectively. Each point in Figure 2a is the average ENMI on the 20 corresponding graphs, with the standard deviation as the error bar. These experiments correspond precisely to the ones performed in [4] (see Supplementary Material, Section C, for more details). In all runs of DER we have set L = 5 and set k to be the true number of communities for each graph, as was done in [4] for the methods that required it.
Therefore our Figure 2a can be compared directly with Figure 2 in [4]. From this comparison we see that DER and two of the best algorithms identified in [4], Infomap [5] and RN [6], reconstruct the partition perfectly for µ ≤ 0.5; for µ = 0.6 DER's reconstruction scores are between Infomap's and RN's, with values for all of the algorithms above 0.95, and for µ = 0.7 DER has the best performance in two of the four cases. For µ = 0.8 all algorithms have score 0.\n\n[Figure 2: (a) DER, LFR benchmarks. (b) Spectral Alg., LFR benchmarks.]\n\nWe have also performed the same experiments with the standard version of spectral clustering, [8], because this version was not evaluated in [4]. The results are shown in Fig. 2b. Although the performance is generally good, the scores are mostly lower than those of DER, Infomap and RN.\n\n4.3 Overlapping LFR benchmarks\n\nWe now describe how DER can be applied to overlapping community detection. Observe that DER internally operates on the measures µPs rather than on subsets of the vertex set. Recall that µPs(i) is the probability that a random walk started from Ps will hit node i. We can therefore consider each i to be a member of those communities from which the probability to hit it is “high enough”. To define this formally, we first note that for any partition P, the following decomposition holds:\n\nπ = Σ_{s=1}^{k} π(Ps) µPs.      (5)\n\nThis follows from the invariance of π under the random walk. Now, given the output of DER – the sets Ps and the measures µPs – set\n\nmi(s) = µPs(i) π(Ps) / Σ_{t=1}^{k} µPt(i) π(Pt) = µPs(i) π(Ps) / π(i),      (6)\n\nwhere we used (5) in the second equality. Then mi(s) is the probability that the walk started at Ps, given that it finished in i. For each i ∈ V, set si = argmax_l mi(l) to be the most likely community given i.
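The membership probabilities of eq. (6), together with the 1/2 · m_i(s_i) thresholding used in the display below, amount to a short Bayes-rule computation. A sketch under our own naming (the measures µ_{P_s} as dicts, π(P_s) as a list of weights); the uniform fallback for vertices with zero total mass is our own guard, not from the paper:

```python
def memberships(mus, pis, V):
    """m_i(s) = mu_{P_s}(i) * pi(P_s) / sum_t mu_{P_t}(i) * pi(P_t), eq. (6)."""
    m = {}
    for i in V:
        scores = [mu.get(i, 0.0) * w for mu, w in zip(mus, pis)]
        z = sum(scores)
        # fallback to uniform membership if i is unreachable (our guard)
        m[i] = [s / z for s in scores] if z else [1.0 / len(mus)] * len(mus)
    return m

def overlapping_communities(m, k):
    # i joins C_t whenever m_i(t) >= (1/2) * max_s m_i(s)
    return [{i for i, probs in m.items() if probs[t] >= 0.5 * max(probs)}
            for t in range(k)]
```

A vertex hit roughly equally often from two components then lands in both C_t sets, which is exactly the intended overlap.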
Then define the overlapping communities C1, . . . , Ck via\n\nCt = { i ∈ V | mi(t) ≥ (1/2) · mi(si) }.      (7)\n\nThe paper [10] introduces a new algorithm for overlapping community detection and also contains an evaluation of that algorithm as well as of several other algorithms on a set of overlapping LFR benchmarks. The overlapping communities LFR model was defined in [3]. In Table 1 we present the ENMI results of DER runs on the N = 10000 graphs with the same parameters as in [10], and also show the values obtained on these benchmarks in [10] (Figure S4 in [10]), for four other algorithms. The DER algorithm was run with L = 2, and k was set to the true number of communities. Each number is an average over the ENMIs on 10 instances of graphs with a given set of parameters (as in [10]). The standard deviation around this average for DER was less than 0.02 in all cases. Variances for other algorithms are provided in [10].\n\nFor µ ≥ 0.6 all algorithms yield ENMI of less than 0.3. As we see in Table 1, DER performs better than all other algorithms in all the cases. We believe this indicates that DER together with equation (7) is a good choice for overlapping community detection in situations where the community overlap between each two communities is sparse, as is the case in the LFR models considered above. Further discussion is provided in the Supplementary Material, Section D.\n\nTable 1: Evaluation for Overlapping LFR. All values except DER are from [10].\n\nAlg.         µ = 0   µ = 0.2   µ = 0.4\nDER          0.94    0.9       0.83\nSVI ([10])   0.89    0.73      0.6\nPOI ([27])   0.86    0.68      0.55\nINF ([21])   0.42    0.38      0.4\nCOP ([28])   0.65    0.43      0.0\n\nWe conclude this section by noting that while in the non-overlapping case the models generated with µ = 0 result in trivial community detection problems, because in these cases the communities are simply the connected components of the graph, this is no longer true in the overlapping case. As a point of reference, the well known Clique Percolation method was also evaluated in [10], in the µ = 0 case. The average ENMI for this algorithm was 0.2 (Table S3 in [10]).\n\n5 Analytic bounds\n\nIn this section we restrict our attention to the case L = 1 of the DER algorithm. Recall that the p, q-SBM model was defined in Section 2. We shall consider the model with k = 2 and such that |P1| = |P2|. We assume that the initial partition for DER, denoted C1, C2 in what follows, is chosen as in step 3 of DER (Algorithm 1) – a random partition of V into two equal sized subsets. In this setting we have the following:\n\nTheorem 5.1. For every ε > 0 there exist C > 0 and c > 0 such that if\n\np ≥ C · N^{-1/2+ε}      (8)\n\nand\n\np − q ≥ c √(p N^{-1/2+ε} log N),      (9)\n\nthen DER recovers the partition P1, P2 after one iteration, with probability φ(N) such that φ(N) → 1 when N → ∞.\n\nNote that the probability in the conclusion of the theorem refers to the joint probability of a draw from the SBM and of an independent draw from the random initialization.\n\nThe proof of the theorem has essentially three steps. First, we observe that the random initialization C1, C2 is necessarily somewhat biased, in the sense that C1 and C2 never divide P1 exactly into two halves.
Specifically, ||C1 ∩ P1| − |C2 ∩ P1|| ≥ N^{-1/2-ε} with high probability. Assume that C1 has the bigger half, |C1 ∩ P1| > |C2 ∩ P1|. In the second step, by an appropriate linearization argument we show that for a node i ∈ P1, deciding whether D(wi, µC1) > D(wi, µC2) or vice versa amounts to counting paths of length two between i and C1 ∩ P1. In the third step we estimate the number of these length two paths in the model. The fact that |C1 ∩ P1| > |C2 ∩ P1| + N^{-1/2-ε} will imply more paths to C1 ∩ P1 from i ∈ P1, and we will conclude that D(wi, µC1) > D(wi, µC2) for all i ∈ P1 and D(wi, µC2) > D(wi, µC1) for all i ∈ P2. The full proof is provided in the supplementary material.\n\nReferences\n\n[1] Santo Fortunato. Community detection in graphs. Physics Reports, 486(3-5):75–174, 2010.\n\n[2] Ulrike von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.\n\n[3] Andrea Lancichinetti and Santo Fortunato. Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Phys. Rev. E, 80(1):016118, 2009.\n\n[4] Santo Fortunato and Andrea Lancichinetti. Community detection algorithms: A comparative analysis. In Fourth International ICST Conference, 2009.\n\n[5] M. Rosvall and C. T. Bergstrom. Maps of random walks on complex networks reveal community structure. Proc. Natl. Acad. Sci. USA, page 1118, 2008.\n\n[6] Peter Ronhovde and Zohar Nussinov. Multiresolution community detection for megascale networks by information-based replica correlations. Phys. Rev. E, 80, 2009.\n\n[7] Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), 2008.\n\n[8] Andrew Y. Ng, Michael I.
Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems 14, 2001.\n\n[9] Gergely Palla, Imre Derényi, Illés Farkas, and Tamás Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435, 2005.\n\n[10] Prem K. Gopalan and David M. Blei. Efficient discovery of overlapping communities in massive networks. Proceedings of the National Academy of Sciences, 110(36):14534–14539, 2013.\n\n[11] M. Girvan and M. E. J. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821–7826, 2002.\n\n[12] M. E. J. Newman and E. A. Leicht. Mixture models and exploratory analysis in networks. Proceedings of the National Academy of Sciences, 104(23):9564, 2007.\n\n[13] Paul W. Holland, Kathryn B. Laskey, and Samuel Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109–137, 1983.\n\n[14] Andrea Lancichinetti, Santo Fortunato, and Filippo Radicchi. Benchmark graphs for testing community detection algorithms. Phys. Rev. E, 78(4), 2008.\n\n[15] Animashree Anandkumar, Rong Ge, Daniel Hsu, and Sham Kakade. A tensor spectral approach to learning mixed membership community models. In COLT, volume 30 of JMLR Proceedings, 2013.\n\n[16] Yudong Chen, S. Sanghavi, and Huan Xu. Improved graph clustering. Information Theory, IEEE Transactions on, 60(10):6440–6455, Oct 2014.\n\n[17] Ravi B. Boppana. Eigenvalues and graph bisection: An average-case analysis. In Foundations of Computer Science, 1987, 28th Annual Symposium on, pages 280–285, Oct 1987.\n\n[18] Anne Condon and Richard M. Karp. Algorithms for graph partitioning on the planted partition model. Random Struct. Algorithms, 18(2):116–140, 2001.\n\n[19] Ron Shamir and Dekel Tsur. Improved algorithms for the random cluster graph model. Random Struct.
Algorithms, 31(4):418–449, 2007.\n\n[20] Pascal Pons and Matthieu Latapy. Computing communities in large networks using random walks. J. of Graph Alg. and App., 10:284–293, 2004.\n\n[21] Alcides Viamontes Esquivel and Martin Rosvall. Compression of flow can reveal overlapping-module organization in networks. Phys. Rev. X, 1:021025, Dec 2011.\n\n[22] W. W. Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33:452–473, 1977.\n\n[23] Lada A. Adamic and Natalie Glance. The political blogosphere and the 2004 U.S. election: Divided they blog. LinkKDD '05, 2005.\n\n[24] Brian Karrer and M. E. J. Newman. Stochastic blockmodels and community structure in networks. Phys. Rev. E, 83, 2011.\n\n[25] Arash A. Amini, Aiyou Chen, Peter J. Bickel, and Elizaveta Levina. Pseudo-likelihood methods for community detection in large sparse networks. Annals of Statistics, 41, 2013.\n\n[26] Andrea Lancichinetti, Santo Fortunato, and János Kertész. Detecting the overlapping and hierarchical community structure in complex networks. New Journal of Physics, 11(3):033015, 2009.\n\n[27] Brian Ball, Brian Karrer, and M. E. J. Newman. Efficient and principled method for detecting communities in networks. Phys. Rev. E, 84:036103, Sep 2011.\n\n[28] Steve Gregory. Finding overlapping communities in networks by label propagation. New Journal of Physics, 12(10):103018, 2010.", "award": [], "sourceid": 1650, "authors": [{"given_name": "Mark", "family_name": "Kozdoba", "institution": "Technion"}, {"given_name": "Shie", "family_name": "Mannor", "institution": "Technion"}]}