{"title": "Information-theoretic Limits for Community Detection in Network Models", "book": "Advances in Neural Information Processing Systems", "page_first": 8324, "page_last": 8333, "abstract": "We analyze the information-theoretic limits for the recovery of node labels in several network models. This includes the Stochastic Block Model, the Exponential Random Graph Model, the Latent Space Model, the Directed Preferential Attachment Model, and the Directed Small-world Model. For the Stochastic Block Model, the non-recoverability condition depends on the probabilities of having edges inside a community, and between different communities. For the Latent Space Model, the non-recoverability condition depends on the dimension of the latent space, and on how distant and how spread out the communities are in the latent space. For the Directed Preferential Attachment Model and the Directed Small-world Model, the non-recoverability condition depends on the ratio between homophily and neighborhood size. We also consider dynamic versions of the Stochastic Block Model and the Latent Space Model.", "full_text": "Information-theoretic Limits for Community Detection in Network Models

Chuyang Ke
Department of Computer Science
Purdue University
West Lafayette, IN 47907
cke@purdue.edu

Jean Honorio
Department of Computer Science
Purdue University
West Lafayette, IN 47907
jhonorio@purdue.edu

Abstract

We analyze the information-theoretic limits for the recovery of node labels in several network models. This includes the Stochastic Block Model, the Exponential Random Graph Model, the Latent Space Model, the Directed Preferential Attachment Model, and the Directed Small-world Model. For the Stochastic Block Model, the non-recoverability condition depends on the probabilities of having edges inside a community, and between different communities.
For the Latent Space Model, the non-recoverability condition depends on the dimension of the latent space, and on how distant and how spread out the communities are in the latent space. For the Directed Preferential Attachment Model and the Directed Small-world Model, the non-recoverability condition depends on the ratio between homophily and neighborhood size. We also consider dynamic versions of the Stochastic Block Model and the Latent Space Model.

1 Introduction

Network models have become a powerful tool for researchers in various fields. With the rapid expansion of online social media such as Twitter, Facebook, LinkedIn and Instagram, researchers now have access to more real-life network data, and network models are great tools for analyzing the vast amount of interactions [16, 2, 1, 21]. Recent years have seen applications of network models in machine learning [5, 33, 23], in bioinformatics [9, 15, 11], as well as in social and behavioral research [26, 14].

In this literature, one of the central problems related to network models is community detection. In a typical network model, nodes represent individuals in a social network, and edges represent interpersonal interactions. The goal of community detection is to recover the label associated with each node (i.e., the community to which each node belongs). The exact recovery of 100% of the labels has long been an important research topic in machine learning; see, for instance, [2, 10, 20, 27].

One particular issue researchers care about in the recovery of network models is the relation between the number of nodes, and the proximity between the likelihood of connecting within the same community and across different communities. For instance, consider the Stochastic Block Model (SBM), in which p is the probability of connecting two nodes in the same community, and q is the probability of connecting two nodes in different communities.
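As a concrete illustration, one draw from such a two-community SBM can be sketched in a few lines; this is a minimal sketch, and the function name, seed, and parameter values are ours, chosen only for illustration:

```python
import numpy as np

def sample_sbm(n, p, q, rng):
    """One draw from the two-community SBM: labels are +/-1 uniformly at
    random; an edge appears with probability p inside a community and
    with probability q across communities."""
    y = rng.choice([-1, 1], size=n)             # true labels Y*
    same = y[:, None] == y[None, :]             # same-community indicator
    probs = np.where(same, p, q)                # per-pair edge probability
    upper = np.triu(rng.random((n, n)) < probs, k=1).astype(int)
    return y, upper + upper.T                   # undirected: symmetric, zero diagonal

rng = np.random.default_rng(0)
y, a = sample_sbm(30, p=0.8, q=0.2, rng=rng)
```

Recovery from a single such draw is easy when p and q are far apart, and hopeless when they coincide, which is exactly what the non-recoverability conditions quantify.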
Clearly if p equals q, it is impossible to identify the communities, or equivalently, to recover the labels for all nodes. Intuitively, as the difference between p and q increases, labels become easier to recover.

In this paper, we analyze the information-theoretic limits for community detection. Our main contribution is the comprehensive study of several network models used in the literature. To accomplish that task, we carefully construct restricted ensembles. The key idea of using restricted ensembles is that for any learning problem, if a subclass of models is difficult to learn, then the original class of models will be at least as difficult to learn. The use of restricted ensembles is customary for information-theoretic lower bounds [28, 31].

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

Table 1: Comparison of network models (S - static; UD - undirected dynamic; DD - directed dynamic)

Type | Model | Our Result | Previous Result | Thm. No.
S  | SBM  | (p − q)^2/(q(1 − q)) ≤ (2 log 2)/n − (4 log 2)/n^2            | (p − q)^2/(p + q) ≤ 2/n [25]; (p − q)^2/(q(1 − q)) ≤ O(1/n) [10] | Thm. 1
S  | ERGM | 2(cosh β − 1) ≤ (2 log 2)/n − (4 log 2)/n^2                   | Novel | Cor. 1
S  | LSM  | (4σ^2 + 1)^(−1−d/2) ‖µ‖_2^2 ≤ (log 2)/(2n) − (log 2)/n^2      | Novel | Thm. 2
UD | DSBM | (p − q)^2/(q(1 − q)) ≤ ((n − 2) log 2)/(n^2 − n)              | Novel | Thm. 3
UD | DLSM | (4σ^2 + 1)^(−1−d/2) ‖µ‖_2^2 ≤ ((n − 2) log 2)/(4n^2 − 4n)     | Novel | Thm. 4
DD | DPAM | (s + 1)/(8m) ≤ 2^((n−2)/(n^2−n))/n^2                          | Novel | Thm. 5
DD | DSWM | (s + 1)^2/(m p(1 − p)) ≤ 2^(2(n−2)/n^2)/n                     | Novel | Thm. 6

We provide a series of novel results in this paper.
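All of the bounds above are obtained through the same recipe: nature draws Y* uniformly from a (possibly restricted) set 𝒴 of candidate labelings, the learner observes A, and a standard form of Fano's inequality [12, 35] applied to the Markov chain Y* → A → Ŷ gives

```latex
\mathbb{P}(\hat{Y} \neq Y^{*}) \;\geq\; 1 - \frac{I(Y^{*}; A) + \log 2}{\log |\mathcal{Y}|} ,
```

so each non-recoverability condition amounts to an upper bound on the mutual information I(Y*; A) for the corresponding model.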
While the information-theoretic limits of the Stochastic Block Model have been heavily studied (in slightly different ways), none of the other models considered in this paper have been studied before. Thus, we provide new information-theoretic results for the Exponential Random Graph Model (ERGM), the Latent Space Model (LSM), the Directed Preferential Attachment Model (DPAM), and the Directed Small-world Model (DSWM). We also provide new results for dynamic versions of the Stochastic Block Model (DSBM) and the Latent Space Model (DLSM). Table 1 summarizes our results.

2 Static Network Models

In this section we analyze the information-theoretic limits for two static network models: the Stochastic Block Model (SBM) and the Latent Space Model (LSM). Furthermore, we include a particular case of the Exponential Random Graph Model (ERGM) as a corollary of our results for the SBM. We call these models static because their edges are independent of each other.

2.1 Stochastic Block Model

Among the different network models, the Stochastic Block Model (SBM) has received particular attention. Variations of the Stochastic Block Model include, for example, symmetric SBMs [3], binary SBMs [27, 13], labelled SBMs [36, 20, 34, 18], and overlapping SBMs [4]. For regular SBMs, [25] and [10] showed that under certain conditions recovering the communities in a SBM is fundamentally impossible. Our analysis for the Stochastic Block Model follows the method used in [10], but we analyze a different regime. In [10], the two clusters are required to have equal size (Planted Bisection Model), while in our SBM setup, nature picks the label of each node uniformly at random. Thus, in our model only the expected sizes of the two communities are equal.

We now define the Stochastic Block Model, which has two parameters p and q.

Definition 1 (Stochastic Block Model). Let 0 < q < p < 1.
A Stochastic Block Model with parameters (p, q) is an undirected graph of n nodes with adjacency matrix A, where each Aij ∈ {0, 1}. Each node is in one of the two classes {+1, −1}. The distribution of true labels Y* = (y*1, . . . , y*n) is uniform, i.e., each label y*i is assigned to +1 with probability 0.5, and to −1 with probability 0.5.

The adjacency matrix A is distributed as follows: if y*i = y*j then Aij is Bernoulli with parameter p; otherwise Aij is Bernoulli with parameter q.

The goal is to recover labels Ŷ = (ŷ1, . . . , ŷn) that are equal to the true labels Y*, given the observation of A. We are interested in the information-theoretic lower bounds. Thus, we define the Markov chain Y* → A → Ŷ. Using Fano's inequality, we obtain the following results.

Theorem 1. In a Stochastic Block Model with parameters (p, q) with 0 < q < p < 1, if

(p − q)^2/(q(1 − q)) ≤ (2 log 2)/n − (4 log 2)/n^2,

then we have that for any algorithm that a learner could use for picking Ŷ, the probability of error P(Ŷ ≠ Y*) is greater than or equal to 1/2.

Notice that our result for the Stochastic Block Model is similar to the one in [10]. This means that the method of generating labels does not affect the information-theoretic bound.

2.2 Exponential Random Graph Model

Exponential Random Graph Models (ERGMs) are a family of distributions on graphs of the following form: P(A) = exp(φ(A)) / Σ_{A'} exp(φ(A')), where φ : {0, 1}^{n×n} → R is some potential function over graphs.
Selecting different potential functions enables ERGMs to model various structures in network graphs; for instance, the potential function can be a sum of functions over edges, triplets, or cliques, among other choices [16].

In this section we analyze a special case of the Exponential Random Graph Model as a corollary of our results for the Stochastic Block Model, in which the potential function is defined as a sum of functions over edges. That is, φ(A) = Σ_{i,j | Aij=1} φij(yi, yj), where φij(yi, yj) = β yi yj and β > 0 is a parameter. Simplifying the expression above, we have φ(A) = Σ_{i,j} β Aij yi yj. This leads to the following definition.

Definition 2 (Exponential Random Graph Model). Let β > 0. An Exponential Random Graph Model with parameter β is an undirected graph of n nodes with adjacency matrix A, where each Aij ∈ {0, 1}. Each node is in one of the two classes {+1, −1}. The distribution of true labels Y* = (y*1, . . . , y*n) is uniform, i.e., each label y*i is assigned to +1 with probability 0.5, and to −1 with probability 0.5.

The adjacency matrix A is distributed as follows: P(A | Y) = exp(β Σ_{i<j} Aij yi yj) / Z(β), where Z(β) = Σ_{A' ∈ {0,1}^{n×n}} exp(β Σ_{i<j} A'ij yi yj).

Corollary 1. In an Exponential Random Graph Model with parameter β > 0, if

2(cosh β − 1) ≤ (2 log 2)/n − (4 log 2)/n^2,

then we have that for any algorithm that a learner could use for picking Ŷ, the probability of error P(Ŷ ≠ Y*) is greater than or equal to 1/2.

2.3 Latent Space Model

The Latent Space Model (LSM) was first proposed by [19]. The core assumption of the model is that each node has a low-dimensional latent vector associated with it. The latent vectors of nodes in the same community follow a similar pattern. The connectivity of two nodes in the Latent Space Model is determined by the distance between their corresponding latent vectors.
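This distance-based generative mechanism (formalized in Definition 3 below) can be sketched in a few lines; the function name, seed, and parameter values here are ours, for illustration only:

```python
import numpy as np

def sample_lsm(n, mu, sigma, rng):
    """One draw from the Latent Space Model: each node i gets a latent
    Gaussian vector centered at y_i * mu, and an edge (i, j) appears
    with probability exp(-||z_i - z_j||^2)."""
    d = mu.shape[0]
    y = rng.choice([-1, 1], size=n)                      # community labels Y*
    z = y[:, None] * mu[None, :] + sigma * rng.standard_normal((n, d))
    sq_dist = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=2)
    upper = np.triu(rng.random((n, n)) < np.exp(-sq_dist), k=1).astype(int)
    return y, upper + upper.T                            # undirected graph

rng = np.random.default_rng(1)
y, a = sample_lsm(20, mu=np.array([1.0, 0.0]), sigma=0.5, rng=rng)
```

The learner observes only A; the latent positions z stay hidden, which is what makes recovery harder as σ grows or ‖µ‖ shrinks.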
Previous works on the Latent Space Model [30] analyzed asymptotic sample complexity, but did not focus on information-theoretic limits for exact recovery.

We now define the Latent Space Model, which has three parameters σ > 0, d ∈ Z+ and µ ∈ R^d, µ ≠ 0.

Definition 3 (Latent Space Model). Let d ∈ Z+, µ ∈ R^d with µ ≠ 0, and σ > 0. A Latent Space Model with parameters (d, µ, σ) is an undirected graph of n nodes with adjacency matrix A, where each Aij ∈ {0, 1}. Each node is in one of the two classes {+1, −1}. The distribution of true labels Y* = (y*1, . . . , y*n) is uniform, i.e., each label y*i is assigned to +1 with probability 0.5, and to −1 with probability 0.5.

For every node i, nature generates a latent d-dimensional vector zi ∈ R^d according to the Gaussian distribution Nd(yi µ, σ^2 I).

The adjacency matrix A is distributed as follows: Aij is Bernoulli with parameter exp(−‖zi − zj‖_2^2).

The goal is to recover labels Ŷ = (ŷ1, . . . , ŷn) that are equal to the true labels Y*, given the observation of A. Notice that we do not have access to Z. We are interested in the information-theoretic lower bounds. Thus, we define the Markov chain Y* → A → Ŷ. Fano's inequality and a proper conversion of the above model lead to the following theorem.

Theorem 2.
In a Latent Space Model with parameters (d, µ, σ), if

(4σ^2 + 1)^(−1−d/2) ‖µ‖_2^2 ≤ (log 2)/(2n) − (log 2)/n^2,

then we have that for any algorithm that a learner could use for picking Ŷ, the probability of error P(Ŷ ≠ Y*) is greater than or equal to 1/2.

3 Dynamic Network Models

In this section we analyze the information-theoretic limits for two dynamic network models: the Dynamic Stochastic Block Model (DSBM) and the Dynamic Latent Space Model (DLSM). We call these models dynamic because we assume there exists some ordering of the edges, and the distribution of each edge depends not only on its endpoints, but also on previously generated edges.

We start by giving the definition of predecessor sets. Notice that the following definition of predecessor sets employs a lexicographic order, and the motivation is to use it as a subclass to provide a bound for general dynamic models. Fano's inequality is usually used for a restricted ensemble, i.e., a subclass of the original class of interest. If a subclass (e.g., a dynamic SBM or LSM with a particular predecessor set τ) is difficult to learn, then the original class (SBMs or LSMs with general dynamic interactions) will be at least as difficult to learn. The use of restricted ensembles is customary for information-theoretic lower bounds [28, 31].

Definition 4. For every pair i and j with i < j, we denote its predecessor set by τij, where

τij ⊆ {(k, l) | (k < l) ∧ (k < i ∨ (k = i ∧ l < j))} and Aτij = {Akl | (k, l) ∈ τij}.

In a dynamic model, the probability distribution of each edge Aij depends not only on the labels of nodes i and j (i.e., y*i and y*j), but also on the previously generated edges Aτij.

Next, we prove the following lemma using the definition above.

Lemma 1. Assume that the probability distribution of A given labeling Y is P(A | Y) = Π_{i<j} P(Aij | Aτij, yi, yj). Then for any labelings Y and Y', we have

KL(P(A | Y) ‖ P(A | Y')) = Σ_{i<j} E_{A ~ P(· | Y)} [ KL( P(Aij | Aτij, yi, yj) ‖ P(Aij | Aτij, y'i, y'j) ) ].

Definition 6 (Dynamic Latent Space Model). Let d ∈ Z+, µ ∈ R^d with µ ≠ 0, and σ > 0. Let F = {fk}, k = 0, . . . , n(n − 1)/2, be a set of functions, where fk : {0, 1}^k → (0, 1]. A Dynamic Latent Space Model with parameters (d, µ, σ, F) is an undirected graph of n nodes with adjacency matrix A, where each Aij ∈ {0, 1}. Each node is in one of the two classes {+1, −1}. The distribution of true labels Y* = (y*1, . . . , y*n) is uniform, i.e., each label y*i is assigned to +1 with probability 0.5, and to −1 with probability 0.5.

For every node i, nature generates a latent d-dimensional vector zi ∈ R^d according to the Gaussian distribution Nd(yi µ, σ^2 I).

The adjacency matrix A is distributed as follows: Aij is Bernoulli with parameter f|τij|(Aτij) · exp(−‖zi − zj‖_2^2).

The goal is to recover labels Ŷ = (ŷ1, . . . , ŷn) that are equal to the true labels Y*, given the observation of A. Notice that we do not have access to Z. We are interested in the information-theoretic lower bounds. Thus, we define the Markov chain Y* → A → Ŷ. Using Fano's inequality and Lemma 1, our analysis leads to the following theorem.

Theorem 4. In a Dynamic Latent Space Model with parameters (d, µ, σ, {fk}), if

(4σ^2 + 1)^(−1−d/2) ‖µ‖_2^2 ≤ ((n − 2) log 2)/(4(n^2 − n)),

then we have that for any algorithm that a learner could use for picking Ŷ, the probability of error P(Ŷ ≠ Y*) is greater than or equal to 1/2.

4 Directed Network Models

In this section we analyze the information-theoretic limits for two directed network models: the Directed Preferential Attachment Model (DPAM) and the Directed Small-world Model (DSWM).
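Both directed models cap each connection probability at 1/m by projecting a weight vector onto the capped simplex (Algorithm 1 below). A sketch of that redistribution step, under the assumption that excess weight is spread evenly over coordinates still below the cap until none exceeds it (the function name is ours):

```python
import numpy as np

def cap_simplex(w, m, tol=1e-12):
    """Redistribute weight so that sum(w) stays 1 and every entry is at
    most 1/m: clip entries above the cap and spread the clipped excess
    evenly over entries still strictly below it, repeating as needed.
    Assumes m <= len(w), so a feasible point exists."""
    w = np.asarray(w, dtype=float).copy()
    cap = 1.0 / m
    while True:
        over = w > cap + tol
        if not over.any():
            return w
        excess = (w[over] - cap).sum()    # total mass above the cap
        w[over] = cap                     # clip the offending entries
        under = w < cap - tol
        w[under] += excess / under.sum()  # spread the excess evenly
```

For example, cap_simplex([0.6, 0.3, 0.05, 0.05], m=2) caps the first entry at 0.5 and spreads the 0.1 excess over the other three entries; no entry moves below the original minimum or above the original maximum, matching the property used in the proofs.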
In contrast to previous sections, here we consider directed graphs. Note that in social networks such as Twitter, the graph is directed: each user follows other users. Users that are followed by many others (i.e., nodes with high out-degree) are more likely to be followed by new users. This is the case for popular singers, for instance. Additionally, a new user will follow people with similar preferences. This is referred to in the literature as homophily. In our case, a node with a positive label will more likely follow nodes with positive labels, and vice versa.

The two models defined in this section require an expected number of in-neighbors m for each node. In order to guarantee this in a setting in which nodes decide to connect to at most k > m nodes independently, one should guarantee that the probability of choosing each of the k nodes is less than or equal to 1/m.

The above motivates an algorithm that takes a vector in the k-simplex (i.e., w ∈ R^k with Σ_{i=1}^k wi = 1) and produces another vector in the k-simplex (i.e., w̃ ∈ R^k with Σ_{i=1}^k w̃i = 1 and w̃i ≤ 1/m for all i). Consider the following optimization problem:

minimize over w̃:  (1/2) Σ_{i=1}^k (w̃i − wi)^2
subject to:  0 ≤ w̃i ≤ 1/m for all i,  and  Σ_{i=1}^k w̃i = 1,

which is solved by the following algorithm:

Algorithm 1: k-simplex
input: vector w ∈ R^k where Σ_{i=1}^k wi = 1, expected number of in-neighbors m ≤ k
output: vector w̃ ∈ R^k where Σ_{i=1}^k w̃i = 1 and w̃i ≤ 1/m for all i
1 for i ∈ {1, . . . , k} do
2     w̃i ← wi;
3 end
4 for i ∈ {1, . . . , k} such that w̃i > 1/m do
5     S ← w̃i − 1/m;
6     w̃i ← 1/m;
7     Distribute S evenly across all j ∈ {1, . . . , k} such that w̃j < 1/m;
8 end

One important property that we will use in our proofs is that min_i w̃i ≥ min_i wi, as well as max_i w̃i ≤ max_i wi.

4.1 Directed Preferential Attachment Model

Here we consider a Directed Preferential Attachment Model (DPAM) based on the classic Preferential Attachment Model [7]. While in the classic model every node has exactly m neighbors, in our model the expected number of in-neighbors is m.

Definition 7 (Directed Preferential Attachment Model). Let m be a positive integer with 0 < m ≪ n. Let s > 0 be the homophily parameter. A Directed Preferential Attachment Model with parameters (m, s) is a directed graph of n nodes with adjacency matrix A, where each Aij ∈ {0, 1}. Each node is in one of the two classes {+1, −1}. The distribution of true labels Y* = (y*1, . . . , y*n) is uniform, i.e., each label y*i is assigned to +1 with probability 0.5, and to −1 with probability 0.5.

Nodes 1 through m are not connected to each other, and they all have an in-degree of 0. For node i from m + 1 to n, nature first generates a weight wji for each node j < i, where wji ∝ (Σ_{k=1}^{i−1} Ajk + 1)(1[y*i = y*j] s + 1), and Σ_{j=1}^{i−1} wji = 1. Then every node j < i connects to node i with the following probability: P(Aji = 1 | Aτij, y*i, y*j) = m w̃ji, where (w̃1i, . . . , w̃i−1,i) is computed from (w1i, . . . , wi−1,i) as in Algorithm 1.

The goal is to recover labels Ŷ = (ŷ1, . . . , ŷn) that are equal to the true labels Y*, given the observation of A. We are interested in the information-theoretic lower bounds. Thus, we define the Markov chain Y* → A → Ŷ. Using Fano's inequality, we obtain the following results.

Theorem 5.
In a Directed Preferential Attachment Model with parameters (m, s), if

(s + 1)/(8m) ≤ 2^((n−2)/(n^2−n))/n^2,

then we have that for any algorithm that a learner could use for picking Ŷ, the probability of error P(Ŷ ≠ Y*) is greater than or equal to 1/2.

4.2 Directed Small-world Model

Here we consider a Directed Small-world Model (DSWM) based on the classic small-world phenomenon [32]. While in the classic model every node has exactly m neighbors, in our model the expected number of in-neighbors is m.

Definition 8 (Directed Small-world Model). Let m be a positive integer with 0 < m ≪ n. Let s > 0 be the homophily parameter. Let p be the mixture parameter with 0 < p < 1. A Directed Small-world Model with parameters (m, s, p) is a directed graph of n nodes with adjacency matrix A, where each Aij ∈ {0, 1}. Each node is in one of the two classes {+1, −1}. The distribution of true labels Y* = (y*1, . . . , y*n) is uniform, i.e., each label y*i is assigned to +1 with probability 0.5, and to −1 with probability 0.5.

Nodes 1 through m are not connected to each other, and they all have an in-degree of 0. For node i from m + 1 to n, nature first generates a weight wji for each node j < i, where wji ∝ (1[y*i = y*j] s + 1), with Σ_{j=i−m}^{i−1} wji = p and Σ_{j=1}^{i−m−1} wji = 1 − p. Then every node j < i connects to node i with the following probability: P(Aji = 1 | Aτij, y*i, y*j) = m w̃ji, where (w̃1i, . . . , w̃i−1,i) is computed from (w1i, . . . , wi−1,i) as in Algorithm 1.

The goal is to recover labels Ŷ = (ŷ1, . . . , ŷn) that are equal to the true labels Y*, given the observation of A. We are interested in the information-theoretic lower bounds. Thus, we define the Markov chain Y* → A → Ŷ. Using Fano's inequality, we obtain the following results.

Theorem 6. In a Directed Small-world Model with parameters (m, s, p), if

(s + 1)^2/(m p(1 − p)) ≤ 2^(2(n−2)/n^2)/n,

then we have that for any algorithm that a learner could use for picking Ŷ, the probability of error P(Ŷ ≠ Y*) is greater than or equal to 1/2.

5 Concluding Remarks

In the past decade much effort has been devoted in the Stochastic Block Model (SBM) community to finding polynomial-time algorithms for exact recovery. For example, [2, 10] provided analyses of various parameter regimes in symmetric SBMs, and showed that some easy regimes can be solved in polynomial time using semidefinite programming relaxations; [3] also provides quasi-linear time algorithms for SBMs; [17] and [6] discovered the existence of phase transitions in the exact recovery of symmetric SBMs. All of the aforementioned literature has mathematical guarantees of statistical and computational efficiency. There also exist algorithms without formal guarantees; for example, [16] introduced some MCMC-based methods. Other heuristic algorithms include Kernighan-Lin's algorithm, METIS, Local Spectral Partitioning, etc. (see, e.g., [22] for reference).

We want to highlight that community detection for undirected models could be viewed as a special case of the Markov random field (MRF) inference problem.
In the MRF model, if the pairwise\npotentials are submodular, the problem could be solved exactly in polynomial time via graph cuts in\nthe case of two communities [8].\nRegarding our contributions, we highlight that the entries in the adjacency matrix A are not inde-\npendent in several models considered in our paper, including the Dynamic Stochastic Block Model,\nthe Dynamic Latent Space Model, the Directed Preferential Attachment Model and the Directed\nSmall-world Model. Also, in the Latent Space Model and the Dynamic Latent Space Model, we\nhave additional latent variables. Furthermore, in the Directed Preferential Attachment Model and\nthe Directed Small-world Model, an entry in A also depends on several entries in Y \u2217 to account for\nhomophily.\nOur research could be extended in several ways. First, our models only involve two symmetric clusters.\nFor the Latent Space Model and dynamic models, it might be interesting to analyze the case with\nmultiple clusters. Some more complicated models involving Markovian assumptions, for example,\nthe Dynamic Social Network in Latent Space model [29], can also be analyzed. We acknowledge\nthat the information-theoretic lower bounds we provide in this paper may not be necessarily tight. It\nwould be interesting to analyze phase transitions and information-computational gaps for the new\nmodels.\n\nReferences\n[1] Emmanuel Abbe. Community detection and stochastic block models: recent developments.\n\narXiv preprint arXiv:1703.10146, 2017.\n\n[2] Emmanuel Abbe, Afonso S Bandeira, and Georgina Hall. Exact recovery in the stochastic block\n\nmodel. IEEE Transactions on Information Theory, 62(1):471\u2013487, 2016.\n\n[3] Emmanuel Abbe and Colin Sandon. Community detection in general stochastic block models:\nFundamental limits and ef\ufb01cient algorithms for recovery. In Foundations of Computer Science\n(FOCS), 2015 IEEE 56th Annual Symposium on, pages 670\u2013688. 
IEEE, 2015.

[4] Edoardo M Airoldi, David M Blei, Stephen E Fienberg, and Eric P Xing. Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9(Sep):1981-2014, 2008.

[5] Brian Ball, Brian Karrer, and Mark EJ Newman. Efficient and principled method for detecting communities in networks. Physical Review E, 84(3):036103, 2011.

[6] Afonso S Bandeira. Random laplacian matrices and convex relaxations. Foundations of Computational Mathematics, 18(2):345-379, 2018.

[7] Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(5439):509-512, 1999.

[8] Yuri Boykov and Olga Veksler. Graph cuts in vision and graphics: Theories and applications. In Handbook of mathematical models in computer vision, pages 79-96. Springer, 2006.

[9] Irineo Cabreros, Emmanuel Abbe, and Aristotelis Tsirigos. Detecting community structures in Hi-C genomic data. In Information Science and Systems (CISS), 2016 Annual Conference on, pages 584-589. IEEE, 2016.

[10] Yudong Chen and Jiaming Xu. Statistical-computational phase transitions in planted models: The high-dimensional setting. In International Conference on Machine Learning, pages 244-252, 2014.

[11] Melissa S Cline, Michael Smoot, Ethan Cerami, Allan Kuchinsky, Nerius Landys, Chris Workman, Rowan Christmas, Iliana Avila-Campilo, Michael Creech, Benjamin Gross, et al. Integration of biological networks and gene expression data using Cytoscape. Nature Protocols, 2(10):2366, 2007.

[12] Thomas M Cover and Joy A Thomas. Elements of information theory. John Wiley & Sons, 2012.

[13] Yash Deshpande, Emmanuel Abbe, and Andrea Montanari. Asymptotic mutual information for the binary stochastic block model. In Information Theory (ISIT), 2016 IEEE International Symposium on, pages 185-189. IEEE, 2016.

[14] Santo Fortunato. Community detection in graphs.
Physics Reports, 486(3-5):75-174, 2010.

[15] Michelle Girvan and Mark EJ Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821-7826, 2002.

[16] Anna Goldenberg, Alice X Zheng, Stephen E Fienberg, Edoardo M Airoldi, et al. A survey of statistical network models. Foundations and Trends® in Machine Learning, 2(2):129-233, 2010.

[17] Bruce Hajek, Yihong Wu, and Jiaming Xu. Achieving exact cluster recovery threshold via semidefinite programming. IEEE Transactions on Information Theory, 62(5):2788-2797, 2016.

[18] Simon Heimlicher, Marc Lelarge, and Laurent Massoulié. Community detection in the labelled stochastic block model. NIPS Workshop on Algorithmic and Statistical Approaches for Large Social Networks, 2012.

[19] Peter D Hoff, Adrian E Raftery, and Mark S Handcock. Latent space approaches to social network analysis. Journal of the American Statistical Association, 97(460):1090-1098, 2002.

[20] Varun Jog and Po-Ling Loh. Information-theoretic bounds for exact recovery in weighted stochastic block models using the Renyi divergence. IEEE Allerton Conference on Communication, Control, and Computing, 2015.

[21] Bomin Kim, Kevin Lee, Lingzhou Xue, and Xiaoyue Niu. A review of dynamic network models with latent variables. arXiv preprint arXiv:1711.10421, 2017.

[22] Jure Leskovec, Kevin J Lang, and Michael Mahoney. Empirical comparison of algorithms for network community detection. In Proceedings of the 19th international conference on World wide web, pages 631-640. ACM, 2010.

[23] Greg Linden, Brent Smith, and Jeremy York. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76-80, 2003.

[24] Arakaparampil M Mathai and Serge B Provost. Quadratic forms in random variables: theory and applications.
Dekker, 1992.

[25] Elchanan Mossel, Joe Neeman, and Allan Sly. Stochastic block models and reconstruction. arXiv preprint arXiv:1202.1499, 2012.

[26] Mark EJ Newman, Duncan J Watts, and Steven H Strogatz. Random graph models of social networks. Proceedings of the National Academy of Sciences, 99(suppl 1):2566-2572, 2002.

[27] Hussein Saad, Ahmed Abotabl, and Aria Nosratinia. Exact recovery in the binary stochastic block model with binary side information. IEEE Allerton Conference on Communication, Control, and Computing, 2017.

[28] Narayana P Santhanam and Martin J Wainwright. Information-theoretic limits of selecting binary graphical models in high dimensions. IEEE Transactions on Information Theory, 58(7):4117-4134, 2012.

[29] Purnamrita Sarkar and Andrew W Moore. Dynamic social network analysis using latent space models. In Advances in Neural Information Processing Systems, pages 1145-1152, 2006.

[30] Minh Tang, Daniel L Sussman, Carey E Priebe, et al. Universally consistent vertex classification for latent positions graphs. The Annals of Statistics, 41(3):1406-1430, 2013.

[31] Wei Wang, Martin J Wainwright, and Kannan Ramchandran. Information-theoretic bounds on model selection for Gaussian Markov random fields. In Information Theory Proceedings (ISIT), 2010 IEEE International Symposium on, pages 1373-1377. IEEE, 2010.

[32] Duncan J Watts and Steven H Strogatz. Collective dynamics of 'small-world' networks. Nature, 393(6684):440, 1998.

[33] Rui Wu, Jiaming Xu, Rayadurgam Srikant, Laurent Massoulié, Marc Lelarge, and Bruce Hajek. Clustering and inference from pairwise comparisons. In ACM SIGMETRICS Performance Evaluation Review, volume 43, pages 449-450. ACM, 2015.

[34] Jiaming Xu, Laurent Massoulié, and Marc Lelarge.
Edge label inference in generalized stochastic block models: from spectral theory to impossibility results. In Conference on Learning Theory, pages 903-920, 2014.

[35] Bin Yu. Assouad, Fano, and Le Cam. Festschrift for Lucien Le Cam, 423:435, 1997.

[36] Se-Young Yun and Alexandre Proutiere. Optimal cluster recovery in the labeled stochastic block model. In Advances in Neural Information Processing Systems, pages 965-973, 2016.
", "award": [], "sourceid": 5047, "authors": [{"given_name": "Chuyang", "family_name": "Ke", "institution": "Purdue University"}, {"given_name": "Jean", "family_name": "Honorio", "institution": "Purdue University"}]}