{"title": "Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel", "book": "Advances in Neural Information Processing Systems", "page_first": 3120, "page_last": 3128, "abstract": "Spectral clustering is a fast and popular algorithm for finding clusters in networks. Recently, Chaudhuri et al. and Amini et al. proposed variations on the algorithm that artificially inflate the node degrees for improved statistical performance. The current paper extends the previous theoretical results to the more canonical spectral clustering algorithm in a way that removes any assumption on the minimum degree and provides guidance on the choice of tuning parameter. Moreover, our results show how the star shape\" in the eigenvectors--which are consistently observed in empirical networks--can be explained by the Degree-Corrected Stochastic Blockmodel and the Extended Planted Partition model, two statistical model that allow for highly heterogeneous degrees. Throughout, the paper characterizes and justifies several of the variations of the spectral clustering algorithm in terms of these models. \"", "full_text": "Regularized Spectral Clustering under the\nDegree-Corrected Stochastic Blockmodel\n\nTai Qin\n\nDepartment of Statistics\n\nUniversity of Wisconsin-Madison\n\nMadison, WI\n\nqin@stat.wisc.edu\n\nKarl Rohe\n\nDepartment of Statistics\n\nUniversity of Wisconsin-Madison\n\nMadison, WI\n\nkarlrohe@stat.wisc.edu\n\nAbstract\n\nSpectral clustering is a fast and popular algorithm for \ufb01nding clusters in net-\nworks. Recently, Chaudhuri et al. [1] and Amini et al. [2] proposed inspired\nvariations on the algorithm that arti\ufb01cially in\ufb02ate the node degrees for improved\nstatistical performance. The current paper extends the previous statistical esti-\nmation results to the more canonical spectral clustering algorithm in a way that\nremoves any assumption on the minimum degree and provides guidance on the\nchoice of the tuning parameter. 
Moreover, our results show how the "star shape" in the eigenvectors--a common feature of empirical networks--can be explained by the Degree-Corrected Stochastic Blockmodel and the Extended Planted Partition model, two statistical models that allow for highly heterogeneous degrees. Throughout, the paper characterizes and justifies several of the variations of the spectral clustering algorithm in terms of these models.

1 Introduction

Our lives are embedded in networks--social, biological, communication, etc.--and many researchers wish to analyze these networks to gain a deeper understanding of the underlying mechanisms. Some types of underlying mechanisms generate communities (aka clusters or modularities) in the network. As machine learners, our aim is not merely to devise algorithms for community detection, but also to study the algorithm's estimation properties, to understand if and when we can make justifiable inferences from the estimated communities to the underlying mechanisms. Spectral clustering is a fast and popular technique for finding communities in networks. Several previous authors have studied the estimation properties of spectral clustering under various statistical network models (McSherry [3], Dasgupta et al. [4], Coja-Oghlan and Lanka [5], Ames and Vavasis [6], Rohe et al. [7], Sussman et al. [8] and Chaudhuri et al. [1]). Recently, Chaudhuri et al. [1] and Amini et al. [2] proposed two inspired ways of artificially inflating the node degrees that provide statistical regularization to spectral clustering.

This paper examines the statistical estimation performance of regularized spectral clustering under the Degree-Corrected Stochastic Blockmodel (DC-SBM), an extension of the Stochastic Blockmodel (SBM) that allows for heterogeneous degrees (Holland and Leinhardt [9], Karrer and Newman [10]).
The SBM and the DC-SBM are closely related to the planted partition model and the extended planted partition model, respectively. We extend the previous results in the following ways: (a) In contrast to previous studies, this paper studies the regularization step with a canonical version of spectral clustering that uses k-means. The results do not require any assumptions on the minimum expected node degree; instead, there is a threshold demonstrating that higher degree nodes are easier to cluster. This threshold is a function of the leverage scores that have proven essential in other contexts, for both graph algorithms and network data analysis (see Mahoney [11] and references therein). These are the first results that relate leverage scores to the statistical performance of spectral clustering. (b) This paper provides more guidance for data analytic issues than previous approaches. First, the results suggest an appropriate range for the regularization parameter. Second, our analysis gives a (statistical) model-based explanation for the "star-shaped" figure that often appears in empirical eigenvectors. This demonstrates how projecting the rows of the eigenvector matrix onto the unit sphere (an algorithmic step proposed by Ng et al. [12]) removes the ancillary effects of heterogeneous degrees under the DC-SBM. Our results highlight when this step may be unwise.

Preliminaries: Throughout, we study undirected and unweighted graphs or networks. Define a graph as G(E, V), where V = {v_1, v_2, ..., v_N} is the vertex or node set and E is the edge set. We will refer to node v_i as node i. E contains a pair (i, j) if there is an edge between nodes i and j. The edge set can be represented by the adjacency matrix A ∈ {0, 1}^{N×N}: A_ij = A_ji = 1 if (i, j) is in the edge set and A_ij = A_ji = 0 otherwise.
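The adjacency-matrix representation just described is straightforward to construct in code. Below is a minimal sketch (the edge list and the helper name are illustrative, not from the paper):

```python
# Build the symmetric adjacency matrix A of an undirected, unweighted graph
# from an edge list, as defined in the Preliminaries.
import numpy as np

def adjacency_matrix(n_nodes, edges):
    """Return A in {0,1}^{N x N} with A[i, j] = A[j, i] = 1 iff (i, j) is an edge."""
    A = np.zeros((n_nodes, n_nodes), dtype=int)
    for i, j in edges:
        A[i, j] = A[j, i] = 1
    return A

# Hypothetical 4-node example.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
A = adjacency_matrix(4, edges)
degrees = A.sum(axis=1)  # row sums give the node degrees used throughout the paper
```

Since the graph is undirected with no self-loops, A is symmetric with a zero diagonal.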
Define the diagonal degree matrix D and the normalized graph Laplacian L, both elements of R^{N×N}, in the following way:

    D_ii = Σ_j A_ij,    L = D^{-1/2} A D^{-1/2}.

The following notation will be used throughout the paper: ||·|| denotes the spectral norm, and ||·||_F denotes the Frobenius norm. For two sequences of variables {x_N} and {y_N}, we say x_N = ω(y_N) if and only if y_N/x_N = o(1). δ(·,·) is the indicator function, where δ_{x,y} = 1 if x = y and δ_{x,y} = 0 if x ≠ y.

2 The Algorithm: Regularized Spectral Clustering (RSC)

For a sparse network with strong degree heterogeneity, standard spectral clustering often fails to function properly (Amini et al. [2], Jin [13]). To account for this, Chaudhuri et al. [1] proposed the regularized graph Laplacian, defined as

    L_τ = D_τ^{-1/2} A D_τ^{-1/2} ∈ R^{N×N},

where D_τ = D + τI for τ ≥ 0.

The spectral algorithm proposed and studied by Chaudhuri et al. [1] divides the nodes into two random subsets and only uses the induced subgraph on one of those random subsets to compute the spectral decomposition. In this paper, we will study the more traditional version of the spectral algorithm that uses the spectral decomposition of the entire matrix (Ng et al. [12]). Define the regularized spectral clustering (RSC) algorithm as follows:

1. Given input adjacency matrix A, number of clusters K, and regularizer τ, calculate the regularized graph Laplacian L_τ. (As discussed later, a good default for τ is the average node degree.)

2. Find the eigenvectors X_1, ..., X_K ∈ R^N corresponding to the K largest eigenvalues of L_τ. Form X = [X_1, ..., X_K] ∈ R^{N×K} by putting the eigenvectors into the columns.

3. Form the matrix X* ∈ R^{N×K} from X by normalizing each of X's rows to have unit length.
That is, project each row of X onto the unit sphere of R^K (X*_ij = X_ij / (Σ_k X_ik²)^{1/2}).

4. Treat each row of X* as a point in R^K, and run k-means with K clusters. This creates K non-overlapping sets V_1, ..., V_K whose union is V.

5. Output V_1, ..., V_K. Node i is assigned to cluster r if the i'th row of X* is assigned to V_r.

This paper will refer to "standard spectral clustering" as the above algorithm with L replacing L_τ. These spectral algorithms have two main steps: (1) find the principal eigenspace of the (regularized) graph Laplacian; (2) determine the clusters in the low dimensional eigenspace. Later, we will study RSC under the Degree-Corrected Stochastic Blockmodel and show rigorously how regularization helps to maintain cluster information in step (1) and why normalizing the rows of X helps in step (2). From now on, we use X_τ and X*_τ instead of X and X* to emphasize that they are related to L_τ. Let X^i_τ and [X*_τ]^i denote the i'th rows of X_τ and X*_τ.

The next section introduces the Degree-Corrected Stochastic Blockmodel and its matrix formulation.

3 The Degree-Corrected Stochastic Blockmodel (DC-SBM)

In the Stochastic Blockmodel (SBM), each node belongs to one of K blocks. Each edge corresponds to an independent Bernoulli random variable, where the probability of an edge between any two nodes depends only on the block memberships of the two nodes (Holland and Leinhardt [9]). The formal definition is as follows.

Definition 3.1. For a node set {1, 2, ..., N}, let z : {1, 2, ..., N} → {1, 2, ..., K} partition the N nodes into K blocks. So, z_i equals the block membership for node i. Let B be a K × K matrix where B_ab ∈ [0, 1] for all a, b. Then under the SBM, the probability of an edge between i and j is P_ij = P_ji = B_{z_i z_j} for any i, j = 1, 2, ..., N.
Given z, all edges are independent.

One limitation of the SBM is that it presumes all nodes within the same block have the same expected degree. The Degree-Corrected Stochastic Blockmodel (DC-SBM) (Karrer and Newman [10]) is a generalization of the SBM that adds an additional set of parameters (θ_i > 0 for each node i) that control the node degrees. Let B be a K × K matrix where B_ab ≥ 0 for all a, b. Then the probability of an edge between node i and node j is θ_i θ_j B_{z_i z_j}, where θ_i θ_j B_{z_i z_j} ∈ [0, 1] for any i, j = 1, 2, ..., N. The parameters θ_i are arbitrary to within a multiplicative constant that is absorbed into B. To make them identifiable, Karrer and Newman [10] suggest imposing the constraint that, within each block, the θ_i's sum to one: Σ_i θ_i δ_{z_i, r} = 1 for any block label r. Under this constraint, B has an explicit meaning: if s ≠ t, B_st represents the expected number of links between block s and block t, and if s = t, B_st is twice the expected number of links within block s. Throughout the paper, we assume that B is positive definite.

Under the DC-SBM, define 𝒜 := E[A]. This matrix can be expressed as a product of matrices,

    𝒜 = Θ Z B Z^T Θ,

where (1) Θ ∈ R^{N×N} is a diagonal matrix whose ii'th element is θ_i and (2) Z ∈ {0,1}^{N×K} is the membership matrix, with Z_it = 1 if and only if node i belongs to block t (i.e. z_i = t).

3.1 Population Analysis

Under the DC-SBM, if the partition is identifiable, then one should be able to determine the partition from 𝒜. This section shows that with the population adjacency matrix 𝒜 and a proper regularizer τ, RSC perfectly reconstructs the block partition.

Define the diagonal matrix 𝒟 to contain the expected node degrees, 𝒟_ii = Σ_j 𝒜_ij, and define 𝒟_τ = 𝒟 + τI, where τ ≥ 0 is the regularizer.
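The edge model above is easy to simulate. The following sketch samples one DC-SBM adjacency matrix; the specific θ, z, and B values are hypothetical choices that keep every θ_i θ_j B_{z_i z_j} in [0, 1]:

```python
# Sample A from the DC-SBM: P(A_ij = 1) = theta_i * theta_j * B[z_i, z_j],
# with edges independent given z, and no self-loops.
import numpy as np

rng = np.random.default_rng(0)

z = np.array([0, 0, 0, 1, 1, 1])                   # block memberships (K = 2)
theta = np.array([1.2, 1.0, 0.8, 1.2, 1.0, 0.8])   # degree parameters
B = np.array([[0.6, 0.1],
              [0.1, 0.6]])                         # block connectivity matrix

P = np.outer(theta, theta) * B[z][:, z]            # P_ij = theta_i * theta_j * B_{z_i z_j}
upper = np.triu(rng.random(P.shape) < P, k=1)      # independent Bernoulli draws, i < j
A = (upper | upper.T).astype(int)                  # symmetrize; diagonal stays zero
```

Note that the identifiability constraint (θ_i's summing to one within each block) is not imposed here; rescaling θ and absorbing the constant into B would not change P.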
Then, define the population graph Laplacian ℒ and the population version of the regularized graph Laplacian ℒ_τ, both elements of R^{N×N}, in the following way:

    ℒ = 𝒟^{-1/2} 𝒜 𝒟^{-1/2},    ℒ_τ = 𝒟_τ^{-1/2} 𝒜 𝒟_τ^{-1/2}.

Define D_B ∈ R^{K×K} as a diagonal matrix whose (s, s)'th element is [D_B]_ss = Σ_t B_st. A couple lines of algebra show that [D_B]_ss = W_s is the total expected degree of nodes from block s and that 𝒟_ii = θ_i [D_B]_{z_i z_i}. Using these quantities, the next lemma gives an explicit form for ℒ_τ as a product of the parameter matrices.

Lemma 3.2. (Explicit form for ℒ_τ) Under the DC-SBM with K blocks with parameters {B, Z, Θ}, define θ^τ_i as

    θ^τ_i = θ_i² / (θ_i + τ/W_{z_i}) = θ_i 𝒟_ii / (𝒟_ii + τ).

Let Θ_τ ∈ R^{N×N} be a diagonal matrix whose ii'th entry is θ^τ_i. Define B_L = D_B^{-1/2} B D_B^{-1/2}; then ℒ_τ can be written

    ℒ_τ = 𝒟_τ^{-1/2} 𝒜 𝒟_τ^{-1/2} = Θ_τ^{1/2} Z B_L Z^T Θ_τ^{1/2}.

Recall that 𝒜 = Θ Z B Z^T Θ. Lemma 3.2 demonstrates that ℒ_τ has a similarly simple form that separates the block-related information (B_L) and the node-specific information (Θ_τ). Notice that if τ = 0, then Θ_0 = Θ and ℒ = 𝒟^{-1/2} 𝒜 𝒟^{-1/2} = Θ^{1/2} Z B_L Z^T Θ^{1/2}. The next lemma shows that ℒ_τ has rank K and describes how its eigen-decomposition can be expressed in terms of Z and Θ.

Lemma 3.3. (Eigen-decomposition for ℒ_τ) Under the DC-SBM with K blocks and parameters {B, Z, Θ}, ℒ_τ has K positive eigenvalues. The remaining N − K eigenvalues are zero. Denote the K positive eigenvalues of ℒ_τ as λ_1 ≥ λ_2 ≥ ...
≥ λ_K > 0, and let 𝒳_τ ∈ R^{N×K} contain the eigenvector corresponding to λ_i in its i'th column. Define 𝒳*_τ to be the row-normalized version of 𝒳_τ, similar to X*_τ as defined in the RSC algorithm in Section 2. Then, there exists an orthogonal matrix U ∈ R^{K×K}, depending on τ, such that

1. 𝒳_τ = Θ_τ^{1/2} Z (Z^T Θ_τ Z)^{-1/2} U;
2. 𝒳*_τ = ZU, and Z_i ≠ Z_j ⇔ Z_i U ≠ Z_j U, where Z_i denotes the i'th row of the membership matrix Z.

This lemma provides four useful facts about the matrices 𝒳_τ and 𝒳*_τ. First, if two nodes i and j belong to the same block, then the corresponding rows of 𝒳_τ (denoted as 𝒳^i_τ and 𝒳^j_τ) both point in the same direction, but with different lengths: ||𝒳^i_τ||_2 = (θ^τ_i / Σ_j θ^τ_j δ_{z_j, z_i})^{1/2}. Second, if two nodes i and j belong to different blocks, then 𝒳^i_τ and 𝒳^j_τ are orthogonal to each other. Third, if z_i = z_j, then after projecting these points onto the sphere as in 𝒳*_τ, the rows are equal: [𝒳*_τ]^i = [𝒳*_τ]^j = U_{z_i}. Finally, if z_i ≠ z_j, then the rows are perpendicular: [𝒳*_τ]^i ⊥ [𝒳*_τ]^j. Figure 1 illustrates the geometry of 𝒳_τ and 𝒳*_τ when there are three underlying blocks. Notice that running k-means on the rows of 𝒳*_τ (in the right panel of Figure 1) will return perfect clusters.

Note that if Θ were the identity matrix, then the left panel in Figure 1 would look like the right panel; without degree heterogeneity, there would be no star shape and no need for a projection step.
This suggests that the star-shaped figure often observed in data analysis stems from the degree heterogeneity in the network.

Figure 1: In this numerical example, 𝒜 comes from the DC-SBM with three blocks. Each point corresponds to one row of the matrix 𝒳_τ (left panel) or 𝒳*_τ (right panel). The different colors correspond to three different blocks. The hollow circle is the origin. Without normalization (left panel), nodes with the same block membership share the same direction in the projected space. After normalization (right panel), nodes with the same block membership share the same position in the projected space.

4 Regularized Spectral Clustering with the Degree-Corrected Model

This section bounds the mis-clustering rate of regularized spectral clustering under the DC-SBM. The section proceeds as follows: Theorem 4.1 shows that L_τ is close to ℒ_τ. Theorem 4.2 shows that X_τ is close to 𝒳_τ and that X*_τ is close to 𝒳*_τ. Finally, Theorem 4.4 shows that the output from RSC with L_τ is close to the true partition in the DC-SBM (using Lemma 3.3).

Theorem 4.1. (Concentration of the regularized graph Laplacian) Let G be a random graph with independent edges and pr(v_i ∼ v_j) = p_ij. Let δ be the minimum expected degree of G, that is, δ = min_i 𝒟_ii.
For any ε > 0, if δ + τ > 3 ln N + 3 ln(4/ε), then with probability at least 1 − ε,

    ||L_τ − ℒ_τ|| ≤ 4 √(3 ln(4N/ε) / (δ + τ)).    (1)

Remark: This theorem builds on the results of Chung and Radcliffe [14] and Chaudhuri et al. [1], which give seemingly similar bounds on ||L − ℒ|| and ||D_τ^{-1} A − 𝒟_τ^{-1} 𝒜||. However, the previous papers require that δ ≥ c ln N, where c is some constant. This assumption is not satisfied in a large proportion of sparse empirical networks with heterogeneous degrees. In fact, the regularized graph Laplacian is most interesting when this condition fails, i.e. when there are several nodes with very low degrees. Theorem 4.1 only assumes that δ + τ > 3 ln N + 3 ln(4/ε). This is the fundamental reason that RSC works for networks containing some nodes with extremely small degrees. It shows that, by introducing a proper regularizer τ, ||L_τ − ℒ_τ|| can be well bounded, even with δ very small. Later we will show that a suitable choice of τ is the average degree.

The next theorem bounds the difference between the empirical and population eigenvectors (and their row-normalized versions) in terms of the Frobenius norm.

Theorem 4.2. Let A be the adjacency matrix generated from the DC-SBM with K blocks and parameters {B, Z, Θ}. Let λ_1 ≥ λ_2 ≥ ... ≥ λ_K > 0 be the only K positive eigenvalues of ℒ_τ. Let X_τ and 𝒳_τ ∈ R^{N×K} contain the top K eigenvectors of L_τ and ℒ_τ, respectively.
Define m = min_i {min{||X^i_τ||_2, ||𝒳^i_τ||_2}} as the length of the shortest row in X_τ and 𝒳_τ. Let X*_τ and 𝒳*_τ ∈ R^{N×K} be the row-normalized versions of X_τ and 𝒳_τ, as defined in step 3 of the RSC algorithm.

For any ε > 0 and sufficiently large N, assume that

(a) √(K ln(4N/ε) / (δ + τ)) ≤ (1/8) λ_K, and
(b) δ + τ > 3 ln N + 3 ln(4/ε).

Then with probability at least 1 − ε, the following holds:

    ||X_τ − 𝒳_τ O||_F ≤ c_0 (1/λ_K) √(K ln(4N/ε) / (δ + τ)), and
    ||X*_τ − 𝒳*_τ O||_F ≤ c_0 (1/(m λ_K)) √(K ln(4N/ε) / (δ + τ)).    (2)

The proof of Theorem 4.2 can be found in the supplementary materials.

Next we use Theorem 4.2 to derive a bound on the mis-clustering rate of RSC. To define "mis-clustered", recall that RSC applies the k-means algorithm to the rows of X*_τ, where each row is a point in R^K. Each row is assigned to one cluster, and each of these clusters has a centroid from k-means. Define C_1, ..., C_N ∈ R^K such that C_i is the centroid corresponding to the i'th row of X*_τ. Similarly, run k-means on the rows of the population eigenvector matrix 𝒳*_τ and define the population centroids 𝒞_1, ..., 𝒞_N ∈ R^K. In essence, we consider node i correctly clustered if C_i is closer to 𝒞_i than it is to any other 𝒞_j for all j with Z_j ≠ Z_i.

The definition is complicated by the fact that, if any of the λ_1, ..., λ_K are equal, then only the subspace spanned by their eigenvectors is identifiable.
Similarly, if any of those eigenvalues are close together, then the estimation results for the individual eigenvectors are much worse than the estimation results for the subspace that they span. Because clustering only requires estimation of the correct subspace, our definition of correctly clustered is amended with the rotation O^T ∈ R^{K×K}, the matrix which minimizes ||X*_τ O^T − 𝒳*_τ||_F. This is referred to as the orthogonal Procrustes problem, and [15] shows how the singular value decomposition gives the solution.

Definition 4.3. If C_i O^T is closer to 𝒞_i than it is to any other 𝒞_j for j with Z_j ≠ Z_i, then we say that node i is correctly clustered. Define the set of mis-clustered nodes:

    M = {i : ∃ j ≠ i, s.t. ||C_i O^T − 𝒞_i||_2 > ||C_i O^T − 𝒞_j||_2}.    (3)

The next theorem bounds the mis-clustering rate |M|/N.

Theorem 4.4. (Main Theorem) Suppose A ∈ R^{N×N} is the adjacency matrix of a graph G generated from the DC-SBM with K blocks and parameters {B, Z, Θ}. Let λ_1 ≥ λ_2 ≥ ... ≥ λ_K > 0 be the K positive eigenvalues of ℒ_τ. Define M, the set of mis-clustered nodes, as in Definition 4.3. Let δ be the minimum expected degree of G. For any ε > 0 and sufficiently large N, assume (a) and (b) as in Theorem 4.2. Then with probability at least 1 − ε, the mis-clustering rate of RSC with regularization constant τ is bounded:

    |M|/N ≤ c_1 K ln(N/ε) / (N m² (δ + τ) λ_K²).    (4)

Remark 1 (Choice of τ): The quality of the bound in Theorem 4.4 depends on τ through three terms: (δ + τ), λ_K, and m. Setting τ equal to the average node degree balances these terms. In essence, if τ is too small, there is insufficient regularization. Specifically, if the minimum expected degree δ = O(ln N), then we need τ ≥ c(ε) ln N to have enough regularization to satisfy condition (b) on δ + τ. Alternatively, if τ is too large, it washes out significant eigenvalues. To see that τ should not be too large, note that

    C = (Z^T Θ_τ Z)^{1/2} B_L (Z^T Θ_τ Z)^{1/2} ∈ R^{K×K}    (5)

has the same eigenvalues as the largest K eigenvalues of ℒ_τ (see supplementary materials for details). The matrix Z^T Θ_τ Z is diagonal, and its (s, s)'th element is the summation of the θ^τ_i within block s. If EM = ω(N ln N), where M = Σ_i D_ii is the sum of the node degrees, then τ = ω(M/N) sends the smallest diagonal entry of Z^T Θ_τ Z to 0, sending λ_K, the smallest eigenvalue of C, to zero. The trade-off between these two suggests that a proper range of τ is (α EM/N, β EM/N), where 0 < α < β are two constants. Keeping τ within this range guarantees that λ_K is lower bounded by some constant depending only on K. In simulations, we find that τ = M/N (i.e. the average node degree) provides good results. The theoretical results only suggest that this is the correct rate. So, one could adjust this by a multiplicative constant. Our simulations suggest that the results are not sensitive to such adjustments.

Remark 2 (Thresholding m): Mahoney [11] (and references therein) shows how the leverage scores of A and L are informative for both data analysis and algorithmic stability. For L, the leverage score of node i is ||X^i||_2², the squared length of the i'th row of the matrix containing the top K eigenvectors. Theorem 4.4 is the first result that explicitly relates the leverage scores to the statistical performance of spectral clustering. Recall that m² is the minimum of the squared row lengths in X_τ and 𝒳_τ, that is, the minimum leverage score in both L_τ and ℒ_τ. This appears in the denominator of (4). The leverage scores in ℒ_τ have an explicit form, ||𝒳^i_τ||_2² = θ^τ_i / Σ_j θ^τ_j δ_{z_j, z_i}. So, if node i has small expected degree, then θ^τ_i is small, rendering ||𝒳^i_τ||_2 small. This can deteriorate the bound in Theorem 4.4. The problem arises from projecting X^i_τ onto the unit sphere for a node i with small leverage; it amplifies a noisy measurement. Motivated by this intuition, the next corollary focuses on the high leverage nodes. More specifically, let m* denote a threshold. Define S to be the subset of nodes whose leverage scores in L_τ and ℒ_τ, ||X^i_τ|| and ||𝒳^i_τ||, exceed the threshold m*:

    S = {i : ||X^i_τ|| ≥ m*, ||𝒳^i_τ|| ≥ m*}.

Then, by applying k-means on the set of vectors {[X*_τ]^i, i ∈ S}, we cluster these nodes. The following corollary bounds the mis-clustering rate on S.

Corollary 4.5. Let N_1 = |S| denote the number of nodes in S, and define M_1 = M ∩ S as the set of mis-clustered nodes restricted to S. With the same settings and assumptions as in Theorem 4.4, let γ > 0 be a constant and set m* = γ/√N. If N/N_1 = O(1), then by applying k-means on the set of vectors {[X*_τ]^i, i ∈ S}, we have with probability at least 1 − ε that there exists a constant c_2, independent of ε, such that

    |M_1|/N_1 ≤ c_2 K ln(N_1/ε) / (γ² (δ + τ) λ_K²).    (6)

In the main theorem (Theorem 4.4), the denominator of the upper bound contains m². Since we do not make a minimum degree assumption, this value potentially approaches zero, making the bound useless.
Corollary 4.5 replaces N m² with the constant γ², providing a superior bound when there are several small leverage scores.

If λ_K (the K'th largest eigenvalue of ℒ_τ) is bounded below by some constant and τ = ω(ln N), then Corollary 4.5 implies that |M_1|/N_1 = o_p(1). The above thresholding procedure only clusters the nodes in S. To cluster all of the nodes, define the thresholded RSC (t-RSC) as follows:

(a) Follow steps (1), (2), and (3) of RSC as in Section 2.
(b) Apply k-means with K clusters on the set S = {i : ||X^i_τ||_2 ≥ γ/√N} and assign each of them to one of V_1, ..., V_K. Let C_1, ..., C_K denote the K centroids given by k-means.
(c) For each node i ∉ S, find the centroid C_s such that ||[X*_τ]^i − C_s||_2 = min_{1≤t≤K} ||[X*_τ]^i − C_t||_2. Assign node i to V_s. Output V_1, ..., V_K.

Remark 3 (Applying to SC): Theorem 4.4 can be easily applied to the standard SC algorithm under both the SBM and the DC-SBM by setting τ = 0. In this setting, Theorem 4.4 improves upon the previous results for spectral clustering.

Define the four-parameter Stochastic Blockmodel SBM(p, r, s, K) as follows: p is the probability of an edge occurring between two nodes from the same block, r is the probability of an out-block linkage, s is the number of nodes within each block, and K is the number of blocks. Because the SBM lacks degree heterogeneity within blocks, the rows of X within the same block already share the same length. So, it is not necessary to project the X^i's to the unit sphere. Under the four-parameter model, λ_K = (K[r/(p − r)] + 1)^{-1} (Rohe et al. [7]). Using Theorem 4.4, with p and r fixed and p > r, and applying k-means to the rows of X, we have

    |M|/N = O_p(K² ln N / N).    (7)

If K = o(√(N/ln N)), then |M|/N → 0 in probability. This improves the previous results that required K = o(N^{1/3}) (Rohe et al. [7]).
Moreover, it makes the results for spectral clustering comparable to the results for the MLE in Choi et al. [16].

5 Simulation and Analysis of Political Blogs

This section compares five different methods of spectral clustering. Experiment 1 generates networks from the DC-SBM with a power-law degree distribution. Experiment 2 generates networks from the standard SBM. Finally, the benefits of regularization are illustrated on an empirical network from the political blogosphere during the 2004 presidential election (Adamic and Glance [17]).

The simulations compare (1) standard spectral clustering (SC), (2) RSC as defined in Section 2, (3) RSC without projecting X_τ onto the unit sphere (RSC_wp), (4) regularized SC with thresholding (t-RSC), and (5) spectral clustering with perturbation (SCP) (Amini et al. [2]), which applies SC to the perturbed adjacency matrix A_per = A + a 11^T. In addition, Experiment 2 compares the performance of RSC on the subset of nodes with high leverage scores (RSC on S) with the other five methods. We set τ = M/N, threshold parameter γ = 1, and a = M/N², unless otherwise specified.

Experiment 1. This experiment examines how degree heterogeneity affects the performance of the spectral clustering algorithms. The Θ parameters (from the DC-SBM) are drawn from the power law distribution with lower bound x_min = 1 and shape parameter β ∈ {2, 2.25, 2.5, 2.75, 3, 3.25, 3.5}. A smaller β indicates greater degree heterogeneity. For each fixed β, thirty networks are sampled. In each sample, K = 3 and each block contains 300 nodes (N = 900). Define the signal-to-noise ratio to be the expected number of in-block edges divided by the expected number of out-block edges.
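For concreteness, the RSC procedure from Section 2 that these experiments compare fits in a short, self-contained sketch. The toy graph, the hand-rolled k-means (Lloyd's iterations with farthest-point initialization), and the function names below are illustrative assumptions, not the implementation used for the experiments:

```python
# Steps 1-5 of RSC: regularized Laplacian, top-K eigenvectors,
# row normalization, then k-means on the normalized rows.
import numpy as np

def kmeans(X, K, iters=50):
    # Tiny Lloyd's algorithm with farthest-point initialization (illustration only).
    centers = [X[0]]
    for _ in range(K - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(axis=2), axis=1)
        centers = np.array([X[labels == k].mean(axis=0) if (labels == k).any() else centers[k]
                            for k in range(K)])
    return labels

def rsc(A, K, tau=None):
    d = A.sum(axis=1).astype(float)
    if tau is None:
        tau = d.mean()                                 # default: average node degree
    inv_sqrt = 1.0 / np.sqrt(d + tau)                  # diagonal of D_tau^{-1/2}
    L_tau = inv_sqrt[:, None] * A * inv_sqrt[None, :]  # D_tau^{-1/2} A D_tau^{-1/2}
    vals, vecs = np.linalg.eigh(L_tau)                 # eigenvalues in ascending order
    X = vecs[:, -K:]                                   # eigenvectors of the K largest eigenvalues
    X_star = X / np.linalg.norm(X, axis=1, keepdims=True)  # project rows onto the unit sphere
    return kmeans(X_star, K)

# Toy usage: two 5-node cliques joined by a single edge.
A = np.zeros((10, 10), dtype=int)
A[:5, :5] = 1
A[5:, 5:] = 1
np.fill_diagonal(A, 0)
A[4, 5] = A[5, 4] = 1
labels = rsc(A, K=2)
```

On this toy graph the two cliques are well separated in the normalized eigenvector rows, so the returned labels are constant within each clique.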
Throughout the simulations, the SNR is set to three and the expected average degree is set to eight.

The left panel of Figure 2 plots β against the mis-clustering rate for SC, RSC, RSC_wp, t-RSC, SCP, and RSC on S. Each point is the average over 30 sampled networks. Each line represents one method. If a method assigns more than 95% of the nodes to one block, then we consider all nodes to be mis-clustered. The experiment shows that (1) if the degrees are more heterogeneous (β ≤ 3.5), then regularization improves the performance of the algorithms; (2) if β < 3, then RSC and t-RSC outperform RSC_wp and SCP, verifying that the normalization step helps when the degrees are highly heterogeneous; and, finally, (3) uniformly across the settings of β, it is easier to cluster nodes with high leverage scores.

Experiment 2. This experiment compares SC, RSC, RSC_wp, t-RSC, and SCP under the SBM with no degree heterogeneity. Each simulation has K = 3 blocks and N = 1500 nodes. As in the previous experiment, the SNR is set to three. In this experiment, the average degree has three different settings: 10, 21, 30. For each setting, the results are averaged over 50 samples of the network.

The right panel of Figure 2 shows the mis-clustering rates of SC and RSC for the three different values of the average degree. SCP, RSC_wp, and t-RSC perform similarly to RSC, demonstrating that under the standard SBM (i.e. without degree heterogeneity) all spectral clustering methods perform comparably. The one exception is that under the sparsest model, SC is less stable than the other methods.

Figure 2: Left panel: comparison of performance for SC, RSC, RSC_wp, t-RSC, SCP, and RSC on S under different degree heterogeneity; smaller β corresponds to greater degree heterogeneity. Right panel: comparison of performance for SC and RSC under the SBM with different sparsity.

Analysis of Blog Network.
This empirical network comprises political blogs during the 2004 US presidential election (Adamic and Glance [17]). Each blog has a known label, liberal or conservative. As in Karrer and Newman [10], we symmetrize the network and consider only the largest connected component, which has 1222 nodes. The average degree of the network is roughly 15. We apply RSC to the data set with τ ranging from 0 to 30; when τ = 0, RSC reduces to standard spectral clustering. SC assigns 1144 out of 1222 nodes to the same block, failing to detect the ideological partition. RSC detects the partition, and its performance is insensitive to τ: with τ ∈ [1, 30], RSC misclusters (80 ± 2) nodes out of 1222.

If RSC is applied to the 90% of nodes with the largest leverage scores (i.e., excluding the nodes with the smallest leverage scores), then the misclustering rate among these high-leverage nodes is 44/1100, which is almost 50% lower. This illustrates how the leverage score of a node can gauge the strength of the clustering evidence for that node relative to the other nodes.

We tried to compare these results to the regularized algorithm in [1]. However, because this data set contains several nodes with very small degrees, the quantities computed in step 4 of the algorithm in [1] are sometimes negative, in which case step 5(b) cannot be performed.

6 Discussion

In this paper, we give theoretical, simulation, and empirical results that demonstrate how a simple adjustment to the standard spectral clustering algorithm can give dramatically better results for networks with heterogeneous degrees. Our theoretical results add to the current literature by studying the regularization step in a more canonical version of the spectral clustering algorithm. Moreover, our main results require no assumptions on the minimum node degree.
This is crucial because it allows us to study situations where several nodes have small leverage scores; it is in these situations that regularization is most beneficial. Finally, our results demonstrate that choosing a tuning parameter close to the average degree provides a balance between several competing objectives.

Acknowledgements

Thanks to Sara Fernandes-Taylor for helpful comments. Research of TQ is supported by NSF Grant DMS-0906818 and NIH Grant EY09946. Research of KR is supported by grants from WARF and NSF Grant DMS-1309998.

References

[1] K. Chaudhuri, F. Chung, and A. Tsiatas. Spectral clustering of graphs with general degrees in the extended planted partition model. Journal of Machine Learning Research, pages 1–23, 2012.

[2] Arash A. Amini, Aiyou Chen, Peter J. Bickel, and Elizaveta Levina. Pseudo-likelihood methods for community detection in large sparse networks. 2012.

[3] F. McSherry. Spectral partitioning of random graphs. In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, pages 529–537. IEEE, 2001.

[4] Anirban Dasgupta, John E. Hopcroft, and Frank McSherry. Spectral analysis of random graphs with skewed degree distributions. In Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science, pages 602–610. IEEE, 2004.

[5] Amin Coja-Oghlan and André Lanka. Finding planted partitions in random graphs with general degree distributions. SIAM Journal on Discrete Mathematics, 23(4):1682–1714, 2009.

[6] Brendan P. W. Ames and Stephen A. Vavasis. Convex optimization for the planted k-disjoint-clique problem. arXiv preprint arXiv:1008.2814, 2010.

[7] K. Rohe, S. Chatterjee, and B. Yu. Spectral clustering and the high-dimensional stochastic blockmodel.
The Annals of Statistics, 39(4):1878–1915, 2011.

[8] D. L. Sussman, M. Tang, D. E. Fishkind, and C. E. Priebe. A consistent adjacency spectral embedding for stochastic blockmodel graphs. Journal of the American Statistical Association, 107(499):1119–1128, 2012.

[9] P. W. Holland and S. Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109–137, 1983.

[10] Brian Karrer and Mark E. J. Newman. Stochastic blockmodels and community structure in networks. Physical Review E, 83(1):016107, 2011.

[11] Michael W. Mahoney. Randomized algorithms for matrices and data. In Michael J. Way, Jeffrey D. Scargle, Kamal M. Ali, and Ashok N. Srivastava, editors, Advances in Machine Learning and Data Mining for Astronomy, pages 647–672. CRC Press, Taylor & Francis Group, 2012.

[12] Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems, 2:849–856, 2002.

[13] Jiashun Jin. Fast network community detection by SCORE. arXiv preprint arXiv:1211.5803, 2012.

[14] Fan Chung and Mary Radcliffe. On the spectra of general random graphs. The Electronic Journal of Combinatorics, 18(1):P215, 2011.

[15] Peter H. Schönemann. A generalized solution of the orthogonal Procrustes problem. Psychometrika, 31(1):1–10, 1966.

[16] D. S. Choi, P. J. Wolfe, and E. M. Airoldi. Stochastic blockmodels with a growing number of classes. Biometrika, 99(2):273–284, 2012.

[17] Lada A. Adamic and Natalie Glance. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery, pages 36–43. ACM, 2005.
", "award": [], "sourceid": 1422, "authors": [{"given_name": "Tai", "family_name": "Qin", "institution": "UW-Madison"}, {"given_name": "Karl", "family_name": "Rohe", "institution": "UW-Madison"}]}