{"title": "Spectral Modification of Graphs for Improved Spectral Clustering", "book": "Advances in Neural Information Processing Systems", "page_first": 4858, "page_last": 4867, "abstract": "Spectral clustering algorithms provide approximate solutions to hard optimization problems that formulate graph partitioning in terms of the graph conductance. It is well understood that the quality of these approximate solutions is negatively affected by a possibly significant gap between the conductance and the second eigenvalue of the graph. In this paper we show that for \\textbf{any} graph $G$, there exists a `spectral maximizer' graph $H$ which is cut-similar to $G$, but has eigenvalues that are near the theoretical limit implied by the cut structure of $G$. Applying then spectral clustering on $H$ has the potential to produce improved cuts that also exist in $G$ due to the cut similarity. This leads to the second contribution of this work: we describe a practical spectral modification algorithm that raises the eigenvalues of the input graph, while preserving its cuts. Combined with spectral clustering on the modified graph, this yields demonstrably improved cuts.", "full_text": "Spectral Modi\ufb01cation of Graphs\nfor Improved Spectral Clustering\n\nIoannis Koutis\n\nDepartment of Computer Science\nNew Jersey Institute of Technology\n\nNewark, NJ 07102\nikoutis@njit.edu\n\nHuong Le\n\nDepartment of Computer Science\nNew Jersey Institute of Technology\n\nNewark, NJ 07102\nhyl4@njit.edu\n\nAbstract\n\nSpectral clustering algorithms provide approximate solutions to hard optimization\nproblems that formulate graph partitioning in terms of the graph conductance. It\nis well understood that the quality of these approximate solutions is negatively\naffected by a possibly signi\ufb01cant gap between the conductance and the second\neigenvalue of the graph. 
In this paper we show that for any graph G, there exists a 'spectral maximizer' graph H which is cut-similar to G, but has eigenvalues that are near the theoretical limit implied by the cut structure of G. Applying spectral clustering on H then has the potential to produce improved cuts that also exist in G, due to the cut similarity. This leads to the second contribution of this work: we describe a practical spectral modification algorithm that raises the eigenvalues of the input graph while preserving its cuts. Combined with spectral clustering on the modified graph, this yields demonstrably improved cuts.

1 Introduction

Spectral clustering is a widely known family of algorithms that use eigenvectors to partition the vertices of a graph into meaningful clusters. The introduction of spectral partitioning methods goes back to the work of Donath and Hoffman [8], who used eigenvectors for partitioning logic circuits, but it owes its popularity to the work of Shi and Malik [25], who brought it into the realm of computer vision and machine learning, subsequently leading to a vast amount of related work. Several other clustering methods have since emerged, including of course methods based on neural networks. But spectral clustering remains a frequently used baseline, and a serious contender to state-of-the-art graph embedding methods, e.g. [20, 11, 28, 22].

The remarkable performance of spectral clustering is possibly due to the fact that it produces outputs with theoretically understood approximation properties. Roughly speaking, spectral clustering computes the second eigenvalue λ of the normalized graph Laplacian as an approximation to the graph conductance φ, i.e. the value of the optimal cut. The Cheeger inequality shows that while λ is never greater than φ, it can be as small as φ² [6].
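The gap between λ and φ is easy to observe numerically. The following self-contained sketch (our own illustration, not code from the paper; the size n = 64 and the use of NumPy are our choices) computes both quantities for the unweighted path graph, a standard worst case:

```python
import numpy as np

# Path graph P_n: the middle cut has conductance Theta(1/n), but the second
# eigenvalue of the normalized Laplacian is Theta(1/n^2) -- a quadratic gap.
n = 64
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
d = A.sum(axis=1)
L = np.diag(d) - A                                  # combinatorial Laplacian
Ds = np.diag(1.0 / np.sqrt(d))
lam = np.sort(np.linalg.eigvalsh(Ds @ L @ Ds))[1]   # second eigenvalue

# middle cut S = {0, ..., n/2 - 1}: one crossing edge, half the volume;
# its conductance upper-bounds the graph conductance phi
phi = 1.0 / d[: n // 2].sum()

print(f"lambda = {lam:.6f}, phi <= {phi:.6f}, phi^2 = {phi**2:.6f}")
```

On this instance λ sits near the φ² scale rather than near φ, which is exactly the regime where the ratio (φ/λ) hurts spectral clustering.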
That implies that the approximation can be a factor of (φ/λ) away from the optimal value, which can be as large as O(n) even for unweighted graphs. While this may often be a pessimistic estimate, there are known families of graphs where the estimate is realized; in such graphs, spectral clustering computes cuts that are far from optimal [12]. It is thus understood that the ratio (φ/λ) directly affects the quality of spectral clustering, a fact that is viewed as an inherent limitation.

This paper shows that this limitation can be greatly alleviated via spectral modification: a set of operations that approximately preserve the cut structure of the input while 'raising' its spectrum, in effect suppressing the ratio (φ/λ) and improving the output.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

2 Spectral Modification: High-level Overview and Context

This section collects a number of required notions from spectral graph theory and puts spectral modification in perspective with the important recent discoveries that inspire it. It also describes the motivation for our work and gives a high-level overview that may be useful to the reader before we delve into more technical details.

2.1 Cut and Spectral Similarity

Let G = (V, E, w) be a weighted graph. The Laplacian matrix L_G of graph G is defined by: (i) L_G(u, v) = −w_uv for u ≠ v, and (ii) L_G(u, u) = −Σ_{v≠u} L_G(u, v). The quadratic form of a positive semi-definite matrix A is defined by R(A, x) = xᵀAx. For a subset of vertices S ⊆ V, we denote by cut_G(S) the total weight of the edges leaving the set S.

Let G and H be two weighted graphs.
We say that the two graphs are ρ-cut similar if there exist numbers α, β with ρ = β/α, such that for all S ⊂ V we have α·cut_H(S) ≤ cut_G(S) ≤ β·cut_H(S). We say that the two graphs are ρ-spectrally similar if there exist numbers α, β with ρ = β/α such that for all real vectors x we have α·R(L_H, x) ≤ R(L_G, x) ≤ β·R(L_H, x). It is well understood that ρ-spectral similarity implies ρ-cut similarity, but not vice-versa [26].

2.2 Low-diameter Cut Approximators and Spectral Maximizers

Let G = (V, E) be the path graph on n vertices, and for the sake of simplicity assume that n is a power of 2. Let T = (V ∪ I, E) be the full binary tree, where V is the set of leaves, in one-to-one correspondence with the path vertices as illustrated in Figure 1a, and I is the set of internal vertices. An interesting feature of T is that it provides a cut approximator for G, i.e. it contains information that allows estimating all cuts in G within a factor of 2. In Section 3 we describe how the cut approximator T gives rise to a weighted complete graph H = (V, E, w) on the original set of vertices V, via a canonical process of eliminating the internal vertices of T; Figure 1b provides a glimpse of the edge weights of H. Graph H is O(1)-cut similar with G, but has a very different eigenvalue distribution, as illustrated in Figure 1c. More specifically, the second eigenvalue λ of the normalized Laplacian of G is Θ(1/n²), while that of H is Ω(1/(n log n)), essentially closing the gap with the conductance φ = Θ(1/n). An alternative way of viewing this is that H has a second eigenvalue which, up to an O(log n) factor, is the maximum possible, since the eigenvalue is always smaller than φ.
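The path-versus-tree picture above can be reproduced in a few lines. The sketch below is our own illustration, not code from the paper: the size n, the tree weights w′(v, parent) = cut_G(S_v), and the dense NumPy linear algebra are assumptions consistent with the construction of Section 3. It builds the weighted binary tree over the path, eliminates the internal vertices via a Schur complement, and compares second normalized eigenvalues:

```python
import numpy as np

n = 256  # a power of two; path vertices are the leaves of a full binary tree

# Laplacian of the unweighted path graph G
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
LG = np.diag(A.sum(1)) - A

# Weighted binary tree T: one node per dyadic interval; the edge from a node
# to its parent carries weight cut_G(S_v), i.e. the number of path edges
# leaving the node's interval (2 in the interior, 1 at the boundary).
nodes = [(i, i) for i in range(n)]   # (a, b) intervals; leaves come first
edges = []                           # (child, parent, weight)
level = list(range(n))
while len(level) > 1:
    nxt = []
    for j in range(0, len(level), 2):
        l, r = level[j], level[j + 1]
        p = len(nodes)
        nodes.append((nodes[l][0], nodes[r][1]))
        for c in (l, r):
            a, b = nodes[c]
            edges.append((c, p, float((a > 0) + (b < n - 1))))
        nxt.append(p)
    level = nxt

N = len(nodes)
LT = np.zeros((N, N))
for u, v, w in edges:
    LT[u, u] += w; LT[v, v] += w
    LT[u, v] -= w; LT[v, u] -= w

# Schur complement onto the leaves: the spectral maximizer H
VV, II = np.arange(n), np.arange(n, N)
LH = LT[np.ix_(VV, VV)] - LT[np.ix_(VV, II)] @ np.linalg.solve(
    LT[np.ix_(II, II)], LT[np.ix_(II, VV)])

def lam2(L):  # second eigenvalue of the normalized Laplacian
    s = 1.0 / np.sqrt(np.diag(L))
    return np.sort(np.linalg.eigvalsh(L * np.outer(s, s)))[1]

print(lam2(LH) / lam2(LG))  # H sits notably higher in the spectrum
```

The Schur complement LH is itself a Laplacian (zero row sums, nonpositive off-diagonals), so H is a bona fide graph on V with the same coarse cut structure as the path.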
In some sense, the same is true for all eigenvalues of H, which leads us to call H a spectral maximizer of G. These properties of H can be proved using only the logarithmic diameter of T and the fact that T is a cut approximator.

These observations set the backdrop for the idea of spectral modification, which aims to modify the input graph G in order to bring it spectrally closer to its maximizer H. It is worth noting that, in some sense, the objective of spectral modification counters that of spectral graph sparsification, which aims to spectrally preserve a graph [2].

Figure 1: (a) The path graph G and the cut-approximating binary tree R. The binary tree is depicted with weights that are discussed in Section 3. (b) Heatmap of the log-entries of the adjacency matrix of the spectral maximizer H, for n = 8196. It can be seen that H is a dense graph that inherits the tri-diagonal path structure, but also has other long-range edges. (c) Ratios of the first 30 normalized eigenvalues of H and G, for n = 8196. H has significantly larger eigenvalues.

2.3 Contributions and Perspective

A key contribution of this paper is the observation that every graph G has a spectral maximizer H. The path-tree example of the previous section is merely an instantiation of our central claim, but a vast generalization is possible (with a small loss), using the fact that all graphs have low-diameter cut approximators, as shown by Räcke [23]. Technically, this is captured by a Cheeger-like inequality that we present in Section 3.3. We show that the inequality applies not only to the standard normalized cuts problem, but also to generalized cut problems that capture semi-supervised clustering problems.

The original result of [23] has undergone several subsequent algorithmic improvements and refinements [3, 14, 24, 19].
It is currently possible to compute a cut approximator in nearly-linear time¹ [19]; this implies a similar time for the construction of a maximizer. As discussed in the previous section, the approximator is a compact representation of all cuts in a graph, and thus its computation is likely a waste when we only want to compute a k-clustering. Indeed, all existing algorithms are complicated and far from practical.

On the other hand, a significant strength of spectral clustering is its speed, due to the existence of provably fast linear system solvers for graph Laplacians [16, 17]. A theoretical upper bound for the computation of k eigenvectors is O(km log² m), where m is the number of edges in the graph; in practice, for a graph with millions of edges, one eigenvector can be computed in mere seconds on standard hardware, without even exploiting the ample potential for parallelism.

This motivates the second contribution of the paper: a fast algorithm that modifies the input graph G into a graph M which is spectrally closer to the maximizer H, and thus more amenable to spectral clustering. The emphasis here is on the running time of the modification algorithm and the size of its output. These are kept low in order not to severely impact the speed of spectral clustering. We present the algorithm and discuss its properties in Section 4.

Finally, applying spectral clustering on graph M and mapping the output back to G has the potential to 'discover' dramatically different and improved cuts. One such case is illustrated in Figure 2, on a known bad case of spectral clustering taken from [12].

Figure 2: (a) 2-way partitioning on G. (b) Improved partitioning based on M. Input G is a direct graph product of: (i) the path graph, and (ii) a graph consisting of two binary trees with their roots connected [12]. The modification of G sways the lowest eigenvector away from the cut computed in G.
The asymptotic improvement in the value of the cut is O(n^{1/4}).

We note that different graph modification ideas have been explored in previous works (e.g. [27, 1, 4]). In particular, in the context of 'regularized spectral clustering', it has been observed that adding a small copy of the identity matrix or of the complete graph to the input graph G improves the quality of spectral clustering [1, 21]. The improved performance has been partially explained for block-stochastic models and stochastic social network graphs [13, 29]. In the latter case, the improvement is attributed to the 'masking' of unbalanced sparse cuts in the graph, caused by altering their cut ratio [29]. It is conceivable that the theoretical results of this paper will help shed additional light on regularized spectral clustering. It is clear, though, that regularized spectral clustering does not yield improvements such as that in Figure 2.

3 Cut Approximators, Spectral Maximizers and Cheeger Inequalities

In this section we prove our main claim that for every graph G there exists another graph H which is cut-similar to G but satisfies a tight Cheeger inequality. We first state our claims, and give the proofs in Subsection 3.4.

¹ O(m log^c m) time, where m is the number of edges in G and c is a fairly large constant.

3.1 Definitions of Graph Objects

Definition 3.1 (Hierarchical Cut Decomposition). A hierarchical cut decomposition of a graph G = (V, E, w) is represented as a rooted tree T = (V ∪ I, E′, w′), with the following properties:
(i) Every vertex u of T identifies a set S_u ⊆ V.
(ii) If r is the root of T then S_r = V.
(iii) If u has children v_1, ..., v_t in T, then S_{v_i} ∩ S_{v_j} = ∅ for all (i, j).
(iv) If u is the parent of v in T, then w′(u, v) = cut_G(S_v).

Definition 3.2 (α-Cut Approximator).
We say that a hierarchical cut decomposition T = (V ∪ I, E′, w′) for G is an α-cut approximator for G if for all S ⊆ V there exists a set I_S ⊆ I such that

cut_G(S) ≤ cut_T(S ∪ I_S) ≤ α · cut_G(S).

Given a graph G and an associated cut approximator T, we now define the spectral maximizer for G; the choice of terminology will be justified subsequently.

Definition 3.3 (Spectral Maximizer). Let T = (V ∪ I, E′) be a cut approximator for a graph G = (V, E, w), and let

L_T = [ L_I   V ]
      [ Vᵀ    D ],

ordered so that its first |I| rows are indexed by I in an arbitrary order, and its last |V| rows are indexed by V in the given order. We define the graph maximizer H to be the graph with Laplacian matrix L_H = D − Vᵀ L_I^{−1} V.

The matrix D − Vᵀ L_I^{−1} V in the above definition is known as the Schur complement with respect to the elimination of the vertices/variables in I, and given the fact that L_T is a Laplacian, it is well known to be a Laplacian matrix (e.g. see [9]). Graph-theoretically, the elimination of a vertex v from a graph introduces a weighted clique on the neighbors of v. The elimination of a set of vertices I can be performed as a sequence of vertex eliminations (in an arbitrary order).

Important Remark: We use the term 'spectral maximizer' for brevity and simplicity. It should be made clear that the spectral maximizer is not a unique graph, as it depends on T.

3.2 Properties of Spectral Maximizers

In order to state our claims we fix a triple (G, T(α), H), where G is a graph, T is an associated α-cut approximator, and H is the spectral maximizer corresponding to T. We will also denote by diam(T) the diameter of the tree, i.e. the number of edges on the longest path in T.

We first introduce some required notation, additional to that of Section 2.1.
Let G and H be two graphs on the same vertex set, with the requirement that G is connected. In particular, H may be not connected, or may even use only a subset of V. Then we say that G spectrally dominates H if for all vectors x we have R(L_G, x) ≥ R(L_H, x). We denote spectral domination by G ⪰ H. We also write α · G to denote the graph G with its weights multiplied by α.

Theorem 3.1 (Spectral Domination of Cut Structure). Given a triple (G, T(α), H), let G̃ be an arbitrary graph which is ρ-cut similar to G. Then, we have diam(T) · ρ · H ⪰ G̃.

Theorem 3.2 (Cut Similarity of Spectral Maximizer). Given a triple (G, T(α), H), the maximizer H is α · diam(T)-cut similar with G. In particular, we have cut_H(S)/α ≤ cut_G(S) ≤ diam(T) · cut_H(S).

We are now ready to discuss the justification for the term 'spectral maximizer'. The reader should think of the parameters diam(T) and α as small, i.e. of size Õ(1). (We use the Õ(·) notation to hide factors logarithmic in n, which we do not attempt to optimize.) Then, Theorem 3.1 shows that, up to an Õ(1) factor, H spectrally dominates every graph that is Õ(1)-cut similar with G. This directly implies that, up to the same factor, the i-th eigenvalue of L_H is greater than that of L_G̃, for every graph G̃ which is cut-similar to G. Combined with Theorem 3.2, we get that L_H has nearly the maximum possible eigenvalues that any graph with similar cuts can have. In the particular case of λ₂, we show that it is actually within Õ(1) of the graph conductance. This extends to a generalized notion of conductance, with algorithmic implications for supervised clustering; we discuss this in the supplementary material.

3.3 Cheeger Inequalities for Spectral Maximizers

Definition 3.4 (Generalized Conductance).
Let A and B be two graphs on the same set of vertices V. We define the generalized conductance φ(A, B) of the pair as: φ(A, B) = min_{S⊆V} cut_A(S)/cut_B(S).

Definition 3.5 (Second Generalized Eigenvalue). The smallest non-trivial generalized eigenvalue of a pair of graphs (A, B) is given by λ₂(A, B) = min_{x⊥1} (xᵀL_A x)/(xᵀL_B x).

The generalized definition encompasses the standard conductance of a graph. Concretely, let K be the complete weighted graph where the weight of edge (u, v) is set to w_K(u, v) = vol_A(u) · vol_A(v), i.e. the product of the degrees of u and v in A. Also, let λ₂ denote the second eigenvalue of the normalized Laplacian of A, i.e. L̂ = D^{−1/2} L_A D^{−1/2}, where D is the diagonal matrix of the vertex degrees in A. Then, it is easy to show that:

φ(A, K) = φ(A) = min_{S⊆V} cut_A(S)/(vol_A(S) · vol_A(V − S))   and   λ₂(A, K) = λ₂.

The Cheeger inequality [6] states that λ₂ ≥ φ²/2. A Cheeger inequality is also known for the generalized conductance [7]: λ₂(A, B) ≥ φ(A, B) · φ(A)/8.

We prove the following theorem.

Theorem 3.3 (Extended Cheeger Inequality for Cut Structure). For any graph G, there exists a graph H such that (i) H is Õ(1)-cut similar with G, and (ii) H satisfies the following inequality for all graphs B:

λ₂(H, B) ≤ φ(H, B) ≤ Õ(1) · λ₂(H, B).

A consequence of Theorem 3.3 is that the actual performance of spectral clustering on a given graph G ultimately depends on its 'spectral distance' from its maximizer H. This is captured in the following corollary.

Corollary 3.1 (Actual Cheeger Inequality). Let G be a graph and H be the graph whose existence is guaranteed by Theorem 3.3. Further, suppose that G and H are δ-spectrally similar.
Then, for all graphs B, G satisfies the following inequality: λ₂(G, B) ≤ φ(G, B) ≤ Õ(δ) · λ₂(G, B).

3.4 Proofs

In this section we simplify the notation and sometimes use G to mean both a graph and its corresponding Laplacian L_G.

Lemma 3.1 (Edge-Path Support [5]). Let P be an unweighted path graph on k vertices, with endpoints u₁, u_k. Also let E_{u₁u_k} be the graph consisting only of the edge (u₁, u_k). Then we have k · P ⪰ E_{u₁u_k}.

Lemma 3.2 (Quadratic Form of Schur Complement). Let H and T be the graphs/matrices appearing in Definition 3.3. We have

R(H, x) = min_{y∈R^{|I|}} R(T, [y; x]).

We finally need the following (adjusted) lemma from [23, 3]:

Lemma 3.3. Every graph G has an Õ(1) cut approximator T. The diameter of T is O(log n), where n is the number of vertices in G.

We are now ready to proceed with the proofs.

Proof. (of Theorem 3.1) We first show the intermediate claim diam(T) · T ⪰ G. The technique uses elements from support theory [5]. Let E_uv be an arbitrary edge of G of weight w_uv. Let P_uv be the unique path between u and v in T; notice that by definition the path has length at most diam(T). We observe that, by construction of T, we have T = Σ_{(u,v)∈G} w_uv · P_uv. Let y, x be arbitrary vectors of appropriate dimensions, and z = [y, x]ᵀ. We have

R(T, z)/R(G, z) = ( Σ_{(u,v)∈G} w_uv · R(P_uv, z) ) / ( Σ_{(u,v)∈G} w_uv · R(E_uv, z) ) ≥ min_{(u,v)∈G} R(P_uv, z)/R(E_uv, z) ≥ 1/diam(T).

The first inequality is standard for a ratio of sums of positive numbers, and the second inequality is an application of Lemma 3.1. This proves the intermediate claim. Notice now that since the claim holds for all vectors z = [y, x]ᵀ for arbitrary y, it also holds for vectors where y is defined as in Lemma 3.2.
That implies R(H, x) ≥ R(G, x)/diam(T), i.e. diam(T) · H ⪰ G.

To prove the claim for a G′ which is ρ-cut similar to G, we observe that the above proof can be repeated if we replace T with T′ = Σ_{(u,v)∈G′} w′_uv · P_uv. Thus we get diam(T) · T′ ⪰ G′ (A). Notice that T′ keeps the same edges as T, but with different weights. Observe now that if v is a vertex in T′, then the edge to its parent has weight equal to cut_{G′}(S_v), where S_v is the set identified by v according to the definition of the cut approximator. However, by the cut similarity of G and G′ we know that cut_{G′}(S_v) ≤ ρ · cut_G(S_v). It follows that the edges of T′ have weight at most ρ times larger than their weights in T, which directly implies that T′ ⪯ ρ · T. Substituting into inequality (A) above, we get that ρ · diam(T) · T ⪰ G′. Then applying Lemma 3.2 one more time gives the claim. □

Proof. (of Theorem 3.2) The proof is a relatively easy consequence of Lemma 3.2 and Definition 3.2. We include it in the supplementary material.

Proof. (of Theorem 3.3) Let (G, T(α), H) be the given triple. Also, let B = (V, E, w) be an arbitrary graph. The first part of the inequality is trivial. Let x be the eigenvector corresponding to the smallest non-zero eigenvalue of the generalized problem L_H x = λ L_B x. Using the standard Courant–Fischer characterization of eigenvalues, we have

λ₂(H, B) = R(L_H, x)/R(L_B, x) = R(L_T, z)/R(L_B, x),     (1)

where z is the extension of x described in Lemma 3.2. For an edge E_uv of B, let P_uv denote the (unique) path connecting u and v in T. Using Lemma 3.1, we get:

R(L_B, x) = Σ_{(u,v)∈B} w_uv (x_u − x_v)² ≤ Σ_{(u,v)∈B} w_uv · R(L_{P_uv}, z) = Σ_{(u,v)∈B} R(w_uv · L_{P_uv}, z).

Note that we now get the quadratic form of the graph T′ = Σ_{(u,v)∈B} w_uv · P_uv.
Because T′ is a sum of paths on T, it has the same edges as T. Denote by w_T(q, q′) the weight of the edge (q, q′) on T, where q′ is the parent of q. Continuing from inequality (1), we get

λ₂(H, B) ≥ R(L_T, z)/R(L_{T′}, z) = ( Σ_{(q,q′)∈T} w_T(q, q′)(z_q − z_{q′})² ) / ( Σ_{(q,q′)∈T} w_{T′}(q, q′)(z_q − z_{q′})² ) ≥ min_{q∈T} w_T(q, q′)/w_{T′}(q, q′).     (2)

If S_q ⊆ V is the set identified by q, we have

w_T(q, q′) = cut_G(S_q) ≥ cut_H(S_q)/α,

where the inequality comes from Theorem 3.2. Observe now that (q, q′) appears on T′ exactly on the paths P_uv such that u ∈ S_q and v ∉ S_q. It follows that the edge (q, q′) receives in T′ a total weight equal to the total weight of the edges leaving S_q in B, i.e. w_{T′}(q, q′) = cut_B(S_q). Further continuing from inequality (2), we get that

λ₂(H, B) ≥ min_{q∈T} w_T(q, q′)/w_{T′}(q, q′) ≥ min_q cut_H(S_q)/(α · cut_B(S_q)) ≥ min_S cut_H(S)/(α · cut_B(S)) = φ(H, B)/α.

The theorem then follows by invoking Lemma 3.3 and Theorem 3.2. □

4 A Spectral Modification Algorithm

The goal of spectral modification is to construct a modifier M of the input graph G = (V, E, w) which is spectrally similar to the maximizer described in Section 3. Corollary 3.1 then shows that improved Cheeger inequalities also hold for M, up to the spectral similarity factor. Echoing the construction of the maximizer in Section 3, we will construct a graph 𝓜 on a set of vertices V ∪ V_add, where V_add is a set of additional vertices.
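Eliminating the additional vertices is, as in Definition 3.3, a Schur complement onto V. As a hedged illustration (our own sketch, assuming SciPy; the function name and the star-graph example are ours, not the paper's code), the complement of the Laplacian of the augmented graph on V ∪ Vadd can be applied implicitly, keeping only a sparse factorization of the Vadd block:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def schur_operator(L, nv):
    """Implicit Schur complement D_VV - L_VI * L_II^{-1} * L_IV.

    L: sparse Laplacian whose first nv rows/cols are indexed by V,
    and whose remaining rows/cols are indexed by the eliminated set."""
    L = L.tocsc()
    LVV, LVI = L[:nv, :nv], L[:nv, nv:]
    LIV, LII = L[nv:, :nv], L[nv:, nv:]
    solve_II = spla.factorized(LII)          # one sparse factorization
    def matvec(x):
        x = np.asarray(x).ravel()
        return LVV @ x - LVI @ solve_II(LIV @ x)
    return spla.LinearOperator((nv, nv), matvec=matvec)

# sanity check: a unit-weight star on 4 leaves plus a center; eliminating
# the center yields the uniform clique, i.e. the Laplacian I - J/4
A = np.zeros((5, 5))
A[:4, 4] = A[4, :4] = 1.0
L = sp.csr_matrix(np.diag(A.sum(1)) - A)
S = schur_operator(L, 4)
dense = S @ np.eye(4)          # materialize only for the tiny sanity check
print(np.round(dense, 3))
```

In actual use the operator would be handed to an iterative eigensolver, so the dense complement is never formed.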
The modifier M is then defined as the Schur complement of 𝓜 with respect to the elimination of the nodes in V_add. We solve the generalized eigenvalue problem L_M x = λDx, where D is the diagonal of L_G. The modifier M is a dense graph, but we effectively use only the sparse graph 𝓜; we accomplish that using standard techniques that we discuss in the supplementary file.

Cut Approximators for Trees. Towards designing a modification algorithm, we observe that computing a low-diameter cut approximator of a tree T can be carried out with a recursive top-down analysis of the cut structure of T, in O(n log n) time, essentially following the algorithm in [23]; key to that algorithm is a linear-time algorithm for computing the sparsest cut on a tree. A low-diameter cut approximator for a tree can also be constructed in a bottom-up fashion in O(n) time, using the decompositions from [15]. Our code implements the linear-time algorithm.

We consider the following general framework for spectral modification. Given a graph G = (V, E, w):
(a) Compute a set of weighted trees T₁, ..., T_k on vertex set V. [tree decomposition step]
(b) Compute a cut approximator 𝓜_j for each tree T_j.
(c) Form the graph 𝓜 = αG + 𝓜₁ + ... + 𝓜_k.

The cut approximators 𝓜_j in step (b) share the same set of leaves V, but each has its own set of additional internal vertices V_add,j. Thus, the weighted graphs in the sum of step (c) have mutually disjoint edge sets, and the sum simply denotes the union of all these edges. The vertex set of 𝓜 is V ∪ V_add, where V_add = ∪_j V_add,j.

Tree Decomposition Step. In this step we aim to process the input graph G in order to compute a set of trees such that the sum of their maximizers is spectrally close to the maximizer of G. There are several potential ways to perform this step.
We now give an algebra-based heuristic algorithm that we have implemented and used in our experiments.

1: procedure ENERGY_TD(G, k)
2:     z ← approximate second eigenvector of L_G x = λDx      ▷ D is the diagonal of L_G
3:     G′ ← (V, E, w′), where w′_uv = w_uv (z_u − z_v)²
4:     for j = 1 : k do
5:         R_j = (V, E_j, w′) ← maximum-weight spanning tree of G′
6:         T_j ← (V, E_j, w)      ▷ tree with edge set E_j and weights from G
7:         for each e ∈ E_j, set w′_e ← w′_e / d_f      ▷ update weights in G′
8:     end for
9:     return {T₁, ..., T_k}

ENERGY_TD is based on the following reasoning. Assuming that the graph G is spectrally far from its maximizer H, we expect the second eigenvector z to be "bad", in the sense that the associated Rayleigh quotient R(G, z)/zᵀDz is significantly lower than it would have been for the maximizer H. Steps 4–7 find k trees in G that yield most of the 'energy' R(G, z). Adding the maximizers of these trees attempts to directly 'push' the Rayleigh quotient for z higher in the spectrum of the modified graph M. At the same time, because the trees T_j are subtrees of G, and their maximizers have similar cuts, the modifier M has cuts similar to those of G. We further discuss some properties of ENERGY_TD, and its running time, in the supplementary file.

4.1 Implementation and Experiments

We provide a MATLAB implementation. We plan to provide a Python implementation in the near future. The submitted code and all future updates can be found at: https://github.com/ikoutis/spectral-modification

Remark on Baseline Spectral Clustering: We use the baseline spectral clustering implementation from [7].
We solve the eigenvalue problem L_G x = λDx, which yields the standard embedding. One differentiation is that we further process the embedding by projecting the points onto the unit hypersphere, as analyzed in [18]. This actually yields a significant improvement of the baseline.

Parameter Settings: For all our experiments we set k = 3, d_f = 1/2, and α = 1 in ENERGY_TD.

Synthetic Datasets. The synthetic example described in Figure 2 highlights the potential of spectral modification to induce the computation of asymptotically better cuts in graphs with 'elongated' features, or high diameter. The output has been computationally verified for a range of values of n (up to millions). In the supplementary file we also describe a synthetic example of a weighted graph where spectral modification yields a cut smaller by a Θ(1/n) factor. In Figure 3, we also give a synthetic example taken from [7], where spectral modification clearly outperforms even a supervised method.

Figure 3: The '4-moons' example from [7]. (a) Standard spectral clustering. (b) Supervised spectral clustering [7]. (c) Modified spectral clustering. (A)RI is the Adjusted Rand Index.

Social Networks. We performed experiments with four graphs (BlogCatalog, PPI, Wikipedia, Flickr) used as benchmarks in the recent literature [20, 22]. We compare against NetMF [22], as it has previously reported an improvement over DeepWalk [20] and other competing methods. The evaluation methodology is identical to that of [22]. The second normalized eigenvalues λ of these graphs are quite high (0.43, 0.49, 0.20, 0.06 respectively), and so there is little room for improvement. Nevertheless, we observe improvements in the standard Micro-F1 scores. We cannot, however, attribute them directly to our theory, as it is not sensitive to Õ(1) factors.
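For concreteness, the baseline embedding step described in the remark above can be sketched as follows. This is our own illustration, assuming SciPy; the toy two-clique graph, the shift value, and the function name are ours, not the paper's MATLAB code:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def spectral_embedding(W, k):
    """k eigenvectors of L x = lambda D x, rows projected to the unit sphere."""
    d = np.asarray(W.sum(axis=1)).ravel()
    L = sp.diags(d) - W
    # shift-invert around a small negative sigma: L - sigma*D is positive
    # definite, and which='LM' then returns the k smallest eigenpairs
    _, U = spla.eigsh(L.tocsc(), k=k, M=sp.diags(d).tocsc(),
                      sigma=-1e-2, which="LM")
    return U / np.linalg.norm(U, axis=1, keepdims=True)

# toy input: two 10-cliques joined by a single edge
B = np.ones((10, 10)) - np.eye(10)
W = np.zeros((20, 20))
W[:10, :10] = B
W[10:, 10:] = B
W[0, 10] = W[10, 0] = 1.0
X = spectral_embedding(sp.csr_matrix(W), 2)
print(X.shape)
```

After the hypersphere projection, points within each clique collapse to nearly identical unit vectors, so a subsequent k-means step separates the two groups easily.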
The dimension of the embedding is equal to the number of clusters, except for the Flickr data set, where it is set to 128 because the NetMF method is too expensive to run on dimension 195 (the number of clusters). We also wish to highlight the fact that the implemented version of baseline spectral clustering performs much better than the standard version. A more detailed discussion can be found in the supplementary file.

Figure 4: Micro-F1 scores in 10x cross-validation using LIBLINEAR [10].

Conclusion. The performance of spectral clustering depends crucially on spectral properties of its input graph, which most often force it to output clusters of poor approximation quality. This has been viewed as an inherent limitation of spectral clustering. We show, however, that for any input graph there exists a 'maximizer' graph with similar cuts, but with an eigenvalue distribution that is favorable for spectral clustering. We propose a spectral modification algorithm that attempts to exploit this fact via fast operations that improve the eigenvalue distribution of the input without changing its cut structure. The implemented spectral modification algorithm is heuristic and subject to various improvements. Nevertheless, it yields demonstrable asymptotic improvements in a number of adversarial instances. In future work we will explore the performance of spectral modification on larger and more diverse sets of instances, and the implementation of modification algorithms with theoretical guarantees.

Acknowledgements. This work has been partially supported by grants CCF-1149048, CCF-1813374.

References

[1] Arash A. Amini, Aiyou Chen, Peter J. Bickel, and Elizaveta Levina. Pseudo-likelihood methods for community detection in large sparse networks. Ann. Statist., 41(4):2097–2122, 2013. doi: 10.1214/13-AOS1138. URL https://doi.org/10.1214/13-AOS1138.

[2] Joshua Batson, Daniel A.
Spielman, Nikhil Srivastava, and Shang-Hua Teng. Spectral sparsi\ufb01cation of\ngraphs: Theory and algorithms. Commun. ACM, 56(8):87\u201394, August 2013. ISSN 0001-0782. doi:\n10.1145/2492007.2492029. URL http://doi.acm.org/10.1145/2492007.2492029.\n\n[3] Marcin Bienkowski, Miroslaw Korzeniowski, and Harald R\u00e4cke. A practical algorithm for constructing\noblivious routing schemes. In Proceedings of the Fifteenth Annual ACM Symposium on Parallel Algorithms\nand Architectures, SPAA \u201903, pages 24\u201333, New York, NY, USA, 2003. ACM. ISBN 1-58113-661-7. doi:\n10.1145/777412.777418. URL http://doi.acm.org/10.1145/777412.777418.\n\n[4] Aleksandar Bojchevski, Yves Matkovic, and Stephan G\u00fcnnemann. Robust spectral clustering for noisy\ndata: Modeling sparse corruptions improves latent embeddings. In Proceedings of the 23rd ACM SIGKDD\nInternational Conference on Knowledge Discovery and Data Mining, KDD \u201917, page 737\u2013746, New York,\nNY, USA, 2017. Association for Computing Machinery. ISBN 9781450348874. doi: 10.1145/3097983.\n3098156. URL https://doi.org/10.1145/3097983.3098156.\n\n[5] Erik G. Boman and Bruce Hendrickson. Support theory for preconditioning. SIAM J. Matrix Anal. Appl.,\n\n25(3):694\u2013717, 2003. ISSN 0895-4798.\n\n[6] F.R.K. Chung. Spectral Graph Theory, volume 92 of Regional Conference Series in Mathematics. American\n\nMathematical Society, 1997.\n\n[7] Mihai Cucuringu, Ioannis Koutis, Sanjay Chawla, Gary Miller, and Richard Peng. Simple and scalable\nconstrained clustering: a generalized spectral method. In Arthur Gretton and Christian C. Robert, editors,\nProceedings of the 19th International Conference on Arti\ufb01cial Intelligence and Statistics, volume 51 of\nProceedings of Machine Learning Research, pages 445\u2013454, Cadiz, Spain, 09\u201311 May 2016. PMLR. URL\nhttp://proceedings.mlr.press/v51/cucuringu16.html.\n\n[8] W.E. Donath and A.J. Hoffman. 
Algorithms for partitioning graphs and computer logic based on eigenvectors of connection matrices. IBM Technical Disclosure Bulletin, 15(3):938–944, 1972.

[9] David Durfee, Rasmus Kyng, John Peebles, Anup B. Rao, and Sushant Sachdeva. Sampling random spanning trees faster than matrix multiplication. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, pages 730–742, New York, NY, USA, 2017. ACM. ISBN 978-1-4503-4528-6. doi: 10.1145/3055399.3055499. URL http://doi.acm.org/10.1145/3055399.3055499.

[10] Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. LIBLINEAR: A Library for Large Linear Classification. Technical report, 2008. URL https://www.csie.ntu.edu.tw/~cjlin/papers/liblinear.pdf.

[11] Aditya Grover and Jure Leskovec. Node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, pages 855–864, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-4232-2. doi: 10.1145/2939672.2939754. URL http://doi.acm.org/10.1145/2939672.2939754.

[12] Stephen Guattery and Gary L. Miller. On the quality of spectral separators. SIAM J. Matrix Anal. Appl., 19(3):701–719, July 1998. ISSN 0895-4798. doi: 10.1137/S0895479896312262. URL http://dx.doi.org/10.1137/S0895479896312262.

[13] Antony Joseph and Bin Yu. Impact of regularization on spectral clustering. Ann. Statist., 44(4):1765–1791, 08 2016. doi: 10.1214/16-AOS1447. URL https://doi.org/10.1214/16-AOS1447.

[14] Jonathan A. Kelner, Yin Tat Lee, Lorenzo Orecchia, and Aaron Sidford. An almost-linear-time algorithm for approximate max flow in undirected graphs, and its multicommodity generalizations. In Proceedings of the Twenty-fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '14, pages 217–226, Philadelphia, PA, USA, 2014.
Society for Industrial and Applied Mathematics. ISBN 978-1-611973-38-9. URL http://dl.acm.org/citation.cfm?id=2634074.2634090.

[15] Ioannis Koutis and Gary L. Miller. Graph partitioning into isolated, high conductance clusters: Theory, computation and applications to preconditioning. In Symposium on Parallel Algorithms and Architectures (SPAA), 2008.

[16] Ioannis Koutis, Gary L. Miller, and Richard Peng. A nearly-m log n time solver for SDD linear systems. In Proceedings of the 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science, FOCS '11, pages 590–598, Washington, DC, USA, 2011. IEEE Computer Society. ISBN 978-0-7695-4571-4. doi: 10.1109/FOCS.2011.85. URL http://dx.doi.org/10.1109/FOCS.2011.85.

[17] Ioannis Koutis, Gary L. Miller, and Richard Peng. A fast solver for a class of linear systems. Commun. ACM, 55(10):99–107, October 2012. ISSN 0001-0782. doi: 10.1145/2347736.2347759. URL http://doi.acm.org/10.1145/2347736.2347759.

[18] James R. Lee, Shayan Oveis Gharan, and Luca Trevisan. Multiway spectral partitioning and higher-order Cheeger inequalities. J. ACM, 61(6):37:1–37:30, December 2014. ISSN 0004-5411. doi: 10.1145/2665063. URL http://doi.acm.org/10.1145/2665063.

[19] Richard Peng. Approximate undirected maximum flows in O(m polylog(n)) time. In Proceedings of the Twenty-seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '16, pages 1862–1867, Philadelphia, PA, USA, 2016. Society for Industrial and Applied Mathematics. ISBN 978-1-611974-33-1. URL http://dl.acm.org/citation.cfm?id=2884435.2884565.

[20] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '14, pages 701–710, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2956-9. doi: 10.1145/2623330.2623732.
URL http://doi.acm.org/10.1145/2623330.2623732.\n\n[21] Tai Qin and Karl Rohe. Regularized spectral clustering under the degree-corrected stochastic blockmodel.\nIn Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2,\nNIPS\u201913, pages 3120\u20133128, USA, 2013. Curran Associates Inc. URL http://dl.acm.org/citation.\ncfm?id=2999792.2999960.\n\n[22] Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. Network embedding as\nmatrix factorization: Unifying deepwalk, line, pte, and node2vec. In Proceedings of the Eleventh ACM\nInternational Conference on Web Search and Data Mining, WSDM 2018, Marina Del Rey, CA, USA,\nFebruary 5-9, 2018, pages 459\u2013467, 2018. doi: 10.1145/3159652.3159706. URL https://doi.org/10.\n1145/3159652.3159706.\n\n[23] Harald R\u00e4cke. Minimizing congestion in general networks. In Proceedings of the 43rd Symposium on\n\nFoundations of Computer Science, pages 43\u201352. IEEE, 2002.\n\n[24] Harald R\u00e4cke, Chintan Shah, and Hanjo T\u00e4ubig. Computing cut-based hierarchical decompositions\nin almost linear time. In Proceedings of the Twenty-\ufb01fth Annual ACM-SIAM Symposium on Discrete\nAlgorithms, SODA \u201914, pages 227\u2013238, Philadelphia, PA, USA, 2014. Society for Industrial and Applied\nMathematics.\nISBN 978-1-611973-38-9. URL http://dl.acm.org/citation.cfm?id=2634074.\n2634091.\n\n[25] Jianbo Shi and Jitendra Malik. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal.\nMach. Intell., 22(8):888\u2013905, August 2000. ISSN 0162-8828. doi: 10.1109/34.868688. URL https:\n//doi.org/10.1109/34.868688.\n\n[26] D. Spielman and S. Teng. Spectral sparsi\ufb01cation of graphs. SIAM Journal on Computing, 40(4):981\u20131025,\n\n2011. doi: 10.1137/08074489X. URL https://doi.org/10.1137/08074489X.\n\n[27] David A. Tolliver and Gary L. Miller. Graph partitioning by spectral rounding: Applications in image\nsegmentation and clustering. 
In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 1, CVPR '06, pages 1053–1060, USA, 2006. IEEE Computer Society. ISBN 0769525970. doi: 10.1109/CVPR.2006.129. URL https://doi.org/10.1109/CVPR.2006.129.

[28] Junyuan Xie, Ross Girshick, and Ali Farhadi. Unsupervised deep embedding for clustering analysis. In Proceedings of the 33rd International Conference on Machine Learning - Volume 48, ICML'16, pages 478–487. JMLR.org, 2016. URL http://dl.acm.org/citation.cfm?id=3045390.3045442.

[29] Yilin Zhang and Karl Rohe. Understanding regularized spectral clustering via graph conductance. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 10631–10640. Curran Associates, Inc., 2018. URL http://papers.nips.cc/paper/8262-understanding-regularized-spectral-clustering-via-graph-conductance.pdf.