{"title": "Clustering Signed Networks with the Geometric Mean of Laplacians", "book": "Advances in Neural Information Processing Systems", "page_first": 4421, "page_last": 4429, "abstract": "Signed networks allow to model positive and negative relationships. We analyze existing extensions of spectral clustering to signed networks. It turns out that existing approaches do not recover the ground truth clustering in several situations where either the positive or the negative network structures contain no noise. Our analysis shows that these problems arise as existing approaches take some form of arithmetic mean of the Laplacians of the positive and negative part. As a solution we propose to use the geometric mean of the Laplacians of positive and negative part and show that it outperforms the existing approaches. While the geometric mean of matrices is computationally expensive, we show that eigenvectors of the geometric mean can be computed efficiently, leading to a numerical scheme for sparse matrices which is of independent interest.", "full_text": "Clustering Signed Networks with the\n\nGeometric Mean of Laplacians\n\nPedro Mercado1, Francesco Tudisco2 and Matthias Hein1\n\n1Saarland University, Saarbr\u00fccken, Germany\n\n2University of Padua, Padua, Italy\n\nAbstract\n\nSigned networks allow to model positive and negative relationships. We analyze\nexisting extensions of spectral clustering to signed networks. It turns out that\nexisting approaches do not recover the ground truth clustering in several situations\nwhere either the positive or the negative network structures contain no noise. Our\nanalysis shows that these problems arise as existing approaches take some form of\narithmetic mean of the Laplacians of the positive and negative part. As a solution\nwe propose to use the geometric mean of the Laplacians of positive and negative\npart and show that it outperforms the existing approaches. 
While the geometric mean of matrices is computationally expensive, we show that eigenvectors of the geometric mean can be computed efficiently, leading to a numerical scheme for sparse matrices which is of independent interest.\n\n1 Introduction\n\nA signed graph is a graph with positive and negative edge weights. Typically positive edges model attractive relationships between objects, such as similarity or friendship, and negative edges model repelling relationships, such as dissimilarity or enmity. The concept of balanced signed networks can be traced back to [10, 3]. Later, in [5], a signed graph is defined as k-balanced if there exists a partition into k groups where only positive edges are within the groups and negative edges are between the groups. Several approaches to find communities in signed graphs have been proposed (see [23] for an overview). In this paper we focus on extensions of spectral clustering to signed graphs. Spectral clustering is a well established method for unsigned graphs which, based on the first eigenvectors of the graph Laplacian, embeds the nodes of the graph in Rk and then uses k-means to find the partition. In [16] the idea is transferred to signed graphs. They define the signed ratio and normalized cut functions and show that the spectrum of suitable signed graph Laplacians yields a relaxation of those objectives. In [4] other objective functions for signed graphs are introduced. They show that a relaxation of their objectives is equivalent to weighted kernel k-means for an appropriate choice of kernel. While they have a scalable method for clustering, they report that they cannot find any cluster structure in real world signed networks.\n\nWe show that the existing extensions of the graph Laplacian to signed graphs used for spectral clustering have severe deficiencies. 
Our analysis of the stochastic block model for signed graphs shows that, even for the perfectly balanced case, recovery of the ground-truth clusters is not guaranteed. The reason is that the eigenvectors encoding the cluster structure do not necessarily correspond to the smallest eigenvalues, leading to a noisy embedding of the data points and in turn to a failure of k-means to recover the cluster structure. The underlying mathematical reason is that all existing extensions of the graph Laplacian are based on some form of arithmetic mean of operators of the positive and negative graphs. In this paper we suggest as a solution to use the geometric mean of the Laplacians of the positive and negative part. In particular, we show that in the stochastic block model the geometric mean Laplacian allows, in expectation, recovery of the ground-truth clusters in any reasonable clustering setting. A main challenge for our approach is that the geometric mean Laplacian is computationally expensive and does not scale to large sparse networks. Thus a main contribution of this paper is showing that the first few eigenvectors of the geometric mean can still be computed efficiently. Our algorithm is based on the inverse power method and the extended Krylov subspace technique introduced by [8], and computes eigenvectors of the geometric mean A#B of two matrices A, B without ever forming A#B itself.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\nIn Section 2 we discuss existing work on Laplacians on signed graphs. In Section 3 we discuss the geometric mean of two matrices and introduce the geometric mean Laplacian, which is the basis of our spectral clustering method for signed graphs. In Section 4 we analyze our approach and existing approaches under the stochastic block model. 
In Section 5 we introduce our efficient algorithm to compute eigenvectors of the geometric mean of two matrices, and finally in Section 6 we discuss the performance of our approach on real world graphs. Proofs have been moved to the supplementary material.\n\n2 Signed graph clustering\n\nNetworks encoding positive and negative relations among the nodes can be represented by weighted signed graphs. Consider two symmetric non-negative weight matrices W+ and W\u2212, a vertex set V = {v1, . . . , vn}, and let G+ = (V, W+) and G\u2212 = (V, W\u2212) be the induced graphs. A signed graph is the pair G\u00b1 = (G+, G\u2212), where G+ and G\u2212 encode the positive and the negative relations, respectively.\n\nThe concept of community in signed networks is typically related to the theory of social balance. This theory, as presented in [10, 3], is based on the analysis of affective ties, where positive ties are a source of balance whereas negative ties are considered a source of imbalance in social groups.\n\nDefinition 1 ([5], k-balance). A signed graph is k-balanced if the set of vertices can be partitioned into k sets such that within the subsets there are only positive edges, and between them only negative.\n\nThe presence of k-balance in G\u00b1 implies the presence of k groups of nodes being both assortative in G+ and disassortative in G\u2212. However this situation is fairly rare in real world networks, and expecting communities in signed networks to be a perfectly balanced set of nodes is unrealistic.\n\nIn the next section we will show that Laplacians inspired by Definition 1 are based on some form of arithmetic mean of Laplacians. As an alternative we propose the geometric mean of Laplacians and show that it is able to recover communities when either G+ is assortative, or G\u2212 is disassortative, or both. 
Results of this paper will make clear that the use of the geometric mean of Laplacians makes it possible to recognize communities where previous approaches fail.\n\n2.1 Laplacians on Unsigned Graphs\n\nSpectral clustering of undirected, unsigned graphs using the Laplacian matrix is a well established technique (see [19] for an overview). Given an unsigned graph G = (V, W), the Laplacian and its normalized version are defined as\n\nL = D \u2212 W,    Lsym = D\u22121/2LD\u22121/2    (1)\n\nwhere D is the diagonal matrix of the degrees of G, with Dii = \u2211_{j=1}^{n} wij. Both Laplacians are positive semi-definite, and the multiplicity k of the eigenvalue 0 is equal to the number of connected components of the graph. Further, the Laplacian is suitable for assortative cases [19], i.e. for the identification of clusters under the assumption that the amount of edges inside clusters is larger than the amount of edges between them.\n\nFor disassortative cases, i.e. for the identification of clusters where the amount of edges is larger between clusters than inside clusters, the signless Laplacian is a better choice [18]. Given the unsigned graph G = (V, W), the signless Laplacian and its normalized version are defined as\n\nQ = D + W,    Qsym = D\u22121/2QD\u22121/2    (2)\n\nBoth Laplacians are positive semi-definite, and the smallest eigenvalue is zero if and only if the graph has a bipartite component [6].\n\n2.2 Laplacians on Signed Graphs\n\nRecently a number of Laplacian operators for signed networks have been introduced. Consider the signed graph G\u00b1 = (G+, G\u2212). 
Let D+ be the diagonal matrix of the degrees of G+, with D+ii = \u2211_{j=1}^{n} w+ij, and let \u00afD be the diagonal matrix of the overall degrees in G\u00b1, with \u00afDii = \u2211_{j=1}^{n} (w+ij + w\u2212ij).\n\nThe following Laplacians for signed networks have been considered so far\n\nLBR = D+ \u2212 W+ + W\u2212,    LBN = \u00afD\u22121LBR,    (balance ratio/normalized Laplacian)\nLSR = \u00afD \u2212 W+ + W\u2212,    LSN = \u00afD\u22121/2LSR\u00afD\u22121/2,    (signed ratio/normalized Laplacian)    (3)\n\nand spectral clustering algorithms based on these Laplacians have been proposed for G\u00b1 [16, 4]. Let L+ and Q\u2212 be the Laplacian and the signless Laplacian matrices of the graphs G+ and G\u2212, respectively. We note that the matrix LSR blends the information from G+ and G\u2212 into (twice) the arithmetic mean of L+ and Q\u2212, namely the following identity holds\n\nLSR = L+ + Q\u2212.    (4)\n\nThus, as an alternative to the normalization defining LSN from LSR, it is natural to consider the arithmetic mean of the normalized Laplacians LAM = L+sym + Q\u2212sym. In the next section we introduce the geometric mean of L+sym and Q\u2212sym and propose a new clustering algorithm for signed graphs based on that matrix. The analysis and experiments of the next sections will show that blending the information from the positive and negative graphs through the geometric mean overcomes the deficiencies shown by the arithmetic mean based operators.\n\n3 Geometric mean of Laplacians\n\nWe define here the geometric mean of matrices and introduce the geometric mean of normalized Laplacians for clustering signed networks. Let A1/2 be the unique positive definite solution of the matrix equation X2 = A, where A is positive definite.\n\nDefinition 2. Let A, B be positive definite matrices. The geometric mean of A and B is the positive definite matrix A#B defined by A#B = A1/2(A\u22121/2BA\u22121/2)1/2A1/2.\n\nOne can prove that A#B = B#A (see [1] for details). 
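Definition 2 and the symmetry A#B = B#A are easy to check numerically on small dense matrices; the following is a minimal sketch assuming NumPy and SciPy (the function name `geometric_mean` is ours, not from the paper):

```python
import numpy as np
from scipy.linalg import sqrtm

def geometric_mean(A, B):
    # A#B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}  (Definition 2)
    Ah = sqrtm(A)                    # unique SPD square root of A
    Ahi = np.linalg.inv(Ah)
    return np.real(Ah @ sqrtm(Ahi @ B @ Ahi) @ Ah)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4)); A = X @ X.T + 4 * np.eye(4)  # random SPD
Y = rng.standard_normal((4, 4)); B = Y @ Y.T + 4 * np.eye(4)

G = geometric_mean(A, B)
assert np.allclose(G, geometric_mean(B, A), atol=1e-10)  # A#B = B#A
assert np.allclose(geometric_mean(A, A), A, atol=1e-10)  # consistency: A#A = A
```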
Further, there are several useful ways to represent the geometric mean of positive definite matrices (see for instance [1, 12])\n\nA#B = A(A\u22121B)1/2 = (BA\u22121)1/2A = B(B\u22121A)1/2 = (AB\u22121)1/2B    (5)\n\nThe next result reveals further consistency with the scalar case: if A and B have some eigenvectors in common, then A + B and A#B have those eigenvectors as well, with eigenvalues given by the arithmetic and geometric mean of the corresponding eigenvalues of A and B, respectively.\n\nTheorem 1. Let u be an eigenvector of A and B with eigenvalues \u03bb and \u00b5, respectively. Then u is an eigenvector of A + B and A#B with eigenvalue \u03bb + \u00b5 and \u221a(\u03bb\u00b5), respectively.\n\n3.1 Geometric mean for signed networks clustering\n\nConsider the signed network G\u00b1 = (G+, G\u2212). We define the normalized geometric mean Laplacian of G\u00b1 as\n\nLGM = L+sym#Q\u2212sym    (6)\n\nWe propose Algorithm 1 for clustering signed networks, based on the spectrum of LGM. By Definition 2, the matrix geometric mean A#B requires A and B to be positive definite. As both the Laplacian and the signless Laplacian are positive semi-definite, in what follows we shall assume that the matrices L+sym and Q\u2212sym in (6) are modified by a small diagonal shift, ensuring positive definiteness. That is, in practice, we consider L+sym + \u03b51I and Q\u2212sym + \u03b52I, with \u03b51 and \u03b52 small positive numbers. For the sake of brevity, we do not explicitly write the shifting matrices.\n\nInput: Symmetric weight matrices W+, W\u2212 \u2208 Rn\u00d7n, number k of clusters to construct.\nOutput: Clusters C1, . . . , Ck.\n1 Compute the k eigenvectors u1, . . . , uk corresponding to the k smallest eigenvalues of LGM.\n2 Let U = (u1, . . . , uk).\n3 Cluster the rows of U with k-means into clusters C1, . . . 
, Ck.\n\nAlgorithm 1: Spectral clustering with LGM on signed networks\n\nTable 1: Conditions for the Stochastic Block Model analysis of Section 4\n\n(E+)    p+out < p+in\n(E\u2212)    p\u2212in < p\u2212out\n(Ebal)    p\u2212in + p+out < p+in + p\u2212out\n(Evol)    p\u2212in + (k \u2212 1)p\u2212out < p+in + (k \u2212 1)p+out\n(Econf)    ( kp+out / (p+in + (k \u2212 1)p+out) ) ( kp\u2212in / (p\u2212in + (k \u2212 1)p\u2212out) ) < 1\n(EG)    ( kp+out / (p+in + (k \u2212 1)p+out) ) ( 1 + (p\u2212in \u2212 p\u2212out) / (p\u2212in + (k \u2212 1)p\u2212out) ) < 1\n\nThe main bottleneck of Algorithm 1 is the computation of the eigenvectors in step 1. In Section 5 we propose a scalable Krylov-based method to handle this problem.\n\nLet us briefly discuss the motivating intuition behind the proposed clustering strategy. Algorithm 1, as well as state-of-the-art clustering algorithms based on the matrices in (3), relies on the k smallest eigenvalues of the considered operator and their corresponding eigenvectors. Thus the relative ordering of the eigenvalues plays a crucial role. Assume the eigenvalues to be enumerated in ascending order. Theorem 1 states that the functions (A, B) \u21a6 A + B and (A, B) \u21a6 A#B map eigenvalues of A and B having the same corresponding eigenvectors into the arithmetic mean \u03bbi(A) + \u03bbj(B) and the geometric mean \u221a(\u03bbi(A)\u03bbj(B)), respectively, where \u03bbi(\u00b7) is the ith smallest eigenvalue of the corresponding matrix. Note that the indices i and j are not the same in general, as the eigenvectors shared by A and B may be associated to eigenvalues having different positions in the relative ordering of A and B. 
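The eigenvalue mapping of Theorem 1 can be illustrated on a pair of matrices with a common eigenbasis; a small sketch assuming NumPy and SciPy:

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))  # shared orthonormal eigenvectors
lam = np.array([1.0, 2.0, 3.0, 4.0, 5.0])         # eigenvalues of A
mu = np.array([5.0, 1.0, 4.0, 2.0, 3.0])          # eigenvalues of B
A = Q @ np.diag(lam) @ Q.T
B = Q @ np.diag(mu) @ Q.T

Ah = sqrtm(A); Ahi = np.linalg.inv(Ah)
G = np.real(Ah @ sqrtm(Ahi @ B @ Ahi) @ Ah)       # A#B

# Theorem 1: the shared eigenvectors carry eigenvalues sqrt(lam_i * mu_i)
# for A#B and lam_i + mu_i for A + B.
assert np.allclose(np.sort(np.linalg.eigvalsh((G + G.T) / 2)),
                   np.sort(np.sqrt(lam * mu)), atol=1e-8)
assert np.allclose(np.sort(np.linalg.eigvalsh(A + B)),
                   np.sort(lam + mu), atol=1e-8)
```

Note how the relative orderings differ: the second smallest eigenvalue of A + B need not correspond to the second smallest of A#B, which is exactly the index mismatch discussed above.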
This intuitively suggests that small eigenvalues of A + B are related to small eigenvalues of both A and B, whereas those of A#B are associated with small eigenvalues of either A or B, or both. Therefore the relative ordering of the small eigenvalues of LGM is influenced by the presence of assortative clusters in G+ (related to small eigenvalues of L+sym) or by disassortative clusters in G\u2212 (related to small eigenvalues of Q\u2212sym), whereas the ordering of the small eigenvalues of the arithmetic mean takes into account only the simultaneous presence of both those situations.\n\nIn the next section, for networks following the stochastic block model, we analyze in expectation the spectrum of the normalized geometric mean Laplacian as well as that of the normalized Laplacians previously introduced. In this case the expected spectrum can be computed explicitly, and we observe that in expectation the ordering induced by blending the information of G+ and G\u2212 through the geometric mean recovers the ground truth clusters perfectly, whereas the use of the arithmetic mean introduces a bias which translates into a significantly higher clustering error.\n\n4 Stochastic block model on signed graphs\n\nIn this section we present an analysis of different signed graph Laplacians based on the Stochastic Block Model (SBM). The SBM is a widespread benchmark generative model for networks showing a clustering, community, or group behaviour [22]. Given a prescribed set of groups of nodes, the SBM defines the presence of an edge as a random variable with probability depending on which groups the edge joins. To our knowledge this is the first analysis of spectral clustering on signed graphs with the stochastic block model. Let C1, . . . 
,Ck be ground truth clusters, all having the same size |C|. We let p+in (p\u2212in) be the probability that there exists a positive (negative) edge between nodes in the same cluster, and let p+out (p\u2212out) denote the probability of a positive (negative) edge between nodes in different clusters.\n\nCalligraphic letters denote matrices in expectation. In particular W+ and W\u2212 denote the weight matrices in expectation. We have W+ij = p+in and W\u2212ij = p\u2212in if vi, vj belong to the same cluster, whereas W+ij = p+out and W\u2212ij = p\u2212out if vi, vj belong to different clusters. Sorting nodes according to the ground truth clustering shows that W+ and W\u2212 have rank k.\n\nConsider the relations in Table 1. Conditions E+ and E\u2212 describe the presence of assortative or disassortative clusters in expectation. Note that, by Definition 1, a graph is balanced if and only if p+out = p\u2212in = 0. We can see that if E+ \u2229 E\u2212 holds then both G+ and G\u2212 give information about the cluster structure. Further, if E+ \u2229 E\u2212 holds then Ebal holds. Similarly, Econf characterizes a graph where the relative amount of conflicts - i.e. positive edges between the clusters and negative edges inside the clusters - is small. Condition EG is strictly related to this setting; in fact, when E\u2212 \u2229 EG holds then Econf holds. Finally, condition Evol states that the expected volume of the negative graph is smaller than the expected volume of the positive one. This condition is therefore not related to any signed clustering structure.\n\nLet\n\n\u03c71 = 1,    \u03c7i = (k \u2212 1)1_{Ci} \u2212 1_{Ci^c},    i = 2, . . . , k\n\nwhere 1_{Ci} is the indicator vector of cluster Ci and Ci^c denotes its complement. The use of k-means on \u03c7i, i = 1, . . . , k identifies the ground truth communities Ci. 
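The expected weight matrices W+ and W\u2212 are block-constant, which makes the rank-k claim easy to verify; a sketch assuming NumPy (`expected_weights` is our helper name):

```python
import numpy as np

def expected_weights(k, csize, p_in, p_out):
    # Block-constant expected weight matrix of the SBM: p_in for pairs in
    # the same cluster, p_out for pairs in different clusters.
    labels = np.repeat(np.arange(k), csize)
    same = labels[:, None] == labels[None, :]
    return np.where(same, p_in, p_out)

Wp = expected_weights(k=3, csize=10, p_in=0.9, p_out=0.1)  # assortative G+
Wm = expected_weights(k=3, csize=10, p_in=0.1, p_out=0.9)  # disassortative G-

# Sorting nodes by cluster makes the block structure explicit; both
# matrices are spanned by the k cluster indicator vectors.
assert np.linalg.matrix_rank(Wp) == 3
assert np.linalg.matrix_rank(Wm) == 3
```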
As spectral clustering relies on the eigenvectors corresponding to the k smallest eigenvalues (see Algorithm 1), we derive here necessary and sufficient conditions such that in expectation the eigenvectors \u03c7i, i = 1, . . . , k correspond to the k smallest eigenvalues of the normalized Laplacians introduced so far. In particular, we observe that condition EG governs the ordering of the eigenvalues of the normalized geometric mean Laplacian. Instead, the ordering of the eigenvalues of the operators based on the arithmetic mean is related to Ebal and Evol. The latter is not related to any clustering, and thus introduces a bias in the eigenvalue ordering which leads to a noisy embedding of the data points and in turn to a significantly higher clustering error.\n\nTheorem 2. Let LBN and LSN be the normalized Laplacians defined in (3) of the expected graphs. The following statements are equivalent:\n\n1. \u03c71, . . . , \u03c7k are the eigenvectors corresponding to the k smallest eigenvalues of LBN.\n2. \u03c71, . . . , \u03c7k are the eigenvectors corresponding to the k smallest eigenvalues of LSN.\n3. The two conditions Ebal and Evol hold simultaneously.\n\nTheorem 3. Let LGM = L+sym#Q\u2212sym be the geometric mean of the Laplacians of the expected graphs. Then \u03c71, . . . , \u03c7k are the eigenvectors corresponding to the k smallest eigenvalues of LGM if and only if condition EG holds.\n\nConditions for the geometric mean Laplacian of diagonally shifted Laplacians are available in the supplementary material. Intuition suggests that a good model should easily identify clusters when E+ \u2229 E\u2212 holds. However, unlike condition EG, condition Evol \u2229 Ebal is not directly satisfied under that regime. Specifically, we have\n\nCorollary 1. Assume that E+ \u2229 E\u2212 holds. Then \u03c71, . . . , \u03c7k are eigenvectors corresponding to the k smallest eigenvalues of LGM. 
Let p(k) denote the proportion of cases where \u03c71, . . . , \u03c7k are the eigenvectors of the k smallest eigenvalues of LSN or LBN; then p(k) \u2264 1/6 + 2/(3(k\u22121)) + 1/(k\u22121)^2.\n\nIn order to grasp the difference in expectation between LBN, LSN and LGM, in Fig. 1 we present the proportion of cases where Theorems 2 and 3 hold under different settings. Experiments are done with all four parameters discretized in [0, 1] with 100 steps. The expected proportion of cases where EG holds (Theorem 3) is far above the corresponding proportion for Evol \u2229 Ebal (Theorem 2), showing that in expectation the geometric mean Laplacian is superior to the other signed Laplacians. In Fig. 2 we present experiments on sampled graphs with k-means on top of the k smallest eigenvectors. In all cases we consider clusters of size |C| = 100 and report the median clustering error (i.e., the error when clusters are labeled via majority vote) over 50 runs. The results show that the behavior predicted by the analysis in expectation closely matches the actual behavior. In fact, even though we expect only one noisy eigenvector for LBN and LSN, the use of the geometric mean Laplacian significantly outperforms all other previously proposed techniques in terms of clustering error. 
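The sampled-graph experiment can be imitated in a few lines for k = 2, computing the dense geometric mean directly (a sketch assuming NumPy and SciPy; the small diagonal shift, the parameters, and the sign-based partition used in place of k-means are our simplifications):

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(7)
n = 100
y = np.repeat([0, 1], n // 2)                     # ground-truth labels, k = 2

def sample(p_in, p_out):
    P = np.where(y[:, None] == y[None, :], p_in, p_out)
    W = (rng.random((n, n)) < P).astype(float)
    W = np.triu(W, 1)
    return W + W.T                                # symmetric, no self-loops

Wp = sample(0.3, 0.05)                            # assortative positive graph
Wm = sample(0.05, 0.3)                            # disassortative negative graph

def normalized(mat, W):
    d = 1.0 / np.sqrt(W.sum(axis=1))
    return d[:, None] * mat * d[None, :]

Lp = normalized(np.diag(Wp.sum(1)) - Wp, Wp) + 1e-6 * np.eye(n)  # shifted L+_sym
Qm = normalized(np.diag(Wm.sum(1)) + Wm, Wm) + 1e-6 * np.eye(n)  # shifted Q-_sym

Ah = sqrtm(Lp); Ahi = np.linalg.inv(Ah)
LGM = np.real(Ah @ sqrtm(Ahi @ Qm @ Ahi) @ Ah)    # dense L+_sym # Q-_sym

# For k = 2, one of the two smallest eigenvectors separates the clusters.
_, V = np.linalg.eigh((LGM + LGM.T) / 2)
chi = np.where(y == 0, 1.0, -1.0)
v = max(V[:, 0], V[:, 1], key=lambda u: abs(chi @ u))
pred = (v > np.median(v)).astype(int)
acc = max(np.mean(pred == y), np.mean(pred != y)) # labels up to relabeling
assert acc >= 0.9
```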
LSN and LBN achieve good clustering only when the graph resembles a k-balanced structure, whereas they fail even in the ideal situation where either the positive or the negative graph is informative about the cluster structure. As shown in Section 6, the advantages of LGM over the other Laplacians discussed so far allow us to identify a clustering structure on the Wikipedia benchmark real world signed network, where other clustering approaches have failed.\n\nFigure 1: Fraction of cases where in expectation \u03c71, . . . , \u03c7k correspond to the k smallest eigenvalues under the SBM.\n\nFigure 2: Median clustering error under the stochastic block model over 50 runs.\n\n5 Krylov-based inverse power method for small eigenvalues of L+sym#Q\u2212sym\n\nThe computation of the geometric mean A#B of two positive definite matrices of moderate size has been discussed extensively by various authors [20, 11, 12, 13]. However, when A and B have large dimensions, the approaches proposed so far become unfeasible; in fact, A#B is in general a full matrix even if A and B are sparse. In this section we present a scalable algorithm for the computation of the smallest eigenvectors of L+sym#Q\u2212sym. The method is discussed for a general pair of matrices A and B, to emphasize its general applicability, which makes it interesting in itself. We remark that the method takes advantage of the sparsity of A and B and does not require computing the matrix A#B explicitly. To our knowledge this is the first effective method explicitly built for the computation of the eigenvectors of the geometric mean of two large and sparse positive definite matrices.\n\nGiven a positive definite matrix M with eigenvalues \u03bb1 \u2264 \u00b7\u00b7\u00b7 \u2264 \u03bbn, let H be any eigenspace of M associated to \u03bb1, . . . , \u03bbt. 
The inverse power method (IPM) applied to M is a method that converges to an eigenvector x associated to the smallest eigenvalue \u03bbH of M such that \u03bbH \u2260 \u03bbi, i = 1, . . . , t. The pseudocode of the IPM applied to A#B = A(A\u22121B)1/2 is shown in Algorithm 2. Given a vector v and a matrix M, the notation solve{M, v} denotes a procedure returning the solution x of the linear system Mx = v. At each step the algorithm requires the solution of two linear systems. The first one (line 2) is solved by the preconditioned conjugate gradient method, where the preconditioner is obtained from the incomplete Cholesky decomposition of A. Note that the conjugate gradient method is very fast, as A is assumed sparse and positive definite, and it is matrix-free, i.e. it only requires computing the action of A on a vector, not the matrix A itself (nor its inverse). The solution of the linear system occurring in line 3 is the major inner problem of the proposed algorithm. Its efficient solution is performed by means of an extended Krylov subspace technique that we describe in the next section. The proposed implementation ensures the whole IPM is matrix-free and scalable.\n\n5.1 Extended Krylov subspace method for the solution of the linear system (A\u22121B)1/2x = y\n\nWe discuss here how to apply the technique known as the Extended Krylov Subspace Method (EKSM) to the solution of the linear system (A\u22121B)1/2x = y. Let M be a large and sparse matrix, and y a given vector. When f is a function with a single pole, EKSM is a very effective method to approximate the vector f(M)y without ever computing the matrix f(M) [8]. Note that, given two positive definite matrices A and B and a vector y, the vector we want to compute is x = (A\u22121B)\u22121/2y, so that our problem boils down to the computation of the product f(M)y, where M = A\u22121B and f(X) = X\u22121/2. 
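The two solves in lines 2-3 of Algorithm 2 correspond exactly to the factorization A#B = A(A\u22121B)1/2, and can be checked against a dense reference (a sketch assuming NumPy and SciPy):

```python
import numpy as np
from scipy.linalg import sqrtm, solve

rng = np.random.default_rng(3)
X = rng.standard_normal((6, 6)); A = X @ X.T + 6 * np.eye(6)  # SPD A
Y = rng.standard_normal((6, 6)); B = Y @ Y.T + 6 * np.eye(6)  # SPD B

M = np.real(sqrtm(np.linalg.inv(A) @ B))  # dense reference for (A^{-1}B)^{1/2}
G = A @ M                                 # A#B = A (A^{-1}B)^{1/2}, identity (5)

v = rng.standard_normal(6)
u = solve(A, v)                           # line 2: u <- solve{A, v}
x = solve(M, u)                           # line 3: x <- solve{(A^{-1}B)^{1/2}, u}
assert np.allclose(G @ x, v)              # hence x = (A#B)^{-1} v
```

In the actual algorithm both solves are done iteratively and matrix-free; the dense `sqrtm` here only serves as a reference.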
The general idea of the s-th EKSM iteration is to project M onto the subspace\n\nKs(M, y) = span{y, My, M\u22121y, . . . , M^{s\u22121}y, M^{1\u2212s}y},\n\nand solve the problem there. The projection onto Ks(M, y) is realized by means of the Lanczos process, which produces a sequence of matrices Vs with orthogonal columns, such that the first column of Vs is a multiple of y and range(Vs) = Ks(M, y). Moreover, at each step we have\n\nMVs = VsHs + [us+1, vs+1][e2s+1, e2s+2]^T    (7)\n\nwhere Hs is 2s \u00d7 2s symmetric tridiagonal, us+1 and vs+1 are orthogonal to Vs, and ei is the i-th canonical vector. The solution x is then approximated by xs = Vsf(Hs)e1\u2016y\u2016 \u2248 f(M)y. If n is the order of M, then the exact solution is obtained after at most n steps. However, in practice, significantly fewer iterations are enough to achieve a good approximation, as the error \u2016xs \u2212 x\u2016 decays exponentially with s (Thm 3.4 and Prop. 3.6 in [14]). See the supplementary material for details.\n\nThe pseudocode for the extended Krylov iteration is presented in Algorithm 3. We use the stopping criterion proposed in [14]. 
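The projection just described can be prototyped densely: build the extended Krylov basis, orthonormalize it in a suitable inner product, and evaluate f on the small projected matrix. A sketch assuming NumPy and SciPy (the helper name is ours, and modified Gram-Schmidt replaces the Lanczos recurrence used in the paper):

```python
import numpy as np
from scipy.linalg import solve, fractional_matrix_power

rng = np.random.default_rng(5)
n = 40
X = rng.standard_normal((n, n)); A = X @ X.T + n * np.eye(n)
Y = rng.standard_normal((n, n)); B = Y @ Y.T + n * np.eye(n)
y = rng.standard_normal(n)

def eksm_inv_sqrt(A, B, y, s):
    # Approximate (A^{-1}B)^{-1/2} y from the extended Krylov space
    # K_s = span{y, My, M^{-1}y, ...} with M = A^{-1}B.
    vecs, u, v = [y], y, y
    for _ in range(s):
        u = solve(A, B @ u)               # multiply by M
        v = solve(B, A @ v)               # multiply by M^{-1}
        vecs += [u, v]
    # Modified Gram-Schmidt in the A-inner product <u, w>_A = u^T A w,
    # so that H = V^T B V is the Galerkin projection of M onto the space.
    V = []
    for w in vecs:
        w = w.copy()
        for q in V:
            w -= (q @ A @ w) * q
        w /= np.sqrt(w @ A @ w)
        V.append(w)
    V = np.column_stack(V)
    H = V.T @ B @ V
    e1 = np.zeros(V.shape[1]); e1[0] = 1.0
    beta = np.sqrt(y @ A @ y)             # V^T A y = beta * e1 by construction
    return V @ (np.real(fractional_matrix_power(H, -0.5)) @ e1) * beta

exact = np.real(fractional_matrix_power(np.linalg.inv(A) @ B, -0.5)) @ y
approx = eksm_inv_sqrt(A, B, y, s=6)
assert np.linalg.norm(approx - exact) < 1e-3 * np.linalg.norm(exact)
```

Already a handful of iterations give several correct digits on this well-conditioned pair, illustrating the exponential error decay mentioned above.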
It is worth pointing out that at step 4 of the algorithm we can freely choose any scalar product \u27e8\u00b7,\u00b7\u27e9 without affecting formula (7) nor the convergence properties of the method. As M = A\u22121B, we use the scalar product \u27e8u, v\u27e9A = u^T Av induced by the positive definite matrix A, so that the computation of the tridiagonal matrix Hs in the algorithm simplifies to V^T_s B Vs. We refer to [9] for further details. As before, the solve procedure is implemented by means of the preconditioned conjugate gradient method, where the preconditioner is obtained from the incomplete Cholesky decomposition of the coefficient matrix. Figure 3 shows that we are able to compute the smallest eigenvector of L+sym#Q\u2212sym at a cost which is only a constant factor worse than the computation of the eigenvector of the arithmetic mean, whereas the direct computation of the geometric mean followed by the computation of the eigenvectors is unfeasible for large graphs.\n\nInput: x0, eigenspace H of A#B.\nOutput: Eigenpair (\u03bbH, x) of A#B.\n1 repeat\n2     uk \u2190 solve{A, xk}\n3     vk \u2190 solve{(A\u22121B)1/2, uk}\n4     yk \u2190 project vk onto H\u22a5\n5     xk+1 \u2190 yk/\u2016yk\u20162\n6 until tolerance reached\n7 \u03bbH \u2190 x^T_{k+1} xk,    x \u2190 xk+1\n\nAlgorithm 2: IPM applied to A#B\n\nInput: u0 = y, V0 = [\u00b7]\nOutput: x = (A\u22121B)\u22121/2y\n1 v0 \u2190 solve{B, Au0}\n2 for s = 0, 1, 2, . . . , n do\n3     \u02dcVs+1 \u2190 [Vs, us, vs]\n4     Vs+1 \u2190 orthogonalize columns of \u02dcVs+1 w.r.t. \u27e8\u00b7,\u00b7\u27e9A\n5     Hs+1 \u2190 V^T_{s+1} B Vs+1\n6     xs+1 \u2190 H^{\u22121/2}_{s+1} e1\n7     if tolerance reached then break\n8     us+1 \u2190 solve{A, BVs+1e1}\n9     vs+1 \u2190 solve{B, AVs+1e2}\n10 end\n11 x \u2190 Vs+1 xs+1\n\nAlgorithm 3: EKSM for the computation of (A\u22121B)\u22121/2y\n\nFigure 3: Median execution time of 10 runs for different Laplacians. Graphs have two perfect clusters and 2.5% of edges among nodes. LGM (ours) uses Algorithms 2 and 3, whereas we used Matlab\u2019s eigs for the other matrices. The use of eigs on LGM is prohibitive as it needs the matrix LGM to be built (we use the toolbox provided in [2]), destroying the sparsity of the original graphs. Experiments are performed using one thread.\n\n6 Experiments\n\nSociology networks. We evaluate the signed Laplacians LSN, LBN, LAM and LGM on three real-world signed networks of moderate size: the Highland Tribes (Gahuku-Gama) network [21], the Slovene Parliamentary Parties network [15] and the US Supreme Court Justices network [7]. For the sake of comparison we take as ground truth the clustering stated in the corresponding references. We observe that all signed Laplacians yield zero clustering error.\n\nExperiments on the Wikipedia signed network. We consider the Wikipedia adminship election dataset from [17], which describes relationships that are positive, negative or non-existent. We use Algorithms 1\u20133 and look for 30 clusters. The positive and negative adjacency matrices sorted according to our clustering are depicted in Figs. 4(a) and 4(b). We can observe the presence of a large, relatively empty cluster. Zooming into the denser portion of the graph we can see a k-balanced behavior (see Figs. 4(c) and 4(d)), i.e. the positive adjacency matrix shows assortative groups - resembling a block diagonal structure - while the negative adjacency matrix shows a disassortative setting. Using LAM and LBN we were not able to find any clustering structure, which corroborates the results reported in [4]. This further confirms that LGM overcomes other clustering approaches. To the knowledge of the authors, 
To the knowledge of the authors,\nthis is the \ufb01rst time that clustering structure has been found in this dataset.\n\n(a) W +\n\n(b) W \u2212\n\n(c) W +(Zoom)\n\n(d) W \u2212(Zoom)\n\nFigure 4: Wikipedia weight matrices sorted according to the clustering obtained with LGM (Alg. 1).\nExperiments on UCI datasets. We evaluate our method LGM (Algs. 1\u22123) against LSN , LBN ,\nand LAM with datasets from the UCI repository (see Table. 2). We build W + from a symmetric\nk+-nearest neighbor graph, whereas W \u2212 is obtained from the symmetric k\u2212-farthest neighbor\ngraph. For each dataset we test all clustering methods over all possible choices of k+, k\u2212 \u2208\n{3, 5, 7, 10, 15, 20, 40, 60}. In Table 2 we report the fraction of cases where each method achieves\nthe best and strictly best clustering error over all the 64 graphs, per each dataset. We can see that our\nmethod outperforms other methods across all datasets.\nIn the \ufb01gure on the right of Table 2 we present the clustering error on MNIST dataset \ufb01xing k+ = 10.\nWith Q\u2212\nsym one gets the highest clustering error, which shows that the k\u2212-farthest neighbor graph is a\nsource of noise and is not informative. In fact, we observe that a small subset of nodes is the farthest\nneighborhood of a large fraction of nodes. The noise from the k\u2212-farthest neighbor graph is strongly\nin\ufb02uencing the performances of LSN and LBN , leading to a noisy embedding of the datapoints and\nin turn to a high clustering error. On the other hand we can see that LGM is robust, in the sense that\nits clustering performances are not affected negatively by the noise in the negative edges. Similar be-\nhaviors have been observed for the other datasets in Table 2, and are shown in supplementary material.\n\na\n\nMNIST, k+ = 10\n\niris wine ecoli optdig USPS pendig MNIST\n150\n70000\n3\n\n178\n3\n\n310\n3\n\n10992\n\n# vertices\n# classes\n\nLSN\n\nLBN\n\nLAM\n\nLGM\n\nBest (%)\n\nBest (%)\n\nStr. 
best (%) 7.8\n\n17.2 21.9\n4.7\n\n23.4 40.6 18.8\nStr. best (%) 10.9 21.9 14.1\n7.8\n6.3\n12.5 28.1 14.1\nStr. best (%) 10.9 14.1 12.5\n59.4 42.2 65.6\nStr. best (%) 57.8 35.9 60.9\n\nBest (%)\n\nBest (%)\n\n5620\n10\n28.1\n28.1\n0.0\n0.0\n0.0\n0.0\n71.9\n71.9\n\n9298\n10\n10.9\n9.4\n1.6\n1.6\n0.0\n0.0\n89.1\n87.5\n\n10\n10.9\n10.9\n3.1\n3.1\n1.6\n1.6\n84.4\n84.4\n\n10\n12.5\n12.5\n0.0\n0.0\n0.0\n0.0\n87.5\n87.5\n\nTable 2: Experiments on UCI datasets. Left: fraction of cases where methods achieve best and strictly\nbest clustering error. Right: clustering error on MNIST dataset.\n\na\n\nk\u2212\n\nAcknowledgments. The authors acknowledge support by the ERC starting grant NOLEPRO\n\n8\n\n57101520406000.20.40.6L+symQ-symLSNLBNLAMLGM(ours)\fReferences\n[1] R. Bhatia. Positive de\ufb01nite matrices. Princeton University Press, 2009.\n[2] D. Bini and B. Ianazzo. The Matrix Means Toolbox. http://bezout.dm.unipi.it/\n\nsoftware/mmtoolbox/, May 2015.\n\n[3] D. Cartwright and F. Harary. Structural balance: a generalization of Heider\u2019s theory. Psycholog-\n\nical Review, 63(5):277\u2013293, 1956.\n\n[4] K. Chiang, J. Whang, and I. Dhillon. Scalable clustering of signed networks using balance\n\nnormalized cut. CIKM, pages 615\u2013624, 2012.\n\n[5] J. A. Davis. Clustering and structural balance in graphs. Human Relations, 20:181\u2013187, 1967.\n[6] M. Desai and V. Rao. A characterization of the smallest eigenvalue of a graph. Journal of\n\nGraph Theory, 18(2):181\u2013194, 1994.\n\n[7] P. Doreian and A. Mrvar. Partitioning signed social networks. Social Networks, 31(1):1\u201311,\n\n2009.\n\n[8] V. Druskin and L. Knizhnerman. Extended Krylov subspaces: approximation of the matrix\n\nsquare root and related functions. SIAM J. Matrix Anal. Appl., 19:755\u2013771, 1998.\n\n[9] M. Fasi and B. Iannazzo. Computing the weighted geometric mean of two large-scale matrices\n\nand its inverse times a vector. MIMS EPrint: 2016.29.\n\n[10] F. Harary. 
On the notion of balance of a signed graph. Michigan Mathematical Journal, 2:143–146, 1953.

[11] N. J. Higham, D. S. Mackey, N. Mackey, and F. Tisseur. Functions preserving matrix groups and iterations for the matrix square root. SIAM J. Matrix Anal. Appl., 26:849–877, 2005.

[12] B. Iannazzo. The geometric mean of two matrices from a computational viewpoint. Numer. Linear Algebra Appl., to appear, 2015.

[13] B. Iannazzo and M. Porcelli. The Riemannian Barzilai-Borwein method with nonmonotone line-search and the Karcher mean computation. Optimization online, December 2015.

[14] L. Knizhnerman and V. Simoncini. A new investigation of the extended Krylov subspace method for matrix function evaluations. Numer. Linear Algebra Appl., 17:615–638, 2009.

[15] S. Kropivnik and A. Mrvar. An analysis of the Slovene parliamentary parties networks. Developments in Statistics and Methodology, pages 209–216, 1996.

[16] J. Kunegis, S. Schmidt, A. Lommatzsch, J. Lerner, E. Luca, and S. Albayrak. Spectral analysis of signed graphs for clustering, prediction and visualization. In ICDM, pages 559–570, 2010.

[17] J. Leskovec and A. Krevl. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data, June 2014.

[18] S. Liu. Multi-way dual Cheeger constants and spectral bounds of graphs. Advances in Mathematics, 268:306–338, 2015.

[19] U. von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, Dec. 2007.

[20] M. Raïssouli and F. Leazizi. Continued fraction expansion of the geometric matrix mean and applications. Linear Algebra Appl., 359:37–57, 2003.

[21] K. E. Read. Cultures of the Central Highlands, New Guinea. Southwestern Journal of Anthropology, 10(1):1–43, 1954.

[22] K. Rohe, S. Chatterjee, and B. Yu. Spectral clustering and the high-dimensional stochastic blockmodel.
The Annals of Statistics, 39(4):1878–1915, 2011.

[23] J. Tang, Y. Chang, C. Aggarwal, and H. Liu. A survey of signed network mining in social media. arXiv preprint arXiv:1511.07569, 2015.