{"title": "A Unifying Framework for Spectrum-Preserving Graph Sparsification and Coarsening", "book": "Advances in Neural Information Processing Systems", "page_first": 7736, "page_last": 7747, "abstract": "How might one ``reduce'' a graph? That is, generate a smaller graph that preserves the global structure at the expense of discarding local details? There has been extensive work on both graph sparsification (removing edges) and graph coarsening (merging nodes, often by edge contraction); however, these operations are currently treated separately. Interestingly, for a planar graph, edge deletion corresponds to edge contraction in its planar dual (and more generally, for a graphical matroid and its dual). Moreover, with respect to the dynamics induced by the graph Laplacian (e.g., diffusion), deletion and contraction are physical manifestations of two reciprocal limits: edge weights of $0$ and $\infty$, respectively. In this work, we provide a unifying framework that captures both of these operations, allowing one to simultaneously sparsify and coarsen a graph while preserving its large-scale structure. The limit of infinite edge weight is rarely considered, as many classical notions of graph similarity diverge. However, its algebraic, geometric, and physical interpretations are reflected in the Laplacian pseudoinverse $L^\dagger$, which remains finite in this limit. Motivated by this insight, we provide a probabilistic algorithm that reduces graphs while preserving $L^\dagger$, using an unbiased procedure that minimizes its variance.
We compare our algorithm with several existing sparsification and coarsening algorithms using real-world datasets, and demonstrate that it more accurately preserves the large-scale structure.", "full_text": "A Unifying Framework for Spectrum-Preserving Graph Sparsification and Coarsening

Gecia Bravo-Hermsdorff*
Princeton Neuroscience Institute
Princeton University
Princeton, NJ, 08544, USA
geciah@princeton.edu

Lee M. Gunderson*
Department of Astrophysical Sciences
Princeton University
Princeton, NJ, 08544, USA
leeg@princeton.edu

Abstract

How might one "reduce" a graph? That is, generate a smaller graph that preserves the global structure at the expense of discarding local details? There has been extensive work on both graph sparsification (removing edges) and graph coarsening (merging nodes, often by edge contraction); however, these operations are currently treated separately. Interestingly, for a planar graph, edge deletion corresponds to edge contraction in its planar dual (and more generally, for a graphical matroid and its dual). Moreover, with respect to the dynamics induced by the graph Laplacian (e.g., diffusion), deletion and contraction are physical manifestations of two reciprocal limits: edge weights of $0$ and $\infty$, respectively. In this work, we provide a unifying framework that captures both of these operations, allowing one to simultaneously sparsify and coarsen a graph while preserving its large-scale structure. The limit of infinite edge weight is rarely considered, as many classical notions of graph similarity diverge. However, its algebraic, geometric, and physical interpretations are reflected in the Laplacian pseudoinverse $L^\dagger$, which remains finite in this limit. Motivated by this insight, we provide a probabilistic algorithm that reduces graphs while preserving $L^\dagger$, using an unbiased procedure that minimizes its variance.
We compare our algorithm with several existing sparsification and coarsening algorithms using real-world datasets, and demonstrate that it more accurately preserves the large-scale structure.

1 Motivation

Many complex structures and phenomena are naturally described as graphs (eg,1 brains, social networks, the internet, etc). Indeed, graph-structured data are becoming increasingly relevant to the field of machine learning [2, 3, 4]. These graphs are frequently massive, easily surpassing our working memory, and often the computer's relevant cache [5]. It is therefore essential to obtain smaller approximate graphs to allow for more efficient computation.

Graphs are defined by a set of nodes $V$ and a set of edges $E \subseteq V \times V$ between them, and are often represented as an adjacency matrix $A$ with size $|V| \times |V|$ and density $\propto |E|$. Reducing either of these quantities is advantageous: graph "coarsening" focuses on the former, aggregating nodes while respecting the overall structure, and graph "sparsification" on the latter, preferentially retaining the important edges.

*Both authors contributed equally to this work.
1The authors agree with the sentiment of the footnote on page xv of [1], viz, omitting superfluous full stops to obtain a more efficient compression of, eg: videlicet, exempli gratia, etc.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Spectral graph sparsification has revolutionized the field of numerical linear algebra and is used, eg, in algorithms for solving linear systems with symmetric diagonally dominant matrices in nearly-linear time [6, 7] (in contrast to the fastest known algorithm for solving general linear systems, taking $O(n^\omega)$-time, where $\omega$
$\approx 2.373$ is the matrix multiplication exponent [8]).

Graph coarsening appears in many computer science and machine learning applications, eg: as primitives for graph partitioning [9] and visualization algorithms2 [10]; as layers in graph convolution networks [3, 11]; for dimensionality reduction and hierarchical representation of graph-structured data [12, 13]; and to speed up regularized least square problems on graphs [14], which arise in a variety of problems such as ranking [15] and distributed synchronization of clocks [16].

A variety of algorithms, with different objectives, have been proposed for both sparsification and coarsening. However, a frequently recurring theme is to consider the graph Laplacian $L = D - A$, where $D$ is the diagonal matrix of node degrees. Indeed, it appears in a wide range of applications, eg: its spectral properties can be leveraged for graph clustering [17]; it can be used to efficiently solve min-cut/max-flow problems [18]; and for undirected, positively weighted graphs (the focus of this paper), it induces a natural quadratic form, which can be used, eg, to smoothly interpolate functions over the nodes [19].

Work on spectral graph sparsification focuses on preserving the Laplacian quadratic form $\vec{x}^\top L \vec{x}$, a popular measure of spectral similarity suggested by Spielman & Teng [6].
A key result in this field is that any dense graph can be sparsified to $O(|V| \log |V|)$ edges in nearly linear time using a simple probabilistic algorithm [20]: start with an empty graph, include edges from the original graph with probability proportional to their effective resistance, and appropriately reweight those edges so as to preserve $\vec{x}^\top L \vec{x}$ within a reasonable factor.

In contrast to the firm theoretical footing of spectral sparsification, work on graph coarsening has not reached a similar maturity; while a variety of spectral coarsening schemes have been recently proposed, algorithms frequently rely on heuristics, and there is arguably no consensus. Eg: Jin & Jaja [21] use $k$ eigenvectors of the Laplacian as feature vectors to perform $k$-means clustering of the nodes; Purohit et al. [22] aim to minimize the change in the largest eigenvalue of the adjacency matrix; and Loukas & Vandergheynst [23] focus on a "restricted" Laplacian quadratic form.

Although recent work has combined sparsification and coarsening [24], it used separate algorithmic primitives, essentially analyzing the serial composition of the above algorithms.
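The effective-resistance sampling scheme of [20] can be illustrated in a few lines of dense linear algebra (a minimal sketch for small graphs using the Laplacian pseudoinverse directly; the example graph and variable names are ours, and this is not the nearly-linear-time implementation):

```python
import numpy as np

def effective_resistances(L):
    """Effective resistance between every node pair, via the Laplacian pseudoinverse."""
    Lp = np.linalg.pinv(L)
    d = np.diag(Lp)
    # R_uv = Lp[u,u] + Lp[v,v] - 2*Lp[u,v]
    return d[:, None] + d[None, :] - Lp - Lp.T

# Toy weighted graph: a unit triangle (0,1,2) plus a pendant edge (2,3)
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0), (2, 3, 1.0)]
n = 4
L = np.zeros((n, n))
for u, v, w in edges:
    L[u, u] += w; L[v, v] += w
    L[u, v] -= w; L[v, u] -= w

R = effective_resistances(L)

# Sample edges with probability proportional to w_e * R_e, and reweight each
# kept edge by 1/(n_samples * p_e) so the Laplacian is preserved in expectation.
probs = np.array([w * R[u, v] for u, v, w in edges])
probs /= probs.sum()
n_samples = 8
rng = np.random.default_rng(0)
counts = rng.multinomial(n_samples, probs)
sparsifier = [(u, v, w * c / (n_samples * p))
              for (u, v, w), c, p in zip(edges, counts, probs) if c > 0]
```

Note that the pendant (cut) edge has $w_e R_e = 1$, the maximum possible, so it is sampled with the highest probability, in line with the intuition that structurally critical edges must be retained.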
The primary contribution of this work is to provide a unifying probabilistic framework that allows one to simultaneously sparsify and coarsen a graph while preserving its global structure, using a single cost function that preserves the Laplacian pseudoinverse $L^\dagger$.

Corollary contributions include: 1) Identifying the limit of infinite edge weight with edge contraction, highlighting how its algebraic, geometric, and physical interpretations are reflected in $L^\dagger$, which remains finite in this limit (Section 2); 2) Offering a way to quantitatively compare the effects of edge deletion and edge contraction (Sections 2 and 3); 3) Providing a probabilistic algorithm that reduces graphs while preserving $L^\dagger$, using an unbiased procedure that minimizes its variance (Sections 3 and 4); 4) Proposing a more sensitive measure of spectral similarity of graphs, inspired by the Poincaré half-plane model of hyperbolic space (Section 5.1); and 5) Comparing our algorithm with several existing sparsification and coarsening algorithms using synthetic and real-world datasets, demonstrating that it more accurately preserves the large-scale structure (Section 5).

2 Why the Laplacian pseudoinverse

Many computations over graphs involve solving $L\vec{x} = \vec{b}$ for $\vec{x}$ [25]. Thus, the algebraically relevant operator is arguably the Laplacian pseudoinverse $L^\dagger$. In fact, its connection with random walks has been used to derive useful measures of distances on graphs, such as the well-known effective resistance [26], and the recently proposed resistance perturbation distance [27]. Moreover, taking the pseudoinverse of $L$ leaves its eigenvectors unchanged, but inverts the nontrivial eigenvalues. Thus, as the largest eigenpairs of $L^\dagger$ are associated with global structure, preserving its action will preferentially maintain the overall "shape" of the graph (see Appendix Section G for details).
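This eigen-inversion is easy to verify numerically (a minimal sketch on a path graph of our choosing, using numpy's dense pseudoinverse):

```python
import numpy as np

# Laplacian of an unweighted path graph on 5 nodes
n = 5
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
L = np.diag(A.sum(axis=1)) - A
Lp = np.linalg.pinv(L)

evals = np.linalg.eigvalsh(L)     # ascending: 0 = lambda_1 <= lambda_2 <= ...
evals_p = np.linalg.eigvalsh(Lp)

# Nontrivial eigenvalues are inverted: the largest eigenvalue of L^dagger
# is the reciprocal of the smallest nonzero eigenvalue of L.
assert np.isclose(evals_p.max(), 1.0 / evals[1])

# The kernel (the constant vector) is shared by L and L^dagger.
assert np.allclose(L @ np.ones(n), 0)
assert np.allclose(Lp @ np.ones(n), 0)
```

Since the smallest nonzero eigenpair of $L$ (the Fiedler pair) becomes the dominant eigenpair of $L^\dagger$, preserving the action of $L^\dagger$ weights the global structure most heavily.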
For instance, the Fiedler vector [17] (associated with the "algebraic connectivity" of a graph) will be preferentially preserved. We now discuss in further detail why $L^\dagger$ is well-suited for both graph sparsification and coarsening.

2For animated examples using our graph reduction algorithm, see the following link: youtube.com/playlist?list=PLmfiQcz2q6d3sZutLri4ZAIDLqM_4K1p-.

Attention is often restricted to undirected, positively weighted graphs [28]. These graphs have many convenient properties, eg, their Laplacians are positive semidefinite ($\vec{x}^\top L \vec{x} \ge 0$) and have a well-understood kernel and cokernel ($L\vec{1} = \vec{0}$ and $\vec{1}^\top L = \vec{0}^\top$). The edge weights are defined as a mapping $W: E \to \mathbb{R}_{>0}$. When the weights represent connection strength, it is generally understood that $w_e \to 0$ is equivalent to removing edge $e$. However, the closure of the positive reals has a reciprocal limit, namely $w_e \to +\infty$.

This limit is rarely considered, as many classical notions of graph similarity diverge. This includes the standard notion of spectral similarity, where $\tilde{G}$ is a $\sigma$-spectral approximation of $G$ if it preserves the Laplacian quadratic form $\vec{x}^\top L_G \vec{x}$ to within a factor of $\sigma$ for all vectors $\vec{x} \in \mathbb{R}^{|V_G|}$ [6]. Clearly, this limit yields a graph that does not approximate the original for any choice of $\sigma$: any $\vec{x}$ with different values for the two nodes joined by the edge with infinite weight now yields an infinite quadratic form. This suggests considering only vectors that have the same value for these two nodes, essentially contracting them into a single "supernode". Algebraically, this interpretation is reflected in $L^\dagger$, which remains finite in this limit: the pair of rows (and columns) corresponding to the contracted nodes become identical (see Appendix Section C).

Physically, consider the behavior of the heat equation $\partial_t \vec{x} + L\vec{x} = \vec{0}$: as $w_e \to$
$+\infty$, the values on the two nodes immediately equilibrate between themselves, and remain tethered for the rest of the evolution.3 Geometrically, the reciprocal limits of $w_e \to 0$ and $w_e \to +\infty$ have dual interpretations: consider a planar graph and its planar dual; edge deletion in one graph corresponds to contraction in the other, and vice versa. This naturally extends to nonplanar graphs via their graphical matroids and their duals [29].

Finally, while the Laplacian operator is frequently considered in the graph sparsification and coarsening literature, its pseudoinverse also has many important applications in the field of machine learning [30], eg: online learning over graphs [31]; similarity prediction of network data [32]; determining important nodes [33]; providing a measure of network robustness to multiple failures [34]; extending principal component analysis to graphs [35]; and collaborative recommendation systems [36]. Hence, graph reduction algorithms that preserve $L^\dagger$ would be useful to the machine learning community.

3 Our graph reduction framework

We now describe our framework for constructing probabilistic algorithms that generate a reduced graph $\tilde{G}$ from an initial graph $G$, motivated by the following desiderata: 1) Reduce the number of edges/nodes (Section 3.1); 2) Preserve $L^\dagger$ in expectation (Section 3.2); and 3) Minimize the change in $L^\dagger$ (Section 3.3).

We first define these goals more formally. Then, in Section 3.4, we combine these requirements to define our cost function and derive the optimal probabilistic action (ie, deletion, contraction, or reweight) to perform on an edge.

3.1 Reducing edges and nodes

Depending on the application, it might be more important to reduce the number of nodes (eg, coarsening a sparse network) or the number of edges (eg, sparsifying a dense network). Let $r$ be the number of prioritized items reduced during a particular iteration.
When those items are nodes, then $r = 0$ for a deletion, and $r = 1$ for a contraction. When those items are edges, then $r = 1$ for a deletion; however, $r > 1$ is possible for a contraction: if the contracted edge forms a triangle in the original graph, then the other two edges will become parallel in the reduced graph (see Figure SI 3 in Appendix Section C). With respect to the Laplacian, this is equivalent to a single edge with weight given by the sum of these now parallel edges. Thus, when edge reduction is prioritized, a contraction will have $r = 1 + \tau_e$, where $\tau_e$ is the number of triangles in the original graph $G$ in which the contracted edge $e$ participates.

3In the spirit of another common analogy (edge weights as conductances of a network of resistors), breaking a resistor is equivalent to deleting that edge, while contraction amounts to completely soldering over it.

Note that, even when node reduction is prioritized, the number of edges will also necessarily decrease. Conversely, when edge reduction is prioritized, contraction of an edge is also possible, thereby reducing the number of nodes as well. For the case of simultaneously sparsifying and coarsening a graph, we choose to prioritize edge reduction, although nodes could also be a sensible choice.

3.2 Preserving the Laplacian pseudoinverse

Consider perturbing the weight of a single edge $e = (v_1, v_2)$ by $\Delta w$.
The change in the Laplacian is
$$L_{\tilde{G}} - L_G = \Delta w \, \vec{b}_e \vec{b}_e^\top, \quad (1)$$
where $L_{\tilde{G}}$ and $L_G$ are the perturbed and original Laplacians, respectively, and $\vec{b}_e$ is the (arbitrarily) signed incidence (column) vector associated with edge $e$, with entries
$$(b_e)_i = \begin{cases} +1 & i = v_1 \\ -1 & i = v_2 \\ 0 & \text{otherwise.} \end{cases} \quad (2)$$
The change in $L^\dagger$ is given by the Woodbury matrix identity4 [39]:
$$\Delta L^\dagger = L^\dagger_{\tilde{G}} - L^\dagger_G = -\frac{\Delta w}{1 + \Delta w \, \vec{b}_e^\top L^\dagger_G \vec{b}_e} \, L^\dagger_G \vec{b}_e \vec{b}_e^\top L^\dagger_G. \quad (3)$$
Note that this change can be expressed as a matrix that depends only on the choice of edge $e$, multiplied by a scalar term that depends (nonlinearly) on the change to its weight:
$$\Delta L^\dagger = \underbrace{f\!\left(\tfrac{\Delta w}{w_e}, \, w_e \Omega_e\right)}_{\text{nonlinear scalar}} \times \underbrace{M_e}_{\text{constant matrix}}, \quad (4)$$
where
$$f = -\frac{\frac{\Delta w}{w_e}}{1 + \frac{\Delta w}{w_e} w_e \Omega_e}, \quad (5)$$
$$M_e = w_e L^\dagger_G \vec{b}_e \vec{b}_e^\top L^\dagger_G, \quad (6)$$
$$\Omega_e = \vec{b}_e^\top L^\dagger_G \vec{b}_e. \quad (7)$$
Hence, if the probabilistic reweight of this edge is chosen such that $\mathbb{E}[f] = 0$, then we have $\mathbb{E}[L^\dagger_{\tilde{G}}] = L^\dagger_G$, as desired. Importantly, $f$ remains finite in the following relevant limits:
$$\text{deletion: } \tfrac{\Delta w}{w_e} \to -1, \; f \to (1 - w_e \Omega_e)^{-1}; \qquad \text{contraction: } \tfrac{\Delta w}{w_e} \to +\infty, \; f \to -(w_e \Omega_e)^{-1}. \quad (8)$$
Note that $f$ diverges when considering deletion of an edge with $w_e \Omega_e = 1$ (ie, an edge cut). Indeed, such an action would disconnect the graph and invalidate the use of equation 3 (see footnote 4). However, this possibility is precluded by the requirement that $\mathbb{E}[f] = 0$.

3.3 Minimizing the error

Minimizing the magnitude of $\Delta L^\dagger$ requires a choice of matrix norm, which we take to be the sum of the squares of its entries (ie, the square of the Frobenius norm). Our motivation is twofold.
First, the algebraically convenient fact that the Frobenius norm of a rank-one matrix has a simple form, viz,
$$m_e \equiv \|M_e\|_F = w_e \, \vec{b}_e^\top L^\dagger_G L^\dagger_G \vec{b}_e. \quad (9)$$
Second, the square of this norm behaves as a variance; to the extent that the $M_e$ associated to different edges can be treated as (entrywise) uncorrelated, one can decompose multiple perturbations as follows:
$$\mathbb{E}\!\left[\Big\|\sum \Delta L^\dagger\Big\|_F^2\right] \approx \sum \mathbb{E}\!\left[\big\|\Delta L^\dagger\big\|_F^2\right], \quad (10)$$
which allows the single-edge results from Section 3.4 to be iteratively applied to our reduction algorithm, which has multiple reductions (Section 4). In Appendix Section A, we empirically validate this approximation using synthetic and real-world networks, showing that this approximation is either nearly exact or a conservative estimate.

For subtleties associated with edge contraction, see Appendix Section F, in particular equation 39.

4This expression is only officially applicable when the initial and final matrices are full-rank; additional care must be taken when they are not. However, for the case of changing the edge weights of a graph Laplacian, the original formula remains unchanged [37, 38] (so long as the graph remains connected), provided one uses the definitions in Section 3.5 (see also Appendix Sections C and F).

3.4 A cost function for spectral graph reduction

Combining the discussed desiderata, we choose to minimize the following cost function:
$$C = \mathbb{E}\!\left[\big\|\Delta L^\dagger\big\|_F^2\right] - 2\beta \, \mathbb{E}[r], \quad (11)$$
subject to
$$\mathbb{E}\!\left[\Delta L^\dagger\right] = 0, \quad (12)$$
where the parameter $\beta$ controls the tradeoff between the number of prioritized items reduced $r$ and the error incurred in $L^\dagger$. This cost function naturally arises when minimizing the expected squared error for a given expected amount of reduction (or, equivalently, maximizing the expected number of reductions for a given expected squared error).

We desire to minimize this cost function over all possible reduced graphs.
As, when reducing multiple edges, $\mathbb{E}[r]$ is additive and the expected squared error is empirically additive, we are able to decompose this objective into a sequence of minimizations applied to individual edges. Thus, minimization of this cost function for each edge acted upon can be seen as a probabilistic greedy algorithm for minimizing the cost function for the final reduced graph.

Here, we describe the analytic solution for the optimal action (ie, probabilistically choosing to delete, contract, or reweight) to be applied to a single edge. We provide the solution in Figure 1, and a detailed derivation in Appendix Section B.

For a given edge $e$, the values of $m_e$, $w_e\Omega_e$, and $\tau_e$ are fixed, and minimizing the cost function (11) (given (12)) results in a piecewise solution with three regimes, depending on the value of $\beta$: 1) When $\beta < \beta_1(m_e, w_e\Omega_e, \tau_e) = \min(\beta_{1d}, \beta_{1c})$, $\beta$ is small compared with the error that would be incurred by acting on this edge, thus it should not be changed; 2) When $\beta > \beta_2(m_e, w_e\Omega_e, \tau_e)$, $\beta$ is large for this edge, and the optimal solution is to probabilistically delete or contract this edge ($p_d + p_c = 1$; no reweight is required); and 3) In the intermediate case ($\beta_1 < \beta < \beta_2$), there are two possibilities, depending on the edge and the choice of prioritized items: if $\beta_{1d} < \beta_{1c}$, the edge is either deleted or reweighted, and if $\beta_{1c} < \beta_{1d}$, the edge is either contracted or reweighted.

[Figure 1 displays two tables giving the optimal action ($p_d$, $p_c$, and $\Delta w/w_e$) in each of the three $\beta$-regimes, for edge-prioritized and node-prioritized reduction; the full expressions are not reproduced here. Eg, in the regime $\beta > \beta_2$, the solution is $p_d = 1 - w_e\Omega_e$ and $p_c = w_e\Omega_e$.]

Figure 1: Left: Minimizing $C$ for a single edge $e$. There are three regimes for the solution, depending on the value of $\beta$. When node reduction is prioritized, set $\tau_e = 0$. Right: Values of $\beta$ dividing the three regimes. Note that when edge reduction is prioritized, the number of triangles enters the expressions, and when node reduction is prioritized, there is no deletion in the intermediate regime. However, for either choice, both deletion and contraction can have finite probability, and the algorithm does not exclusively reduce one or the other. Thus, when simultaneously sparsifying and coarsening a graph, the prioritized items may be chosen to be either edges or nodes. We remark that the values of $\beta_{1d}$, $\beta_{1c}$, and $\beta_2$ might be of independent interest as measures of edge importance for analyzing connections in real-world networks.

3.5 Node-weighted Laplacian

When nodes are merged, one often represents the connectivity of the resulting graph $\tilde{G}$ by a matrix of smaller size. To properly compare the spectral properties of $\tilde{G}$ with those of the original graph $G$, one must keep track of the number of original nodes that comprise these "supernodes" and assign them proportional weights. The appropriate reduced Laplacian $L_{\tilde{G}}$ (of size $|V_{\tilde{G}}| \times |V_{\tilde{G}}|$) is then $L_{\tilde{G}} = W_n^{-1} B^\top W_e B$, where the $W$ are the diagonal matrices of the node weights5 and the edge weights of $\tilde{G}$, respectively, and $B$ is its signed incidence matrix with columns given by (2).

Moreover, one must be careful to choose the appropriate pseudoinverse of $L_{\tilde{G}}$, which is given by
$$L^\dagger_{\tilde{G}} = \left(L_{\tilde{G}} + J\right)^{-1} - J, \quad (13)$$
where
$$J = \frac{\vec{1} \, \vec{w}_n^\top}{\vec{1}^\top \vec{w}_n}, \quad (14)$$
and $\vec{w}_n \in \mathbb{R}^{|V_{\tilde{G}}|}_{>0}$ is the vector of node weights. Note that $L^\dagger_{\tilde{G}} L_{\tilde{G}} = L_{\tilde{G}} L^\dagger_{\tilde{G}} = I - J$, the appropriate node-weighted projection matrix.

To compare the action of the original and reduced Laplacians on a vector $\vec{x} \in \mathbb{R}^{|V_G|}$ over the nodes of the original graph, one must "lift" $L_{\tilde{G}}$ to operate on the same space as $L_G$. We thus define the mapping from original to coarsened nodes as a $|V_{\tilde{G}}| \times |V_G|$ matrix $C$, with entries
$$c_{ij} = \begin{cases} 1 & \text{node } j \text{ in supernode } i \\ 0 & \text{otherwise.} \end{cases} \quad (15)$$
The appropriate lifted Laplacian is $L_{\tilde{G},l} = C^\top L_{\tilde{G}} W_n^{-1} C$. Likewise, the lifted Laplacian pseudoinverse is $L^\dagger_{\tilde{G},l} = C^\top L^\dagger_{\tilde{G}} W_n^{-1} C$ (see Appendix Section C for a detailed rationale of these definitions).

4 Our graph reduction algorithm

Using this framework, we now describe our graph reduction algorithm. Similar to many graph coarsening methods [41, 42], we obtain the reduced graph by acting on the initial graph (as opposed to adding edges to an empty graph, as is frequently done in sparsification [43, 44]).

Care must be taken, however, as simultaneous deletions/contractions may result in undesirable behavior. Eg, while any edge that is itself a cut-set will never be deleted (as $w_e\Omega_e = 1$), a collection of edges that together make a cut-set might all have finite deletion probability. Hence, if multiple edges are simultaneously deleted, the graph could become disconnected. In addition, the single-edge analysis could underestimate the change in $L^\dagger$ associated with simultaneous contractions.
Eg, consider two highly-connected nodes that are each the center of a different community, and a third auxiliary node that happens to be connected to both: contracting the auxiliary node into either of the other two would be sensible, but performing both contractions would merge the two communities.

Algorithm 1 describes our graph reduction scheme. Its inputs are: $G$, the original graph; $q$, the fraction of sampled edges to act upon per iteration; $d$, the minimum expected decrease in prioritized items per edge acted upon; and StopCriterion, a user-defined function. With these inputs, we implicitly select $\beta$. Let $\beta_{\star,e}$ be the minimum $\beta$ such that $\mathbb{E}[r] \ge d$ for edge $e$. For each iteration, we compute $\beta_{\star,e}$ for all sampled edges, and choose a $\beta$ such that a fraction $q$ of them have $\beta_{\star,e} < \beta$. We then apply the corresponding probabilistic actions to these edges.

The appropriate choice of StopCriterion depends on the application. Eg, if one desires to bound the accuracy of an algorithm that uses graph reduction as a primitive, limiting the Frobenius error in $L^\dagger$ is a sensible choice (it is trivial to keep a running total of the estimated error, see Appendix Section A). On the other hand, if one would like the reduced graph to be no larger than a certain size, then one can simply continue reducing until this point. While both criteria may also be implicitly implemented via an upper bound on $\beta$, the relationship is nontrivial and depends on the structure of the graph.

The aforementioned problems associated with simultaneous deletions/contractions can be eliminated by taking a conservative approach: acting on only a single edge per iteration. However, this results in an algorithm that does not scale favorably for large graphs. A more scalable solution involves

5$W_n$ is often referred to as the "mass matrix" [40].
We note that the use of the random walk matrix $D^{-1}L$ can be seen as using the node degrees as a surrogate for the node weights.

Algorithm 1 ReduceGraph
1: Inputs: graph $G$, fraction of sampled edges to act upon $q$, minimum $\mathbb{E}[r]$ per edge acted upon $d$, and a StopCriterion
2: Initialize $\tilde{G}_0 \leftarrow G$, $t \leftarrow 0$, stop $\leftarrow$ False
3: while not (stop) do
4:     Sample an independent edge set
5:     for (edge $e$) in (sampled edges) do
6:         Compute $\Omega_e$, $m_e$ (see equations (7) and (9))
7:         Evaluate $\beta_{\star e}$, according to $d$ (see Tables in Figure 1)
8:     end for
9:     Choose $\beta$ such that a fraction $q$ of the sampled edges (those with the lowest $\beta_{\star e}$) are acted upon
10:    Probabilistically choose to reweight, delete, or contract these edges
11:    Perform reweights and deletions to $\tilde{G}_t$
12:    Perform contractions to $\tilde{G}_t$
13:    $\tilde{G}_{t+1} \leftarrow \tilde{G}_t$, $t \leftarrow t + 1$
14:    stop $\leftarrow$ StopCriterion($\tilde{G}_t$)
15: end while
16: return reduced graph $\tilde{G}_t$

carefully sampling the candidate set of edges. In particular, we are able to significantly ameliorate these issues by sampling the candidate edges such that they do not have any nodes in common (ie, the sampled edges form an independent edge set). Not only does this eliminate the possibility of "accidental" contractions, but, empirically, it also suppresses the occurrence of graph disconnections (the small fraction that become disconnected are restarted). At each iteration, our algorithm finds a random maximal independent edge set in $O(|V|)$ time using a simple greedy algorithm.6 In practice, the size of such a set scales as $O(|V|)$ (although it is easy to find families for which this scaling does not hold, eg, star graphs). Our algorithm then computes the $\Omega_e$ and $m_e$ of these sampled edges, and acts on the fraction $q$ with the lowest $\beta_{\star e}$.

The main computational bottleneck of our algorithm is computing $\Omega_e$ and $m_e$ (equations (7) and (9)).
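For reference, both per-edge quantities follow directly from the columns of $L^\dagger$ (a naive dense sketch matching equations (7) and (9); the helper name and toy graph are ours, not the paper's efficient approximation):

```python
import numpy as np

def edge_quantities(Lp, u, v, w):
    """Omega_e = b_e^T L+ b_e (effective resistance) and m_e = w_e b_e^T L+ L+ b_e."""
    lb = Lp[:, u] - Lp[:, v]   # L+ b_e, since b_e has entries +1 at u, -1 at v
    omega = lb[u] - lb[v]      # b_e^T (L+ b_e)
    m = w * (lb @ lb)          # w_e * ||L+ b_e||^2
    return omega, m

# Unit triangle (0,1,2) plus a pendant edge (2,3)
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0), (2, 3, 1.0)]
n = 4
L = np.zeros((n, n))
for u, v, w in edges:
    L[u, u] += w; L[v, v] += w
    L[u, v] -= w; L[v, u] -= w
Lp = np.linalg.pinv(L)

omega, m = edge_quantities(Lp, 2, 3, 1.0)
# The pendant edge is a cut: w_e * Omega_e = 1, so deletion probability is zero.
assert np.isclose(omega, 1.0)
```

This dense route costs $O(|V|^3)$ for the pseudoinverse and is only sensible for small graphs; the approximation discussed next avoids forming $L^\dagger$ altogether.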
However, we can draw on the work of [20], which describes a method for efficiently computing $\epsilon$-approximate values of $\Omega_e$ for all edges, requiring $\tilde{O}(|E| \log |V| / \epsilon^2)$ time. With minimal changes, this procedure can also be used to compute approximate values of $m_e$ with similar efficiency (in Appendix Section F, we discuss the details of how to efficiently compute approximations of $m_e$). As we must compute these quantities for each iteration, we multiply the running time by the expected number of iterations, $O(|E|/(qd|V|))$. Empirically, we find that one is able to set $q \sim 1/16$ and $d \sim 1/4$ with minimal loss in reduction quality (see Appendix Section E). Thus, we expect that our algorithm could have a running time of $\tilde{O}(\langle k \rangle |E|)$, where $\langle k \rangle$ is the average degree. However, in the following results, we have used a naive implementation: computing $L^\dagger$ at the onset, and updating it using the Woodbury matrix identity.

5 Experimental results

In this section, we empirically validate our framework and compare it with existing algorithms. We consider two cases of our general framework, namely graph sparsification (excluding regimes involving edge contraction), and graph coarsening (prioritizing reduction of nodes). In addition, as graph reduction is often used in graph visualization, we generated videos of our algorithm simultaneously sparsifying and coarsening several real-world datasets (see footnote 2 and Appendix Section I).

5.1 Hyperbolic interlude

When comparing a graph $G$ with its reduced approximation $\tilde{G}$, it is natural to consider how the relevant linear operators treat the same input vector. If the vector $L_{\tilde{G},l}\vec{x}$ is aligned with $L_G\vec{x}$, the fractional error in the quadratic form $\vec{x}^\top L \vec{x}$ is a natural quantity to consider, as it corresponds to the relative change in the magnitude of these vectors.
However, it is not so clear how to compare output vectors that have an angular difference. Here, we describe a natural extension of this notion of fractional error, which draws intuition from the Poincaré half-plane model of hyperbolic geometry. In particular, we choose the boundary of the half-plane to be perpendicular to $\vec{x}$ and compute the geodesic distance between $L_G\vec{x}$ and $L_{\tilde{G},l}\vec{x}$, viz,
$$d_{\vec{x}}(L_0, L_1) \overset{\text{def}}{=} \operatorname{arccosh}\!\left(1 + \frac{\left\|(L_0 - L_1)\vec{x}\right\|_2^2 \, \|\vec{x}\|_2^2}{2 \, \vec{x}^\top L_0 \vec{x} \; \vec{x}^\top L_1 \vec{x}}\right), \quad (16)$$
where $L_0$ and $L_1$ are positive definite matrices (for now). We define the hyperbolic distance between these matrices as
$$d_h(L_0, L_1) \overset{\text{def}}{=} \sup_{\vec{x}} \, d_{\vec{x}}(L_0, L_1). \quad (17)$$
This dimensionless quantity inherits the following standard desirable features of a distance: symmetry and non-negativity, $d_h(L_0, L_1) = d_h(L_1, L_0) \ge 0$; identity of indiscernibles, $d_h(L_0, L_1) = 0 \iff L_0 = L_1$; and subadditivity, $d_h(L_0, L_2) \le d_h(L_0, L_1) + d_h(L_1, L_2)$. In addition, we note that $d_h(cL_0, cL_1) = d_h(L_0, L_1)$ for all $c \in \mathbb{R} \setminus \{0\}$, emphasizing its interpretation as a fractional error.

This notion naturally extends to (positive semidefinite) graph Laplacians if one considers only vectors $\vec{x}$ that are orthogonal to their kernels (ie, requiring that $\vec{1}^\top\vec{x} = 0$ when taking the supremum in (17)). With this modification, the connection with spectral graph sparsification can be stated as follows:

Theorem 1. If $d_h(L_G, L_{\tilde{G}}) \le \ln(\sigma)$, then $\tilde{G}$ is a $\sigma$-spectral approximation of $G$.

Here, the notion of $\sigma$-spectral approximation is the same as in Spielman & Teng [6] (see Section 2), and thus is restricted to sparsification only.

6Specifically, randomly permute the nodes, and sequentially pair them with a random available neighbor (if there is one). The obtained set contains at least half as many edges as the maximum matching [45].
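Equations (16) and (17) translate directly into code (a sketch of ours that approximates the supremum in (17) by sampling kernel-orthogonal test vectors, so it yields a lower bound on $d_h$ rather than the exact value):

```python
import numpy as np

def d_x(L0, L1, x):
    """Geodesic distance (16) between L0 @ x and L1 @ x in the Poincare
    half-plane whose boundary is perpendicular to x."""
    num = np.linalg.norm((L0 - L1) @ x) ** 2 * np.linalg.norm(x) ** 2
    den = 2.0 * (x @ L0 @ x) * (x @ L1 @ x)
    return np.arccosh(1.0 + num / den)

def d_h(L0, L1, trials=1000, seed=0):
    """Approximate the hyperbolic distance (17) over vectors orthogonal
    to the all-ones kernel (a lower bound on the true supremum)."""
    rng = np.random.default_rng(seed)
    n = L0.shape[0]
    best = 0.0
    for _ in range(trials):
        x = rng.standard_normal(n)
        x -= x.mean()              # project out the constant-vector kernel
        best = max(best, d_x(L0, L1, x))
    return best

# Unit triangle, and the same graph with edge (0,1) reweighted from 1 to 2
L0 = np.array([[2., -1., -1.], [-1., 2., -1.], [-1., -1., 2.]])
b01 = np.array([1., -1., 0.])
L1 = L0 + np.outer(b01, b01)
```

The scale invariance $d_h(cL_0, cL_1) = d_h(L_0, L_1)$ is visible here because both the numerator and denominator of (16) scale as $c^2$.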
The proof is provided in Appendix Section D.

As d_~x is analogous to the ratio of quadratic forms with ~x, d_h is likewise analogous to the notion of a σ-spectral approximation. Moreover, as d_~x and d_h also consider angular differences between L_G~x and L_{G̃,l}~x, they serve as more sensitive measures of graph similarity.

In the following sections, we compare our algorithm with other graph reduction methods using d_~x, where we choose ~x to be eigenvectors of the original graph Laplacian. In Appendix Section H, we replicate our results using more standard measures (e.g., quadratic forms and eigenvalues).

5.2 Comparison with spectral graph sparsification

Figure 2 compares our algorithm (prioritizing edge reduction, and excluding the possibility of contraction) with the standard spectral sparsification algorithm of Spielman & Srivastava [20] using three real-world datasets. We choose to compare with this particular sparsification method because it directly aims to optimally preserve the Laplacian. To the best of our knowledge, other sparsification methods either do not explicitly preserve properties associated with the Laplacian [46, 47], or share the same spirit as Spielman & Srivastava's algorithm [48] (often considering other settings, such as distributed [49] or streaming [50] computation). The results in Figure 2 show that our algorithm better preserves L† and preferentially preserves its action on eigenvectors associated with global structure.

5.3 Comparison with graph coarsening algorithms

Figure 3 compares our algorithm (prioritizing node reduction) with several existing coarsening algorithms using three more real-world datasets. In order to make a fair comparison with these existing methods, after contracting their prescribed groups of nodes, we appropriately lift the resulting reduced L†_G̃ (see Appendix Section C).
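The precise lifting is specified in Appendix Section C; as a purely illustrative stand-in (our own simplified assumption, not the paper's exact construction), one can lift the reduced pseudoinverse with a partition (membership) matrix P and the centering projector J = I − 11⊤/n:

```python
import numpy as np

def lift_pinv(Lred_pinv, groups, n):
    """Lift the pseudoinverse of a coarsened Laplacian back to the original
    n nodes so it can be compared with the original L^dagger.
    groups[k] lists the original nodes merged into supernode k.
    Simplified sketch; the paper's precise lifting is in its Appendix C."""
    P = np.zeros((len(groups), n))       # membership (partition) matrix
    for k, members in enumerate(groups):
        P[k, members] = 1.0
    J = np.eye(n) - np.ones((n, n)) / n  # project out the all-ones kernel
    return J @ P.T @ Lred_pinv @ P @ J
```

This choice is consistent with viewing contraction as the infinite-weight limit: for a 3-node path with one very heavy edge, the pseudoinverse of the full Laplacian approaches the lifted pseudoinverse of the 2-node contraction.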
We find that our algorithm more accurately preserves global structure.

6 Conclusion

In this work, we unify spectral graph sparsification and coarsening through the use of a single cost function that preserves the Laplacian pseudoinverse L†. We describe a probabilistic algorithm for graph reduction that employs edge deletion, contraction, and reweighting to keep E[L†_G̃] = L†_G, and uses a new measure of edge importance (?) to minimize its variance. Using synthetic and real-world datasets, we demonstrate that our algorithm more accurately preserves global structure compared to existing algorithms. We hope that our framework (or some perturbation of it) will serve as a useful tool for graph algorithms, numerical linear algebra, and machine learning.
[Figure 2: fractional error d_~x between L†_G̃ and L†_G versus the fraction of edges remaining |E_G̃|/|E_G| (Ours vs. Spielman et al), with panels at |E_G̃|/|E_G| ≈ 1/2 and ≈ 1/12.]

Figure 2: Our sparsification algorithm preferentially preserves global structure. We apply our algorithm without contraction (Ours) and compare it with that of Spielman & Srivastava [20] (Spielman et al) using three datasets. Left: a collaboration network of Jazz musicians (198 nodes and 2742 edges) from [51]; Middle: the C. elegans posterior nervous system connectome (269 nodes and 2902 edges) from [52]; and Right: a weighted social network of face-to-face interactions between primary school students, with initial edge weights proportional to the number of interactions between pairs of students (236 nodes and 5899 edges) from [53]. For the two algorithms, we compute the hyperbolic distance d_~x (fractional error) between L†_G~x and L†_G̃~x at different levels of sparsification for two choices of ~x: the smallest non-trivial eigenvector of the original Laplacian (dark shading), which is associated with global structure; and the median eigenvector (light shading). Shading denotes one standard deviation about the mean for 16 runs of the algorithms. The curves end at the minimum edge density for which the sparsified graph is connected.
[Figure 3: fractional error d_~x (normalized by random matching) versus the fraction of nodes remaining |V_G̃|/|V_G|, for Ours, HEM, HCM, LV, and KMeans.]

Figure 3: Our algorithm preserves global structure more accurately than other coarsening algorithms. We compare our algorithm (prioritizing node reduction) (Ours) to several existing coarsening algorithms: two classical methods for graph coarsening (heavy-edge matching (HEM) [54] and heavy-clique matching (HCM) [54]), and two recently proposed spectral coarsening algorithms (local variation by Loukas [55] (LV) and the k-means method by Jin & Jaja [21] (KMeans)). We ran the comparisons using three datasets. Left: a transportation network of European cities and roads between them (1039 nodes and 1305 edges) from [56]; Middle: a triangular mesh of the text "NeurIPS" (567 nodes and 1408 edges); and Right: a weighted social network of face-to-face interactions during an exhibition on infectious diseases, with initial edge weights proportional to the number of interactions between pairs of people (410 nodes and 2765 edges) from [57]. For all algorithms considered, we compute the hyperbolic distance d_~x (fractional error) between L†_G~x and L†_{G̃,l}~x, where ~x is the smallest non-trivial eigenvector of the original Laplacian (associated with global structure). To provide a baseline, we plot their mean fractional error normalized by that obtained by random matching (RM) [54] for the same level of coarsening. Shading denotes one standard deviation about the mean for 16 runs of the algorithms.

*Both authors contributed equally to this work.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Acknowledgments

We would like to thank Matthew de Courcy-Ireland for insightful discussions and Ashlyn Maria Bravo Gundermsdorff for unique perspectives.

References

[1] Chazelle, B. The Discrepancy Method: Randomness and Complexity (Cambridge University Press, 2000).
[2] Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A. & Vandergheynst, P. Geometric deep learning: Going beyond Euclidean data. IEEE Signal Processing Magazine 34, 18–42 (2017).
[3] Bruna, J., Zaremba, W., Szlam, A. & LeCun, Y. Spectral networks and locally connected networks on graphs. International Conference on Learning Representations (2014).
[4] Henaff, M., Bruna, J. & LeCun, Y. Deep convolutional networks on graph-structured data. arXiv:1506.05163 (2015).
[5] Batson, J., Spielman, D. A., Srivastava, N. & Teng, S.-H. Spectral sparsification of graphs: Theory and algorithms. Communications of the ACM 56, 87–94 (2013).
[6] Spielman, D. A. & Teng, S.-H. Spectral sparsification of graphs. SIAM Journal on Computing 40, 981–1025 (2011).
[7] Cohen, M. B. et al. Solving SDD linear systems in nearly m log^{1/2} n time.
Proceedings of the 46th Annual ACM Symposium on Theory of Computing (2014).
[8] Le Gall, F. Powers of tensors and fast matrix multiplication. Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation (2014).
[9] Safro, I., Sanders, P. & Schulz, C. Advanced coarsening schemes for graph partitioning. Journal of Experimental Algorithmics 19, 1.1–1.24 (2015).
[10] Harel, D. & Koren, Y. A fast multi-scale method for drawing large graphs. Graph Drawing 183–196 (2001).
[11] Simonovsky, M. & Komodakis, N. Dynamic edge-conditioned filters in convolutional neural networks on graphs. IEEE Conference on Computer Vision and Pattern Recognition 3693–3702 (2017).
[12] Lafon, S. & Lee, A. Diffusion maps and coarse-graining: A unified framework for dimensionality reduction, graph partitioning, and data set parameterization. IEEE Transactions on Pattern Analysis and Machine Intelligence 28, 1393–1403 (2006).
[13] Chen, H., Perozzi, B., Hu, Y. & Skiena, S. HARP: Hierarchical representation learning for networks. 32nd AAAI Conference on Artificial Intelligence (2018).
[14] Hirani, A., Kalyanaraman, K. & Watts, S. Graph Laplacians and least squares on graphs. IEEE International Parallel and Distributed Processing Symposium Workshop 812–821 (2015).
[15] Negahban, S., Oh, S. & Shah, D. Iterative ranking from pairwise comparisons. Advances in Neural Information Processing Systems 2474–2482 (2012).
[16] Solis, R., Borkar, V. S. & Kumar, P. A new distributed time synchronization protocol for multihop wireless networks. Proceedings of the 45th IEEE Conference on Decision and Control 2734–2739 (2006).
[17] Fiedler, M. Algebraic connectivity of graphs. Czechoslovak Mathematical Journal 23, 298–305 (1973).
[18] Christiano, P., Kelner, J. A., Madry, A., Spielman, D. A. & Teng, S.-H. Electrical flows, Laplacian systems, and faster approximation of maximum flow in undirected graphs. Proceedings of the 43rd Annual ACM Symposium on Theory of Computing 273–282 (2011).
[19] Kyng, R., Rao, A., Sachdeva, S. & Spielman, D. A. Algorithms for Lipschitz learning on graphs. Conference on Learning Theory (2015).
[20] Spielman, D. A. & Srivastava, N. Graph sparsification by effective resistances. SIAM Journal on Computing 40, 1913–1926 (2011).
[21] Jin, Y. & JaJa, J. F. Network summarization with preserved spectral properties. arXiv:1802.04447 (2018).
[22] Purohit, M., Prakash, B. A., Kang, C., Zhang, Y. & Subrahmanian, V. Fast influence-based coarsening for large networks. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 1296–1305 (2014).
[23] Loukas, A. & Vandergheynst, P. Spectrally approximating large graphs with smaller graphs. International Conference on Machine Learning 80, 3237–3246 (2018).
[24] Zhao, Z., Wang, Y. & Feng, Z. Nearly-linear time spectral graph reduction for scalable graph partitioning and data visualization. arXiv:1812.08942 (2018).
[25] Teng, S.-H. The Laplacian paradigm: Emerging algorithms for massive graphs. Theory and Applications of Models of Computation 2–14 (2010).
[26] Chandra, A. K., Raghavan, P., Ruzzo, W. L., Smolensky, R. & Tiwari, P. The electrical resistance of a graph captures its commute and cover times. Computational Complexity 6, 312–340 (1996).
[27] Monnig, N. D. & Meyer, F. G. The resistance perturbation distance: A metric for the analysis of dynamic networks. Discrete Applied Mathematics 236, 347–386 (2018).
[28] Cohen, M. B. et al. Almost-linear-time algorithms for Markov chains and new spectral primitives for directed graphs. Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing 410–419 (2017).
[29] Oxley, J. G. Matroid Theory, vol. 3 (Oxford University Press, USA, 2006).
[30] Ranjan, G., Zhang, Z.-L. & Boley, D. Incremental computation of pseudoinverse of Laplacian. Lecture Notes in Computer Science 729–749 (2014).
[31] Herbster, M., Pontil, M. & Wainer, L. Online learning over graphs. Proceedings of the 22nd International Conference on Machine Learning 305–312 (2005).
[32] Gentile, C., Herbster, M. & Pasteris, S. Online similarity prediction of networked data from known and unknown graphs. Conference on Learning Theory 662–695 (2013).
[33] Van Mieghem, P., Devriendt, K. & Cetinay, H. Pseudoinverse of the Laplacian and best spreader node in a network. Physical Review E 96, 032311 (2017).
[34] Ranjan, G. & Zhang, Z.-L. Geometry of complex networks and topological centrality. Physica A: Statistical Mechanics and its Applications 392, 3833–3845 (2013).
[35] Saerens, M., Fouss, F., Yen, L. & Dupont, P. The principal components analysis of a graph, and its relationships to spectral clustering. European Conference on Machine Learning 371–383 (2004).
[36] Pirotte, A., Renders, J.-M., Saerens, M. & Fouss, F. Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Transactions on Knowledge & Data Engineering 355–369 (2007).
[37] Riedel, K. S. A Sherman–Morrison–Woodbury identity for rank augmenting matrices with application to centering. SIAM Journal on Matrix Analysis and Applications 13, 659–662 (1992).
[38] Meyer, C. D., Jr. Generalized inversion of modified matrices. SIAM Journal on Applied Mathematics 24, 315–323 (1973).
[39] Woodbury, M. A. Inverting Modified Matrices. Memorandum Rept. 42, Statistical Research Group (Princeton University, Princeton, NJ, 1950).
[40] Koren, Y., Carmel, L. & Harel, D. ACE: A fast multiscale eigenvectors computation for drawing huge graphs. IEEE Symposium on Information Visualization 137–144 (2002).
[41] Hendrickson, B. & Leland, R. W. A multilevel algorithm for partitioning graphs. Proceedings of the 1995 ACM/IEEE Conference on Supercomputing 95, 1–14 (1995).
[42] Ron, D., Safro, I. & Brandt, A. Relaxation-based coarsening and multiscale graph organization. SIAM Journal on Multiscale Modeling & Simulation 9, 407–423 (2011).
[43] Kyng, R., Pachocki, J., Peng, R. & Sachdeva, S. A framework for analyzing resparsification algorithms. Proceedings of the 38th Annual ACM-SIAM Symposium on Discrete Algorithms 2032–2043 (2017).
[44] Lee, Y. T. & Sun, H. An SDP-based algorithm for linear-sized spectral sparsification. Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing 678–687 (2017).
[45] Ausiello, G. et al. Complexity and Approximation: Combinatorial Optimization Problems and their Approximability Properties (Springer Science & Business Media, 2012).
[46] Satuluri, V., Parthasarathy, S. & Ruan, Y. Local graph sparsification for scalable clustering. Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data 721–732 (2011).
[47] Ahn, K. J., Guha, S. & McGregor, A. Graph sketches: Sparsification, spanners, and subgraphs. Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems 5–14 (2012).
[48] Fung, W.-S., Hariharan, R., Harvey, N. J. & Panigrahi, D. A general framework for graph sparsification. SIAM Journal on Computing 48, 1196–1223 (2019).
[49] Koutis, I. & Xu, S. C. Simple parallel and distributed algorithms for spectral graph sparsification. ACM Transactions on Parallel Computing (TOPC) 3, 14 (2016).
[50] Kapralov, M., Lee, Y. T., Musco, C., Musco, C. P. & Sidford, A. Single pass spectral sparsification in dynamic streams. SIAM Journal on Computing 46, 456–477 (2017).
[51] Gleiser, P. M. & Danon, L. Community structure in jazz. Advances in Complex Systems 6, 565–573 (2003).
[52] Jarrell, T. A. et al. The connectome of a decision-making neural network. Science 337, 437–444 (2012).
[53] Stehlé, J. et al. High-resolution measurements of face-to-face contact patterns in a primary school. PLoS One 6, e23176 (2011).
[54] Karypis, G. & Kumar, V. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing 20, 359–392 (1998).
[55] Loukas, A. Graph reduction with spectral and cut guarantees. arXiv:1808.10650v2 (2018).
[56] Šubelj, L. & Bajec, M. Robust network community detection using balanced propagation. The European Physical Journal B 81, 353–362 (2011).
[57] Isella, L. et al. What's in a crowd? Analysis of face-to-face behavioral networks. Journal of Theoretical Biology 271, 166–180 (2011).
[58] Bell, W., Olson, L. & Schroder, J. PyAMG: Algebraic multigrid solvers in Python, Release 3.0 (2015). URL http://www.pyamg.org.