{"title": "Hunt For The Unique, Stable, Sparse And Fast Feature Learning On Graphs", "book": "Advances in Neural Information Processing Systems", "page_first": 88, "page_last": 98, "abstract": "For the purpose of learning on graphs, we hunt for a graph feature representation that exhibit certain uniqueness, stability and sparsity properties while also being amenable to fast computation. This leads to the discovery of family of graph spectral distances (denoted as FGSD) and their based graph feature representations, which we prove to possess most of these desired properties. To both evaluate the quality of graph features produced by FGSD and demonstrate their utility, we apply them to the graph classification problem. Through extensive experiments, we show that a simple SVM based classification algorithm, driven with our powerful FGSD based graph features, significantly outperforms all the more sophisticated state-of-art algorithms on the unlabeled node datasets in terms of both accuracy and speed; it also yields very competitive results on the labeled datasets - despite the fact it does not utilize any node label information.", "full_text": "Hunt For The Unique, Stable, Sparse And Fast\n\nFeature Learning On Graphs\n\nSaurabh Verma\n\nDepartment of Computer Science\n\nUniversity of Minnesota, Twin Cities\n\nverma@cs.umn.edu\n\nZhi-Li Zhang\n\nDepartment of Computer Science\n\nUniversity of Minnesota, Twin Cities\n\nzhang@cs.umn.edu\n\nAbstract\n\nFor the purpose of learning on graphs, we hunt for a graph feature representation\nthat exhibit certain uniqueness, stability and sparsity properties while also being\namenable to fast computation. This leads to the discovery of family of graph\nspectral distances (denoted as FGSD) and their based graph feature representations,\nwhich we prove to possess most of these desired properties. 
To both evaluate the quality of the graph features produced by FGSD and demonstrate their utility, we apply them to the graph classification problem. Through extensive experiments, we show that a simple SVM based classification algorithm, driven with our powerful FGSD based graph features, significantly outperforms all the more sophisticated state-of-the-art algorithms on the unlabeled node datasets in terms of both accuracy and speed; it also yields very competitive results on the labeled datasets - despite the fact that it does not utilize any node label information.\n\n1 Introduction\n\nIn the past decade, there has been tremendous interest in learning on collections of graphs for various purposes, in particular for solving the graph classification problem. Applications of graph classification can be found in the domains of bioinformatics, chemoinformatics and social networks. A fundamental question inherent in graph classification is determining whether two graph structures are identical, i.e., the graph isomorphism problem, which was not known to belong to either P or NP until recently. In the seminal paper [2], Babai shows that graph isomorphism can be solved in quasipolynomial time; while of enormous theoretical significance, the implication of this result for developing practical algorithms is still unclear. Fortunately, in graph classification problems one is more interested in whether two graphs have “similar” (as opposed to identical) structures. This allows potentially much faster (yet not fully explored) algorithms to be successfully applied to graph classification while also accounting for graph isomorphism. 
One approach that gets around both of these intimately tied problems is to learn an explicit graph representation that is invariant under graph isomorphism1 but also useful for extracting graph features.\nMore specifically, given a graph G, we are interested in learning a graph representation (or spectrum), R : G -> (g_1, g_2, ..., g_r), that captures certain inherent “atomic” (unique) sub-structures of the graph and is invariant under graph isomorphism (i.e., two isomorphic graphs yield the same representation). Subsequently, we want to learn a feature function F : R -> (f_1, f_2, ..., f_d) from R such that the graph features {f_i}_{i=1}^{d} can be employed for solving the graph classification problem. However, in machine learning, not much attention has been given to learning R; most previous studies have focused on designing graph kernels and thus bypass computing any explicit graph representation. The series of papers [19, 20, 22] by Kondor et al. are some of the first (and few) that are concerned with constructing explicit graph features - using a group theoretic approach - that are invariant to graph isomorphism and can be successfully applied to the graph classification problem.\n\n1 That is, invariant under permutation of graph vertex labels.\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\nFigure 1: Graph Generation Model: The graph spectrum is assumed to be encoded in pairwise node distances, which are generated from some distribution. Nodes connect together to form a graph in such a way that pairwise node distances are preserved (e.g., 
( - ) the node-pair with distance 0.75 is preserved even though the two nodes are not directly connected). Figure panels: pairs of graph nodes are generated from an unknown distribution; nodes connect together to form a graph such that the pairwise distances are preserved.\n\nInspired by such an approach, we also explicitly deal with learning a graph representation R and show how to derive the graph features F from R.\nOur approach is quite novel and builds upon the following assumption: the graph atomic structure (or spectrum) is encoded in the multiset2 of all node pairwise distances. Figure 1 shows the complete graph generation model based on this premise. The origin of our assumption can be traced back to the study of homometric structures, i.e., structures with the same multiset of interatomic distances [28]. On graphs, two vertex sets are called non-homometric if the multisets of distances determined by them are different. (It is an unexplored problem whether there exists any distance metric on a graph for which the vertex sets of two non-isomorphic graphs are always non-homometric; the converse is not true, the shortest path distance being an example.) This argument supports the validity of our assumption that the graph atomic structure is encoded in pairwise distances. Further, we have empirically found that biharmonic distance [23] multisets are unique at least up to simple connected graphs of 10 vertices (~11 million graphs), and it remains an open problem to exhibit a counterexample. Moreover, we show that for a certain distance function S_f on the graph, one can uniquely recover all the intrinsic properties of the graph while also capturing both local & global information about it. 
Thus, we define R as the multiset of node pairwise distances based on some distance function S_f; this representation will be the main focus of this paper.\nWe hunt for such a family of distances on graphs, and for its core members for which most of the properties of an ideal graph spectrum (see Section 3) hold, including invariance under graph isomorphism and the uniqueness property. This hunt leads us to the discovery of the family of graph spectral distances (FGSD), and the harmonic (effective resistance) and biharmonic distances on graphs emerge as the members of this family suitable for the graph representation R. Finally, for solving graph classification (where graphs can have different node sizes), we simply construct the feature vector F from the histogram of R (a multiset) and feed it to a standard classification algorithm.\nOur current work focuses only on unlabeled graphs but can be extended to labeled graphs using the same strategy as in the shortest path kernel [4]. Nevertheless, our comprehensive results show that FGSD graph features are powerful enough to significantly outperform the current state-of-the-art algorithms on unlabeled datasets and are very competitive on labeled datasets - despite the fact that they do not utilize any node label information. 
In summary, the major contributions of our paper are:\n• Introducing a novel & conceptually simple yet powerful graph feature representation (or spectrum) based on the multiset of node pairwise distances.\n• Discovering FGSD as a well-suited candidate for our proposed graph spectrum.\n• Proving that FGSD based graph features exhibit certain uniqueness, stability and sparsity properties and can be computed quickly, with O(N^2) complexity, where N is the number of nodes in a graph.\n• Showing the superior performance of FGSD based graph features on graph classification tasks.\n\n2 Related Work\n\nPrevious studies on graph classification can be grouped into three main categories. The first category is concerned with constructing explicit graph features, such as the skew spectrum [20] and its successor, the graphlet spectrum [22], based on group-theoretic approaches. Both are computationally expensive. The second and more popular category deals with designing graph kernels, among which the strong ones are graphlets [30], random walks or shortest paths [4], the neighborhood subgraph pairwise distance kernel [9], the Weisfeiler-Lehman kernel [31], deep graph kernels [34], graph invariant kernels [27] and the multiscale Laplacian graph kernel [21]. A tangential work [24], which constructs features from the 3D space coordinates of atoms rather than operating on the graph structure, can also be considered part of this category. Our effort on learning R from FGSD can be seen as part of the first category, since we explicitly investigate numerous properties of our proposed graph spectrum; extracting F from R, on the other hand, is more inspired by the work on graph kernels.\n\n2 A set in which an element can occur multiple times.\n\nThe third category involves developing convolutional neural networks (CNNs) for graphs, where several models have been proposed to define convolution networks on graphs. 
The most common model is based on generalizing convolutional networks through the graph Fourier transform via a graph Laplacian [7, 16]. Defferrard et al. [11] extend this model by constructing fast localized spectral filters, with efficient graph coarsening as a pooling operation for CNNs on graphs. Some variants of these models are considered in [18, 1], where the output of each neural network layer is computed using a propagation rule that takes the graph adjacency matrix and node feature vectors into account while updating the network weights. In [12], the convolution operation is defined by hashing local graph node features along with the local structure information. Likewise, in [26] local node sequences are “canonicalized” to create receptive fields and then fed into a 1D convolutional neural network for classification. Among the aforementioned graph CNN models, only those in [26, 1, 12] are relevant to this work, since they are designed to account for graphs of different sizes, while the others assume a global structure in which the one-to-one correspondence of input vertices is already known.\n\n3 Family of Graph Spectral Distances and Graph Spectrum\n\nBasic Setup and Notations: Consider a weighted, undirected (and connected) graph G = (V, E, W) of size N = |V|, where V is the vertex set, E the edge set (with no self-loops) and W = [w_xy] the nonnegative weighted adjacency matrix. The standard graph Laplacian is defined as L = D - W, where D is the degree matrix. It is positive semi-definite and admits an eigendecomposition of the form L = Φ Λ Φ^T, where Λ = diag[λ_k] is the diagonal matrix formed by the eigenvalues λ_0 = 0 < λ_1 ≤ ... ≤ λ_{N-1}, and Φ = [φ_0, ..., φ_{N-1}] is an orthogonal matrix formed by the corresponding eigenvectors φ_k. For x ∈ V, we use φ_k(x) to denote the x-entry of φ_k. 
Let f be an arbitrary nonnegative (real-analytic) function on R+ with f(0) = 0, let 1 = [1, ..., 1]^T be the all-one vector, and let J = 11^T. Then, with a slight abuse of notation, we define f(L) := Φ f(Λ) Φ^T and f(Λ) := diag[f(λ_k)]. Also, f(L)_xy represents the xy-entry of the matrix f(L). Lastly, I is the identity matrix and L+ is the Moore-Penrose pseudoinverse of L.\n\nFGSD Definition: For x, y ∈ V, we define the f-spectral distance between x and y on G as follows:\n\nS_f(x, y) = Σ_{k=0}^{N-1} f(λ_k) (φ_k(x) - φ_k(y))^2    (1)\n\nWe will refer to {S_f(x, y) | f} as the family of graph spectral distances. Without loss of generality, we assume that the derivative f'(λ) ≠ 0 for λ > 0; then, by the Lagrange Inversion Theorem [33], f is invertible and thus bijective. For reasons that will become clear shortly, we are particularly interested in the two sub-families of FGSD in which f is a monotonic function (increasing or decreasing) of λ. Depending on the sub-family, the f-spectral distance can capture different types of information in a graph.\n\nFGSD Elements Encode Local Structure Information: For f(λ) = λ^p (p ≥ 1), one can show that S_f(x, y) = (L^p)_xx + (L^p)_yy - 2(L^p)_xy. If the shortest path from x to y is longer than p, then (L^p)_xy = 0. This follows from the fact that (L^p)_xy captures only p-hop local neighborhood information [32] on the graph. Hence, broadly, for an increasing function f (e.g., a polynomial function of degree at least p ≥ 1), S_f(x, y) captures local structure information.\n\nFGSD Elements Encode Global Structure Information: On the other hand, a decreasing f yields S_f(x, y) = ((L+)^p)_xx + ((L+)^p)_yy - 2((L+)^p)_xy. This captures global information, since the xy-entry of L+ = (L + J/N)^{-1} - J/N accounts for all paths from node x to y (and so does (L+)^p). 
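As a concrete illustration of Eq. (1), here is a minimal NumPy sketch (our own code, not the authors' implementation; the function name and the 1e-9 cutoff enforcing f(0) = 0 numerically are our choices). It computes S_f for the harmonic member f(λ) = 1/λ on a 3-node path, where it coincides with the effective resistance distance:

```python
import numpy as np

def fgsd_matrix(W, f):
    """Return the N x N matrix of f-spectral distances for adjacency W (Eq. 1)."""
    L = np.diag(W.sum(axis=1)) - W              # combinatorial Laplacian L = D - W
    lam, phi = np.linalg.eigh(L)                # lam[0] ~ 0 for a connected graph
    flam = f(lam)
    flam[lam < 1e-9] = 0.0                      # enforce f(lambda_0) = f(0) = 0
    fL = (phi * flam) @ phi.T                   # f(L) = Phi f(Lambda) Phi^T
    d = np.diag(fL)
    # S_f(x, y) = f(L)_xx + f(L)_yy - 2 f(L)_xy
    return d[:, None] + d[None, :] - 2.0 * fL

# harmonic member f(lambda) = 1/lambda on the 3-node path 1-2-3
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
S = fgsd_matrix(W, lambda x: 1.0 / np.maximum(x, 1e-9))
```

For this path the sketch gives S(1,2) = S(2,3) = 1 and S(1,3) = 2, i.e., unit resistors in series.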
Several known globally aware graph distances can be derived from this FGSD sub-family. For f(λ) = 1/λ with λ > 0, S_f(x, y) is the harmonic (or effective resistance) distance. More generally, for f(λ) = 1/λ^p with p ≥ 1, S_f(x, y) is the polyharmonic distance (p = 2 gives the biharmonic distance). Lastly, f(λ_k) = e^{-2tλ_k} yields an S_f(x, y) that is equivalent to the heat diffusion distance.\n\nFGSD Graph Signal Processing Point of View: From a graph signal processing perspective, S_f(x, y) is a distance computed from spectral filter properties [32], where f(λ) acts as a band-pass filter. Alternatively, it can be viewed in terms of spectral graph wavelets [15] as: S_f(x, y) = ψ_{f,x}(x) + ψ_{f,y}(y) - 2ψ_{f,x}(y), where ψ_{f,x}(y) = Σ_{k=0}^{N-1} f(λ_k) φ_k(x) φ_k(y) (and ψ_{f,x}(x), ψ_{f,y}(y) are defined similarly) is a spectral graph wavelet of scale 1 centered at node x, and f(λ) acts as a graph wavelet kernel.\n\nFGSD Based Graph Spectrum: Using the FGSD based distance matrix S_f = [S_f(x, y)] directly, e.g., for graph classification, would require solving the graph isomorphism problem efficiently. But no polynomial time algorithm is known; the best algorithm today theoretically takes quasipolynomial time [2]. However, the study of homometric structures, together with the fact that each element of FGSD encodes some local or global sub-structure information of the graph, inspired us to define the graph spectrum as R = {S_f(x, y) | ∀(x, y) ∈ V}. Thus, comparing two R's implicitly evaluates the sub-structural similarity between two graphs. 
For instance, R based on the harmonic distance contains sub-structural properties related to the spanning trees of a graph [29].\nOur main concern in this paper is choosing an appropriate f(λ) function in order to generate an R that exhibits the ideal graph spectrum properties discussed below. We also want F to inherit these properties directly from R, which is made possible by defining F as the histogram of R. Finally, we lay down the important fundamental properties of an ideal graph spectrum that one would like R & F to obey on a graph G = (V, E, W):\n1. R & F must be invariant under any permutation π of vertex labels. That is, R(G) = R(G_π), or R(W) = R(PWP^T) for any permutation matrix P.\n2. R & F must have a unique representation for non-isomorphic graphs. That is, R(G_1) ≠ R(G_2) for any two non-isomorphic graphs G_1 and G_2.\n3. R & F must be stable under small perturbations. That is, if graph G_2(W_2) = G_1(W_1 + Δ) for a perturbation matrix with small norm ||Δ||, then the norm ||F(G_2) - F(G_1)|| should also be small or bounded, in order to maintain stability.\n4. F must be sparse (if high-dimensional), for all the sparsity reasons desirable in machine learning.\n5. R & F must be computationally fast, for efficiency and scalability purposes.\n\n4 Uniqueness of Family of Graph Spectral Distances and Graph Spectrum\n\nWe first start by exploring the graph invariance and uniqueness properties of R & F based on FGSD. Uniqueness is a very important (desirable) property, since it determines whether the elements of the set R are complete (i.e., how good they are), in the sense of whether R is sufficient to recover all the intrinsic structural properties of a graph. 
We state the following important uniqueness theorem.\n\nTheorem 1 (Uniqueness of FGSD)3 The f-spectral distance matrix S_f = [S_f(x, y)] uniquely determines the underlying graph (up to graph isomorphism), and each graph has a unique S_f (up to permutation). More precisely, two undirected, weighted (and connected) graphs G_1 and G_2 have the same FGSD based distance matrix up to permutation, i.e., S_{G_1} = P S_{G_2} P^T for some permutation matrix P, if and only if the two graphs are isomorphic.\n\nImplications: Our proof is based on establishing the following key relationship: f(L) = -1/2 (I - J/N) S_f (I - J/N). Since f is bijective, one can uniquely recover Λ from f(Λ). One consequence of Theorem 1 is that R, based on the multiset of FGSD, is invariant under permutation of graph vertex labels and thus satisfies the graph invariance property. F inherits this property, since R remains the same. Unfortunately, it is possible for the multisets of some FGSD members to coincide on non-isomorphic graphs (otherwise, we would have an O(N^2) polynomial time algorithm for solving the graph isomorphism problem!). However, it is known that all non-isomorphic graphs with fewer than nine vertices have unique multisets of harmonic distances. For nine and ten vertex (simple) graphs, there are exactly 11 and 49 pairs of non-isomorphic graphs (out of 274,668 and 12,005,168 graphs in total) with the same harmonic spectra. These examples show that non-unique harmonic spectra are exceedingly rare. Moreover, we empirically found that the biharmonic distance has all unique multisets at least up to ten vertices (~11 million graphs), and we could not find any non-isomorphic graphs with the same biharmonic multisets. 
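The key relationship behind the proof of Theorem 1 can be sanity-checked numerically. The sketch below (the random instance and all variable names are ours) recovers f(L) from S_f by double centering, on a random weighted complete graph with the harmonic member:

```python
import numpy as np

# Numerical check (an illustration, not the proof) of the identity
# f(L) = -1/2 (I - J/N) S_f (I - J/N), so S_f determines f(L).
rng = np.random.default_rng(0)
N = 6
A = rng.random((N, N))
W = np.triu(A, 1) + np.triu(A, 1).T             # random weighted complete graph
L = np.diag(W.sum(axis=1)) - W

lam, phi = np.linalg.eigh(L)
flam = 1.0 / np.maximum(lam, 1e-9)              # harmonic member f(lambda) = 1/lambda
flam[lam < 1e-9] = 0.0
fL = (phi * flam) @ phi.T                       # f(L)

d = np.diag(fL)
S = d[:, None] + d[None, :] - 2.0 * fL          # S_f built from f(L)

I, J = np.eye(N), np.ones((N, N))
fL_recovered = -0.5 * (I - J / N) @ S @ (I - J / N)   # double centering
```

The recovered matrix matches f(L) to machine precision; since f is bijective, Λ (and hence L) follows.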
Further, we have the following theorem regarding the uniqueness of R.\n\n3 A variant of Theorem 1 also holds for the normalized graph Laplacian L_norm = D^{-1/2} L D^{-1/2}.\n\nTheorem 2 (Uniqueness of Graph Harmonic Spectrum) Let G = (V, E, W) be a graph of size |V| with an unweighted adjacency matrix W. If two graphs G_1 and G_2 have the same number of nodes but different numbers of edges, i.e., |V_1| = |V_2| but |E_1| ≠ |E_2|, then with respect to the harmonic distance multiset, R(G_1) ≠ R(G_2).\n\nImplications: Our proof relies on the fact that the effective resistance distance is a monotone function with respect to adding or removing edges. It shows that R based on some FGSD members, especially the harmonic distance, is at least theoretically known to be unique to a certain degree. F also inherits this property, fully under the condition h -> 0 (or for small enough h), where h is the histogram binwidth. Overall, the certain uniqueness of R, together with the local or global structural properties contained in each of its elements, dictates that R is capable of serving as a complete, powerful graph spectrum.\n\n4.1 Unifying Relationship Between FGSD and Graph Embedding and Dimension Reduction\n\nBefore delving into other properties, we uncover an essential relationship between FGSD and graph embedding in Euclidean space and dimension reduction techniques. Let f(Λ)^{1/2} = diag[√f(λ_k)] and define Ψ = Φ f(Λ)^{1/2}. Then the f-spectral distance can be expressed as S_f(x, y) = ||Ψ(x) - Ψ(y)||_2^2, where Ψ(x) is the xth row of Ψ. Thus, Ψ represents a Euclidean embedding of G in which each node x is represented by the vector Ψ(x). For instance, if f(λ) = 1, then taking the first p columns of Ψ yields an embedding exactly equal to the Laplacian Eigenmap (LE) [3] based on the random walk graph Laplacian (L_rw = D^{-1} L). 
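The identity S_f(x, y) = ||Ψ(x) - Ψ(y)||_2^2 behind this embedding view can be illustrated directly; the toy graph and names below are ours:

```python
import numpy as np

# Sketch of the FGSD embedding Psi = Phi f(Lambda)^{1/2}: squared Euclidean
# distances between the rows of Psi reproduce the f-spectral distances S_f.
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
L = np.diag(W.sum(axis=1)) - W
lam, phi = np.linalg.eigh(L)
flam = 1.0 / np.maximum(lam, 1e-9)              # harmonic member
flam[lam < 1e-9] = 0.0

Psi = phi * np.sqrt(flam)                       # row x is the embedding of node x
G = Psi @ Psi.T                                 # Gram matrix of the embedding
sq = np.diag(G)
S_embed = sq[:, None] + sq[None, :] - 2.0 * G   # ||Psi(x) - Psi(y)||_2^2

fL = (phi * flam) @ phi.T                       # direct formula via f(L)
d = np.diag(fL)
S_direct = d[:, None] + d[None, :] - 2.0 * fL
```

The two matrices coincide, so the embedding is isometric with respect to S_f.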
For f(λ) = λ^{2t} and L = D^{-1}W, we get the Diffusion Map [25]. Thus, the f(λ) function has a one-to-one correspondence with spectral dimension reduction techniques. We have the following theorem concerning graph embedding based on FGSD.\n\nTheorem 3 (Uniqueness of FGSD Graph Embedding) Each graph G can be isometrically embedded into a Euclidean space using FGSD as an isometric measure. This isometric embedding is unique if all the eigenvalues of the Laplacian of G are distinct and there does not exist any other graph G' with Laplacian eigenvectors φ'_k = √(f(λ_j)/f(λ'_j)) φ_k, ∀k ∈ [1, N-1].\n\nImplications: The above theorem shows that FGSD provides a unique way to embed the graph vertices into Euclidean space, possibly without losing any structural information of the graph. This could potentially serve as a cogent tool to convert unstructured data into structured data (similar to the structure2vec [10] or node2vec [14] tools), which would enable us to perform standard inference tasks in Euclidean space. Note that the uniqueness condition is quite strict and holds for co-spectral graphs. In short, we have the following chain of uniqueness relationships, where Ψ_G is the Euclidean embedding of the graph G:\n\nS_f <-> f(L_G) <-> L_G, and f(L_G) <-> Ψ_G\n\n5 Stability of Family of Graph Spectral Distances and Graph Spectrum\n\nNext, we hunt for the stable members of the FGSD that are robust against perturbation or noise in the datasets. Specifically, we will examine the stability of R and F based on FGSD from the f(λ) perspective, by first analyzing its influence under a single edge perturbation (in other words, analyzing a rank one modification of the Laplacian matrix). This will lead us to the stable members and to the restrictions we need to impose on the f(λ) function for stability. 
We will further show that the f-spectral distance function also satisfies the notion of uniform stability [6] in a certain sense. For our analysis, we will restrict f(λ) to be a monotone function of λ, for λ > 0. Let Δw ≥ 0 be the change after modifying the weight w of any single edge of the graph to w', where Δw = w' - w.\n\nTheorem 4 (Eigenfunction Stability of FGSD) Let ΔS_xy be the change in the distance S_f(x, y) with respect to a change Δw in the weight of any single edge of the graph. Then ΔS_xy, for any vertex pair (x, y), is bounded as a function of the eigenvalues as follows:\n\nΔS_xy ≤ 2 |f(λ_{N-1} + 2Δw) - f(λ_1)|\n\nImplications: Since R = {S_f(x, y) | ∀(x, y) ∈ V}, each element of R is itself bounded via ΔS_xy. Now recall that F is a histogram of R; then F will not change if the binwidth is large enough to accommodate the perturbation, i.e., h ≥ 2ΔS_xy ∀(x, y), assuming all elements of R are at the centers of their respective histogram bins. Besides h, the other way to make R robust is by choosing a suitable f(λ) function. Let us consider the behavior of ΔS_xy for f(λ) = λ^p with p > 0. Then ΔS_xy ≤ 2((λ_{N-1} + 2Δw)^p - λ_1^p), and as a result ΔS_xy is an increasing function of p, which implies that stability decreases as p increases. For p = 0, stability does not change with respect to λ. For p < 0, ΔS_xy ≤ 2(1/λ_1^{|p|} - 1/(λ_{N-1} + 2Δw)^{|p|}). 
Here, ΔS_xy is a decreasing function of |p|, which implies that stability increases as p decreases. These results conform with the reasoning that eigenvectors corresponding to smaller eigenvalues are smoother (i.e., oscillate slowly) than eigenvectors corresponding to large eigenvalues, and decreasing p attenuates the contribution of the latter, making the f-spectral distance more stable and less susceptible to perturbation or noise. However, decreasing p too much can result in the loss of local information contained in the eigenvectors with larger eigenvalues, and therefore a balance needs to be maintained. Overall, Theorem 4 shows that either through a suitable h or through a decreasing f(λ) function, the stability of R & F can be controlled so as to satisfy Ideal Spectrum Property 3.\n\nIn fact, we can further show that S_f(x, y) between any two vertices (x, y) of a graph with bounded weights 0 < α ≤ w ≤ β is tightly concentrated around a certain expected value.\n\nTheorem 5 (Uniform Stability of FGSD) Let E[S_f(x, y)] be the expected value of S_f(x, y) between a vertex pair (x, y), over all possible graphs with a fixed ordering of N vertices. Then, with probability 1 - δ, where δ ∈ (0, 1) and θ depends upon α, β, N, we have:\n\n|S_f(x, y) - E[S_f(x, y)]| ≤ f(θ) √(N(N-1)) √(log(1/δ))\n\nImplications: The above theorem is based on the fact that ΔS_xy can itself be upper bounded over all possible graphs generated on a fixed ordering of N vertices. This is very similar to the condition needed for a learning algorithm to satisfy the notion of uniform stability in order to give generalization guarantees. 
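As a quick numerical sanity check of the Theorem 4 bound (one random instance under our reading of the bound, not a proof; names and seed are ours), one can perturb a single edge weight and compare the observed change in S_f against 2|f(λ_{N-1} + 2Δw) - f(λ_1)| for the harmonic member:

```python
import numpy as np

def harmonic_S_and_spectrum(W):
    """Harmonic f-spectral distance matrix and Laplacian spectrum."""
    L = np.diag(W.sum(axis=1)) - W
    lam, phi = np.linalg.eigh(L)
    flam = 1.0 / np.maximum(lam, 1e-9)
    flam[lam < 1e-9] = 0.0
    fL = (phi * flam) @ phi.T
    d = np.diag(fL)
    return d[:, None] + d[None, :] - 2.0 * fL, lam

rng = np.random.default_rng(1)
A = rng.random((5, 5))
W = np.triu(A, 1) + np.triu(A, 1).T + 0.1        # dense weighted graph
np.fill_diagonal(W, 0.0)
S_before, lam = harmonic_S_and_spectrum(W)

dw = 0.05
W2 = W.copy()
W2[0, 1] += dw
W2[1, 0] += dw                                    # perturb one edge weight by dw
S_after, _ = harmonic_S_and_spectrum(W2)

f = lambda x: 1.0 / x                             # harmonic member
bound = 2.0 * abs(f(lam[-1] + 2.0 * dw) - f(lam[1]))
max_change = np.abs(S_after - S_before).max()
```

On this instance the observed maximum change sits well below the eigenvalue bound, and (by Rayleigh monotonicity) every effective resistance weakly decreases when the edge weight increases.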
The f-spectral distance function can itself be thought of as a learning algorithm which admits uniform stability (precise definition in the supplementary material); this indicates a strong stability behavior over all possible graphs, and the distance further acts as a generalizable learning algorithm on the graph. Theorem 5 also reveals that the deviation can be minimized by choosing a decreasing f(λ) function, and it would be suitable for f(λ) to grow at an O(1/√(N(N-1))) rate in order to maintain stability for large graphs.\n\nSo far, we have narrowed our interest down to R & F based on bijective and decreasing f(λ) functions, for achieving both uniqueness and stability. This eliminates all forms of increasing polynomial functions as good choices of f(λ). As a result, we can focus on inverse (or rational) forms of polynomial functions, such as the polyharmonic distances. A by-product of our analysis is a new class of stable dimension reduction techniques, obtained by scaling Laplacian eigenvectors with a decreasing function f(λ), although such connections have been noted before.\n\n6 Sparsity of Family of Graph Spectral Distances and Graph Spectrum\n\nFigure 2: The number of unique elements present in R formed by different f-spectral distances on all graphs with |V| = 9 (261,080 graphs in total). Graph enumeration indices are sorted according to |R(1/λ)|_G. We can observe that f(λ) = 1/λ increases in the form of a step function and lower bounds all other f(λ) up to an additive constant. (Best viewed in color and when zoomed in.)\n\nSparsity is desirable for both computational and statistical efficiency. In this section, we investigate the sparsity produced in F by choosing different f(λ) functions. 
Here, sparsity refers to its usual definition: how many zero features are present in the graph feature vector F. Since F is a histogram of R, the number of non-zero elements in F is always less than or equal to the number of unique (or distinct) elements in R. However, due to the lack of any theoretical support, we rely on empirical evidence and conjecture the statement below.\n\nFigure 3: Feature space for MUTAG (composed of two classes of sizes 125 & 63). (a) Harmonic distance based graph feature matrix (matrix sparsity = 97.12%); a blue dot indicates a feature value > 0. (b) Biharmonic distance based graph feature matrix (matrix sparsity = 94.28%); a blue dot indicates a feature value > 0. (c) Harmonic distance based feature matrix sparsity shown per class label. (d) Biharmonic distance based feature matrix sparsity shown per class label. Both the harmonic & biharmonic based graph spectra encode a sparse high dimensional feature representation F for graphs which can clearly distinguish the two classes, as depicted in the sub-figures.\n\nConjecture (Sparsity of FGSD Graph Spectrum) For any graph G, let |R(f(λ))|_G represent the number of unique elements present in the multiset R, computed on an unweighted graph G based on some monotonically decreasing f(λ) function. 
Then, the following holds:\n\n|R(f(λ))|_G ≥ |R(1/λ)|_G + 2\n\nThe conjecture is based on the observation, in Figure 2, that |R(1/λ)|_G lower bounds |R(f(λ))|_G for every given monotonically decreasing f(λ), up to an additive constant of 2. The same trends are observed for different graph sizes |V|. Interestingly, when the graph enumeration indices are sorted according to |R(1/λ)|_G, f(λ) = 1/λ increases in the form of a step function. From this conjecture, we can directly conclude that F based on f(λ) = 1/λ produces the sparsest features, because the number of unique elements in its R is always smaller than in any other R. Figure 3 further supports this conjecture: it shows the feature space computed for the MUTAG dataset in the case of the harmonic and biharmonic spectra. However, this raises the question of a trade-off between maintaining uniqueness and sparsity, since biharmonic distance multisets are found to be unique for more graphs than harmonic distance multisets. Nonetheless, preliminary experiments measuring harmonic vs. biharmonic performance on graph classification (in the supplementary material) suggest that sparsity is more favorable than uniqueness, since it results in higher classification accuracy.\n\n7 Fast Computation of Family of Graph Spectral Distances and Spectrum\n\nFinally, we provide a general recipe for computing any member of FGSD in a fast manner. 
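As a tiny aside illustrating the sparsity comparison above (our own 4-node example, far smaller than the 9-vertex enumeration of Figure 2): on a path graph the harmonic multiset collapses coincident values, since effective resistance on a tree equals path length, while the biharmonic multiset keeps them distinct:

```python
import numpy as np

def distinct_count(W, p, decimals=6):
    """Number of distinct off-diagonal values of S_f for f(lambda) = 1/lambda^p."""
    L = np.diag(W.sum(axis=1)) - W
    lam, phi = np.linalg.eigh(L)
    flam = 1.0 / np.maximum(lam, 1e-9) ** p
    flam[lam < 1e-9] = 0.0
    fL = (phi * flam) @ phi.T
    d = np.diag(fL)
    S = d[:, None] + d[None, :] - 2.0 * fL
    vals = np.round(S[np.triu_indices_from(S, k=1)], decimals)
    return len(np.unique(vals))

# path graph 1-2-3-4: harmonic distances equal the path lengths {1, 2, 3},
# while the biharmonic member distinguishes end edges from the middle edge
path4 = np.array([[0., 1., 0., 0.],
                  [1., 0., 1., 0.],
                  [0., 1., 0., 1.],
                  [0., 0., 1., 0.]])
n_harmonic = distinct_count(path4, p=1)      # 3 distinct values
n_biharmonic = distinct_count(path4, p=2)    # 4 distinct values
```

Here |R(1/λ)|_G = 3 < |R(1/λ^2)|_G = 4, consistent with the direction of the conjecture on this toy graph.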
In order to avoid direct eigenvalue decomposition, we can either perform approximation or leverage structural properties and the sparsity of f(L) for efficient exact computation of S_f and thus R.

Approximation: Inspired by the spectral graph wavelet work [32], the recipe for approximating FGSD is to decompose f(λ) into an approximate polynomial series (for example, Chebyshev polynomials) as f(λ) = Σ_{i=0}^{r} a_i T_i(λ), where each T_i(x) can be computed recursively from a few lower-order terms (T_{i-1}(x), T_{i-2}(x), ..., T_{i-c}(x)). It then follows that f(L) = Σ_{i=0}^{r} a_i T_i(L). In this case, the cost of computation reduces to O(r|E|) for sparse L, which is far less expensive since O(r|E|) ≪ O(N^2). But if f(λ) is an inverse polynomial form of function, then computing f(L) = (Σ_{i=0}^{r} a_i T_i(L))^{-1} boils down to efficiently computing a (single) Moore-Penrose pseudoinverse of a matrix.

Efficient Exact Computation: By leveraging the structural properties and sparsity of f(L), we can perform exact computation of f(L^+) much more efficiently than via eigenvalue decomposition. We propose such a method, which is a generalization of the work in [23]. We can show that f(L)f(L^+) = I - J/N. Therefore, f(L)l^+_k = B_k, where l^+_k and B_k are the kth columns of the matrices f(L^+) and B = I - J/N, respectively. So, first we find a particular solution of the following (sparse) linear system: f(L)x = B_k, and then obtain l^+_k = x - (1^T x / 1^T 1) 1. The particular solution x can be obtained by replacing any single row and the corresponding column of f(L) by zeros, setting the diagonal entry at their intersection to one, and replacing the corresponding row of B by zeros.
This gives a (non-singular) sparse linear system which can be solved very efficiently by performing Cholesky factorization and back-substitution, resulting in overall O(N^2) complexity, as shown in [5]. Besides this, there are a few other fast methods to compute the pseudoinverse, notably the one given by [17].

| Complexity | SP [4] | GK [30] (k ∈ {3,4,5}, d ≤ N) | SGS [20] | GS [22] (k ∈ [2,6]) | DCNN [1] | MLG [21] (Ñ < N) | FGSD |
| Approximate | — | O(Nd^(k-1)) | — | — | — | O(Ñ^3) | O(r|E|) |
| Worst-Case | O(N^3) | O(N^k) | O(N^3) | O(N^(2+k)) | O(N^2) | O(N^3) | O(N^2) |

Table 1: FGSD complexity comparison with a few strong state-of-the-art algorithms (showing only variables that depend on N & |E|). It reveals that the FGSD complexity is better than most.

Algorithm 1 Computing R and F based on FGSD.
Input: Given graphs {G_i = (V_i, E_i, W_i)}_{i=1}^{M}, f(λ), number of bins b, binwidth h.
Output: R_i and F_i, for all i ∈ [1, M].
for i = 1 to M do
    Compute f(L_i) using the approximate or exact method.
    Compute S_i = diag(f(L_i))J + J diag(f(L_i)) - 2f(L_i).
    Set R_i = {S_xy | ∀(x, y) ∈ |V_i|}.
    Compute F_i = histogram(R_i, b, h).
end for

As a result, this leads to a very efficient O(r|E|) complexity through approximation, with worst-case O(N^2) complexity for exact computation of R. Table 1 shows the complexity comparison with other state-of-the-art methods. Since the number of elements in R is O(N^2), F is also bounded by O(N^2) and thus satisfies the ideal graph spectrum Property 5. Finally, Algorithm 1 summarizes the complete procedure for computing R & F.

8 Experiments and Results

FGSD Graph Spectrum Settings: We chose the harmonic distance as the ideal candidate for F. For fast computation, we adopted our proposed efficient exact computation method.
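As a concrete illustration of Algorithm 1, the sketch below (our own, not the authors' code) computes harmonic FGSD histogram features for a toy collection of graphs. It substitutes NumPy's dense Moore-Penrose pseudoinverse for the sparse Cholesky-based solver described above, which is reasonable only for small graphs; the function name `fgsd_features`, the toy graphs, and the binwidth are illustrative assumptions.

```python
import numpy as np

def fgsd_features(adjacency_list, binwidth=0.01):
    """Sketch of Algorithm 1 for the harmonic case f(lambda) = 1/lambda.
    A dense pseudoinverse stands in for the paper's sparse Cholesky solver."""
    spectra = []
    for A in adjacency_list:
        L = np.diag(A.sum(axis=1)) - A              # graph Laplacian
        fL = np.linalg.pinv(L)                      # f(L) = L^+ (harmonic distance)
        d = np.diag(fL)
        S = d[:, None] + d[None, :] - 2 * fL        # S = diag(f(L))J + J diag(f(L)) - 2 f(L)
        spectra.append(np.maximum(S, 0.0).ravel())  # multiset R_i (clip tiny negatives)
    # fixed binwidth; number of bins chosen so the range covers all R_i elements
    top = max(R.max() for R in spectra)
    nbins = int(np.ceil(top / binwidth)) + 1
    edges = np.arange(nbins + 1) * binwidth
    return np.stack([np.histogram(R, bins=edges)[0] for R in spectra])

# toy collection: a triangle and a 4-cycle (unweighted)
tri = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)
cyc = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], float)
F = fgsd_features([tri, cyc])
print((F > 0).sum(axis=1), int(F.sum()))  # very few non-zero bins per graph; 9 + 16 entries total
```

Each row of F is extremely sparse (only a handful of non-zero bins), mirroring the feature-matrix sparsity reported for MUTAG in Figure 3.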
For computing the histogram, we fix the binwidth and set the number of bins such that the histogram range covers all elements of {R_i}_{i=1}^{M} across the M graphs. Therefore, we had only one parameter, the binwidth, chosen from the set {0.001, 0.0001, 0.00001}. This results in an F feature vector dimension in the range 100-1,000,000, with feature matrix sparsity > 90% in all cases. Our FGSD code is available at github4.

Datasets: We employed a wide variety of datasets considered as benchmarks [1, 34, 21, 26] in the graph classification task to evaluate the quality of the produced FGSD graph features. We adopted 7 bioinformatics datasets: MUTAG, PTC, PROTEINS, NCI1, NCI109, D&D, MAO, and 5 social network datasets: COLLAB, REDDIT-BINARY, REDDIT-MULTI-5K, IMDB-BINARY, IMDB-MULTI. The D&D dataset contains 691 enzyme and 587 non-enzyme protein structures, while the MAO dataset contains 38 molecules that are antidepressant drugs and 30 that are not. Details for the other datasets can be found in [34].

Experimental Set-up: All experiments were performed on a single Intel Core i7-4790@3.60GHz machine with 64GB RAM. We compare our method with 6 state-of-the-art graph kernels: Random Walk (RW) [13], Shortest Path Kernel (SP) [4], Graphlet Kernel (GK) [30], Weisfeiler-Lehman Kernel (WL) [31], Deep Graph Kernels (DGK) [34], and Multiscale Laplacian Graph Kernels (MLG) [21]; 2 recent state-of-the-art graph convolutional networks: PATCHY-SAN (PSCN) [26] and Diffusion CNNs (DCNN) [1]; and 2 strong graph spectra: the Skew Spectrum (SGS) [20] and the Graphlet Spectrum (GS) [22]. We adopt the same procedure as previous works [26, 34] to make a fair comparison and use 10-fold cross validation with the LIBSVM [8] library to test classification performance. The parameters of the SVM are independently tuned using the training folds, and the best average classification accuracy is reported for each method.
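The evaluation protocol above can be sketched as follows. This is our own illustration, using scikit-learn's SVC (which wraps LIBSVM) rather than LIBSVM directly; the random placeholder features stand in for real FGSD histograms, so the printed accuracy is meaningless and serves only to show the tuning-inside-training-folds structure.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, cross_val_score

rng = np.random.default_rng(0)
X = rng.random((100, 500))       # placeholder for the sparse FGSD histogram features F
y = rng.integers(0, 2, 100)      # placeholder binary class labels

# C is tuned on the training folds only (GridSearchCV runs an inner CV);
# the outer 10-fold CV reports the average classification accuracy.
clf = GridSearchCV(SVC(kernel="linear"), {"C": [0.01, 0.1, 1, 10, 100]}, cv=3)
acc = cross_val_score(clf, X, y, cv=10, scoring="accuracy").mean()
print(round(acc, 3))
```

With random features the score hovers around chance level; swapping in real FGSD features and the binwidth grid above reproduces the protocol used for all baselines.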
We provide node degree as the labeled data for algorithms that do not operate directly on unlabeled data. Further details about parameter selection for the baseline methods are given in the supplementary material.

4https://github.com/vermaMachineLearning/FGSD

| Dataset (No. Graphs, Max. Nodes) | RW [2003] | SP [2005] | GK [2009] | WL [2011] | DGK [2015] | MLG (Wall-Time) [2016] | DCNN [2016] | SGS [2008] | FGSD (Wall-Time) |
| MUTAG (188, 28) | 83.50 | 87.23 | 84.04 | 87.28 | 86.17 | 87.23 (5s) | 66.51 | 88.61 | 92.12 (0.3s) |
| PTC (344, 109) | 55.52 | 58.72 | 60.17 | 55.61 | 59.88 | 62.20 (18s) | 55.79 | — | 62.80 (0.07s) |
| PROTEINS (1113, 620) | 68.46 | 72.14 | 71.78 | 70.06 | 71.69 | 71.35 (277s) | 65.22 | — | 73.42 (5s) |
| NCI1 (4110, 111) | > D | 68.15 | 62.07 | 77.23 | 64.40 | 77.57 (620s) | 63.10 | 62.72 | 79.80 (31s) |
| NCI109 (4127, 111) | > D | 68.30 | 62.04 | 78.43 | 67.14 | 75.91 (600s) | 60.67 | 62.62 | 78.84 (35s) |
| D & D (1178, 5748) | > D | > D | 75.05 | 73.76 | 72.75 | 77.02 (7.5hr) | OMR | — | 77.10 (25s) |
| MAO (68, 27) | 83.52 | 90.35 | 80.88 | 89.79 | 87.76 | 91.17 (13s) | 76.10 | — | 95.58 (0.1s) |

Table 2: Classification accuracy on unlabeled bioinformatics datasets. Results in bold indicate all methods with accuracy within 2.0 of the top result, and blue (for gaps > 2.0) indicates a new state-of-the-art result. Green highlights the best computation time when it is 5× faster than the others mentioned.
\u2018OMR\u2019 is out of memory error, \u2018> D\u2019 is computation exceed 24hrs.\n\nDataset\n(Graphs)\nCOLLAB\n(5000)\nREDDIT-B\n(2000)\nREDDIT-M\n(5000)\nIMDB-B\n(1000)\nIMDB-M\n(1500)\n\nGK\n[2009]\n\nDGK\n[2015]\n\nPSCN\n[2016]\n\nFGSD\n\n72.84\n\n73.09\n\n72.60\n\n80.02\n\n77.34\n\n78.04\n\n86.30\n\n86.50\n\n41.01\n\n41.27\n\n49.10\n\n47.76\n\n65.87\n\n66.96\n\n71.00\n\n73.62\n\n43.89\n\n44.55\n\n45.23\n\n52.41\n\nTable 3: Classi\ufb01cation accuracy on social\nnetwork datasets. FGSD signi\ufb01cantly out-\nperforms other methods.\n\nDataset\n\nMUTAG\n\nPTC\n\nNCI1\n\nD & D\n\nMAO\n\nMLG\n[2016]\n87.94\n(4s)\n63.26\n(21s)\n81.75\n(621s)\n\n78.18\n(7.5hr)\n\n88.29\n(12s)\n\nDCNN\n[2016]\n\n66.98\n\n56.60\n\n62.61\n\nOMR\n\nPSCN\n[2016]\n92.63\n(3s)\n\n62.90\n(6s)\n\n78.59\n(76s)\n77.12\n(154s)\n\n75.14\n\n\u2014\n\nGS\n[2009]\n\n88.11\n\n\u2014\n\n65.0\n\n\u2014\n\n\u2014\n\nFGSD*\n\n92.12\n(0.3s)\n\n62.80\n(0.07s)\n\n79.80\n(31s)\n\n77.10\n(25s)\n\n95.58\n(0.1s)\n\nTable 4: Classi\ufb01cation accuracy on labeled bioin-\nformatics datasets. * emphasize that FGSD did not\nutilize any node labels.\n.\n\nClassi\ufb01cation Results: From Table 2, it is clear that FGSD consistently outperforms every other\nstate-of-art algorithms on unlabeled bioinformatics datasets and that too signi\ufb01cantly in many cases.\nFGSD even performs better for social network graphs as shown in Table 3 and achieves a very\nsigni\ufb01cant 7% \u2212 8% more accuracy than the current state-of-art PSCNs on COLLAB and IMDB-M\ndatasets. Also from run-time perspective (excluding any data loading or classi\ufb01cation time for all\nalgorithms), it is pretty fast (2x\u20131000x times faster) as compare to others. These appealing results\nfurther motivated us to compare FGSD on the labeled datasets (even though, it is not a complete\nfair comparison). Table 4 shows that FGSD is still very competitive with all other strong (recent)\nalgorithms that utilize node labeled data. 
In fact, on the MAO dataset FGSD sets a new state-of-the-art result, and it stays within 0%-2% of the best accuracy on all labeled datasets. On a few labeled datasets, we found MLG to perform slightly better than the others, but it is 1000× slower than FGSD once graph sizes reach a few thousand nodes (see the D&D results). Altogether, FGSD shows very promising results in both accuracy and speed on all types of datasets, against all the more sophisticated algorithms. These results also point to untapped potential hidden in the graph structure which current algorithms are not harnessing, despite having labeled data at their disposal.

9 Conclusion

We present a conceptually simple yet powerful and theoretically motivated graph representation. In particular, our graph representation, based on the discovery of the family of graph spectral distances, exhibits uniqueness, stability and sparsity, and is computationally fast. Moreover, our hunt specifically leads to the harmonic distance, and next to it the biharmonic distance, as ideal members of this family for extracting graph features. Finally, our extensive results show that FGSD based graph features are powerful enough to dominate the unlabeled graph classification task over all the more sophisticated algorithms, and competitive enough to yield high classification accuracy on labeled data even without utilizing any node labels. In future work, we plan to generalize FGSD to labeled datasets in order to exploit the useful node and edge label information in the graph representation.

10 Acknowledgments

This research was supported in part by ARO MURI Award W911NF-12-1-0385, DTRA grant HDTRA1-14-1-0040, and NSF grants CNS-1618339 and CNS-1617729.

References

[1] J. Atwood and D. Towsley. Diffusion-convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1993-2001, 2016.

[2] L. Babai.
Graph isomorphism in quasipolynomial time. CoRR, abs/1512.03547, 2015.

[3] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373-1396, 2003.

[4] K. M. Borgwardt and H.-P. Kriegel. Shortest-path kernels on graphs. In Data Mining, Fifth IEEE International Conference on, pages 8-pp. IEEE, 2005.

[5] M. Botsch, D. Bommes, and L. Kobbelt. Efficient linear system solvers for mesh processing. In Mathematics of Surfaces XI, pages 62-83. Springer, 2005.

[6] O. Bousquet and A. Elisseeff. Stability and generalization. Journal of Machine Learning Research, 2(Mar):499-526, 2002.

[7] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203, 2013.

[8] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3):27, 2011.

[9] F. Costa and K. De Grave. Fast neighborhood subgraph pairwise distance kernel. In Proceedings of the 26th International Conference on Machine Learning, pages 255-262. Omnipress, 2010.

[10] H. Dai, B. Dai, and L. Song. Discriminative embeddings of latent variable models for structured data. In Proceedings of the 33rd International Conference on Machine Learning - Volume 48, 2016.

[11] M. Defferrard, X. Bresson, and P. Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pages 3837-3845, 2016.

[12] D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams. Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems, pages 2224-2232, 2015.

[13] T. Gärtner, P. Flach, and S. Wrobel.
On graph kernels: Hardness results and efficient alternatives. In Learning Theory and Kernel Machines, pages 129-143. Springer, 2003.

[14] A. Grover and J. Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.

[15] D. K. Hammond, P. Vandergheynst, and R. Gribonval. Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis, 30(2):129-150, 2011.

[16] M. Henaff, J. Bruna, and Y. LeCun. Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163, 2015.

[17] V. N. Katsikis, D. Pappas, and A. Petralias. An improved method for the computation of the Moore-Penrose inverse matrix. Applied Mathematics and Computation, 217(23):9828-9834, 2011.

[18] T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.

[19] R. Kondor. A complete set of rotationally and translationally invariant features for images. CoRR, abs/cs/0701127, 2007.

[20] R. Kondor and K. M. Borgwardt. The skew spectrum of graphs. In Proceedings of the 25th International Conference on Machine Learning, pages 496-503. ACM, 2008.

[21] R. Kondor and H. Pan. The multiscale Laplacian graph kernel. In Advances in Neural Information Processing Systems, pages 2982-2990, 2016.

[22] R. Kondor, N. Shervashidze, and K. M. Borgwardt. The graphlet spectrum. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 529-536. ACM, 2009.

[23] Y. Lipman, R. M. Rustamov, and T. A. Funkhouser. Biharmonic distance. ACM Transactions on Graphics (TOG), 29(3):27, 2010.

[24] G. Montavon, K. Hansen, S. Fazli, M. Rupp, F. Biegler, A. Ziehe, A. Tkatchenko, A. V. Lilienfeld, and K.-R. Müller.
Learning invariant representations of molecules for atomization energy prediction. In Advances in Neural Information Processing Systems, pages 440-448, 2012.

[25] B. Nadler, S. Lafon, R. Coifman, and I. Kevrekidis. Diffusion maps, spectral clustering and eigenfunctions of Fokker-Planck operators. In NIPS, pages 955-962, 2005.

[26] M. Niepert, M. Ahmed, and K. Kutzkov. Learning convolutional neural networks for graphs. In Proceedings of the 33rd International Conference on Machine Learning. ACM, 2016.

[27] F. Orsini, P. Frasconi, and L. De Raedt. Graph invariant kernels. In IJCAI, pages 3756-3762, 2015.

[28] J. Rosenblatt and P. D. Seymour. The structure of homometric sets. SIAM Journal on Algebraic Discrete Methods, 3(3):343-350, 1982.

[29] L. W. Shapiro. An electrical lemma. Mathematics Magazine, 60(1):36-38, 1987.

[30] N. Shervashidze, S. Vishwanathan, T. Petri, K. Mehlhorn, and K. M. Borgwardt. Efficient graphlet kernels for large graph comparison. In AISTATS, volume 5, pages 488-495, 2009.

[31] N. Shervashidze, P. Schweitzer, E. J. v. Leeuwen, K. Mehlhorn, and K. M. Borgwardt. Weisfeiler-Lehman graph kernels. Journal of Machine Learning Research, 12(Sep):2539-2561, 2011.

[32] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine, 30(3):83-98, 2013.

[33] E. T. Whittaker and G. N. Watson. A Course of Modern Analysis. Cambridge University Press, 1996.

[34] P. Yanardag and S. Vishwanathan.
Deep graph kernels. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1365-1374. ACM, 2015.