{"title": "On the equivalence between graph isomorphism testing and function approximation with GNNs", "book": "Advances in Neural Information Processing Systems", "page_first": 15894, "page_last": 15902, "abstract": "Graph neural networks (GNNs) have achieved lots of success on graph-structured data. In light of this, there has been increasing interest in studying their representation power. One line of work focuses on the universal approximation of permutation-invariant functions by certain classes of GNNs, and another demonstrates the limitation of GNNs via graph isomorphism tests.\n \nOur work connects these two perspectives and proves their equivalence. We further develop a framework of the representation power of GNNs with the language of sigma-algebra, which incorporates both viewpoints. Using this framework, we compare the expressive power of different classes of GNNs as well as other methods on graphs. In particular, we prove that order-2 Graph G-invariant networks fail to distinguish non-isomorphic regular graphs with the same degree. We then extend them to a new architecture, Ring-GNN, which succeeds in distinguishing these graphs as well as for tasks on real-world datasets.", "full_text": "On the equivalence between graph isomorphism\ntesting and function approximation with GNNs\n\nCourant Institute of Mathematical Sciences\n\nCourant Institute of Mathematical Sciences\n\nCourant Institute of Mathematical Sciences\n\nCourant Institute of Mathematical Sciences\n\nZhengdao Chen\n\nNew York University\nzc1216@nyu.edu\n\nLei Chen\n\nNew York University\nlc3909@nyu.edu\n\nSoledad Villar\n\nCenter for Data Science\nNew York University\n\nsoledad.villar@nyu.edu\n\nJoan Bruna\n\nCenter for Data Science\nNew York University\nbruna@cims.nyu.edu\n\nAbstract\n\nGraph neural networks (GNNs) have achieved lots of success on graph-structured\ndata. In light of this, there has been increasing interest in studying their repre-\nsentation power. 
One line of work focuses on the universal approximation of permutation-invariant functions by certain classes of GNNs, and another demonstrates the limitation of GNNs via graph isomorphism tests.

Our work connects these two perspectives and proves their equivalence. We further develop a framework of the representation power of GNNs with the language of sigma-algebra, which incorporates both viewpoints. Using this framework, we compare the expressive power of different classes of GNNs as well as other methods on graphs. In particular, we prove that order-2 Graph G-invariant networks fail to distinguish non-isomorphic regular graphs with the same degree. We then extend them to a new architecture, Ring-GNN, which succeeds in distinguishing these graphs as well as for tasks on real-world datasets.

1 Introduction

Graph structured data naturally occur in many areas of knowledge, including computational biology, chemistry and social sciences. Graph neural networks, in all their forms, yield useful representations of graph data partly because they take into consideration the intrinsic symmetries of graphs, such as invariance and equivariance with respect to a relabeling of the nodes [27, 7, 15, 8, 10, 28, 3, 36]. All these different architectures are proposed with different purposes (see [31] for a survey and references therein), and a priori it is not obvious how to compare their power. The recent work [32] proposes to study the representation power of GNNs via their performance on graph isomorphism tests. They developed the Graph Isomorphism Networks (GINs) that are as powerful as the one-dimensional Weisfeiler-Lehman (1-WL or just WL) test for graph isomorphism [30], and showed that no other neighborhood-aggregating (or message passing) GNN can be more powerful than the 1-WL test.
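To make the 1-WL ceiling concrete, here is a minimal sketch of color refinement (1-WL) in Python. The function `wl_colors`, the example graphs, and the fixed round count are illustrative choices, not code from the paper. On two non-isomorphic 2-regular graphs (a hexagon versus two disjoint triangles), 1-WL produces identical color histograms, so any GNN whose power is bounded by 1-WL must also confuse them.

```python
from collections import Counter

def wl_colors(adj, rounds=3):
    """One-dimensional Weisfeiler-Lehman (color refinement) on an
    adjacency list; returns the histogram (multiset) of final colors."""
    n = len(adj)
    colors = [0] * n  # uniform initial colors (no node features)
    for _ in range(rounds):
        # New color = (own color, sorted multiset of neighbor colors),
        # compressed to small integers.
        signatures = [(colors[i], tuple(sorted(colors[j] for j in adj[i])))
                      for i in range(n)]
        relabel = {sig: c for c, sig in enumerate(sorted(set(signatures)))}
        colors = [relabel[sig] for sig in signatures]
    return Counter(colors)

# Two non-isomorphic 2-regular graphs on 6 nodes:
hexagon = [[1, 5], [0, 2], [1, 3], [2, 4], [3, 5], [4, 0]]
two_triangles = [[1, 2], [0, 2], [0, 1], [4, 5], [3, 5], [3, 4]]

# 1-WL assigns identical color histograms, so it cannot tell them apart.
print(wl_colors(hexagon) == wl_colors(two_triangles))  # True
```

Since every node in a regular graph sees the same multiset of colors at every round, the refinement never splits the initial color class, which is exactly the failure mode exploited later with the Circular Skip Link graphs.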
Variants of message passing GNNs include [27, 9].

On the other hand, for feed-forward neural networks, many results have been obtained regarding their ability to approximate continuous functions, commonly known as the universal approximation theorems, such as the seminal works of [6, 12]. Following this line of work, it is natural to study the expressivity of graph neural networks in terms of function approximation. Since we could argue that many if not most functions on a graph that we are interested in are invariant or equivariant to permutations of the nodes in the graph, GNNs are usually designed to be invariant or equivariant, and therefore the natural question is whether certain classes of GNNs can approximate any continuous invariant or equivariant function. Recent work [19] showed the universal approximation of G-invariant networks, constructed based on the linear invariant and equivariant layers studied in [18], if the order of the tensors involved in the networks can grow as the graph gets larger. Such a dependence on the graph size has been theoretically overcome by the very recent work [13], though there is no known upper bound on the order of the tensors involved. With potentially very-high-order tensors, these models that are guaranteed universal approximation are not quite feasible in practice.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

The foundational part of this work aims at building the bridge between graph isomorphism testing and invariant function approximation, the two main perspectives for studying the expressive power of graph neural networks. We demonstrate an equivalence between the ability of a class of GNNs to distinguish between any pair of non-isomorphic graphs and its power of approximating any (continuous) invariant function, for both the case with finite feature space and the case with continuous feature space.
Furthermore, we argue that the concept of sigma-algebras on the space\nof graphs is a natural description of the power of graph neural networks, allowing us to build\na taxonomy of GNNs based on how their respective sigmas-algebras interact. Building on this\ntheoretical framework, we identify an opportunity to increase the expressive power of order-2 G-\ninvariant networks with computational tractability, by considering a ring of invariant matrices under\naddition and multiplication. We show that the resulting model, which we refer to as Ring-GNN,\nis able to distinguish between non-isomorphic regular graphs where order-2 G-invariant networks\nprovably fail. We illustrate these gains numerically in synthetic and real graph classi\ufb01cation tasks.\nSummary of main contributions:\n\n\u2022 We show the equivalence between graph isomorphism testing and approximation of\npermutation-invariant functions for analyzing the expressive power of graph neural networks.\n\u2022 We introduce a language of sigma algebra for studying the representation power of graph\nneural networks, which uni\ufb01es both graph isomorphism testing and function approximation,\nand use this framework to compare the power of some GNNs and other methods.\n\n\u2022 We propose Ring-GNN, a tractable extension of order-2 Graph G-invariant Networks that\nuses the ring of matrix addition and multiplication. We show this extension is necessary and\nsuf\ufb01cient to distinguish Circular Skip Links graphs.\n\n2 Related work\n\nGraph Neural Networks and graph isomorphism. Graph isomorphism is a fundamental problem\nin theoretical computer science. It amounts to deciding, given two graphs A, B, whether there exists a\npermutation \u21e1 such that \u21e1A = B\u21e1. 
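For concreteness, this decision problem can be checked by brute force over all n! permutations. The helper `is_isomorphic_bruteforce` below is a hypothetical illustration of the condition πA = Bπ (equivalently πAπ⊤ = B), not a practical algorithm:

```python
from itertools import permutations
import numpy as np

def is_isomorphic_bruteforce(A, B):
    """Decide graph isomorphism by exhausting all n! permutation
    matrices P and testing P A P^T = B (i.e. P A = B P).
    Factorial time; for illustration only."""
    n = A.shape[0]
    if B.shape[0] != n:
        return False
    for perm in permutations(range(n)):
        P = np.eye(n, dtype=int)[list(perm)]  # rows of I reordered by perm
        if np.array_equal(P @ A @ P.T, B):
            return True
    return False

# A 4-cycle under a relabeling is still a 4-cycle; a 4-path is not.
C4 = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]])
C4_relabeled = np.array([[0,0,1,1],[0,0,1,1],[1,1,0,0],[1,1,0,0]])
P4 = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]])
print(is_isomorphic_bruteforce(C4, C4_relabeled))  # True
print(is_isomorphic_bruteforce(C4, P4))            # False
```

The quasi-polynomial algorithm mentioned next avoids this factorial blow-up; the brute-force version only serves to pin down what the predicate means.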
There is no known polynomial-time algorithm to solve it, but recently Babai made a breakthrough by showing that it can be solved in quasi-polynomial time [1]. Recently [32] introduced graph isomorphism tests as a characterization of the power of graph neural networks. They show that if a GNN follows a neighborhood aggregation scheme, then it cannot distinguish pairs of non-isomorphic graphs that the 1-WL test fails to distinguish. Therefore this class of GNNs is at most as powerful as the 1-WL test. They further propose the Graph Isomorphism Networks (GINs) based on approximating injective set functions by multi-layer perceptrons (MLPs), which can be as powerful as the 1-WL test. Based on k-WL tests [4], [20] proposes k-GNN, which can take higher-order interactions among nodes into account. Concurrently to this work, [17] proves that order-k invariant graph networks are at least as powerful as the k-WL tests, and, similarly to us, it augments order-2 networks with matrix multiplication, showing that they achieve at least the power of the 3-WL test. [21] proposes relational pooling (RP), an approach that combines permutation-sensitive functions under all permutations to obtain a permutation-invariant function. If RP is combined with permutation-sensitive functions that are sufficiently expressive, then it can be shown to be a universal approximator. A combination of RP and GINs is able to distinguish certain non-isomorphic regular graphs which GIN alone would fail on. A drawback of RP is that its full version is computationally intractable, and therefore it needs to be approximated by averaging over randomly sampled permutations, in which case the resulting function is not guaranteed to be permutation-invariant.

Universal approximation of functions with symmetry. Many works have discussed the function approximation capabilities of neural networks that satisfy certain symmetries.
[2] studies the probabilistic and functional symmetry in neural networks, and we discuss its relationship to our work in more detail in Appendix D. [25] shows that equivariance of a neural network corresponds to symmetries in its parameter-sharing scheme. [35] proposes a neural network architecture with polynomial layers that is able to achieve universal approximation of invariant or equivariant functions. [18] studies the spaces of all invariant and equivariant linear functions, and obtains bases for such spaces. Building upon this work, [19] proposes the G-invariant network for a symmetry group G, which achieves universal approximation of G-invariant functions if the maximal tensor order involved in the network is allowed to grow as n(n−1)/2, but such high-order tensors are prohibitive in practice. Upper bounds on the approximation power of G-invariant networks when the tensor order is limited remain open except for when G = A_n [19]. The very recent work [13] extends the result to the equivariant case, although it suffers from the same problem of possibly requiring high-order tensors. Specifically for learning on graphs, [26] proposes the compositional networks, which achieve equivariance and are inspired by the WL test. In the context of machine perception of visual scenes, [11] proposes an architecture that can potentially express all equivariant functions.

To the best of our knowledge, this is the first work that shows an explicit connection between the two aforementioned perspectives of studying the representation power of graph neural networks: graph isomorphism testing and universal approximation. Our main theoretical contribution lies in showing an equivalence between them, for both finite and continuous feature space cases, with a natural generalization of the notion of graph isomorphism testing to the latter case.
Then we focus on the Graph G-invariant network based on [18, 19], and show that when the maximum tensor order is restricted to be 2, it cannot distinguish between non-isomorphic regular graphs with equal degrees. As a corollary, such networks are not universal. Note that our result shows an upper bound on order-2 G-invariant networks, whereas concurrently to us, [17] provides a lower bound by relating them to k-WL tests. Concurrently to [17], we propose a modified version of order-2 graph networks to capture higher-order interactions among nodes without computing tensors of higher order.

3 Graph isomorphism testing and universal approximation

In this section we show that there exists a very close connection between the universal approximation of permutation-invariant functions by a class of functions and its ability to perform graph isomorphism tests. We consider graphs with nodes and edges labeled by elements of a compact set X ⊂ ℝ. We represent graphs with n nodes by an n-by-n matrix G ∈ X^{n×n}, where a diagonal term G_ii represents the label of the i-th node, and an off-diagonal term G_ij represents the label of the edge from the i-th node to the j-th node. An undirected graph will then be represented by a symmetric G.

Thus, we focus on analyzing a collection C of functions from X^{n×n} to ℝ. We are especially interested in collections of permutation-invariant functions, defined so that f(π⊤Gπ) = f(G) for all G ∈ X^{n×n} and all π ∈ S_n, where S_n is the permutation group on n elements. For classes of functions, we define the property of being able to discriminate non-isomorphic graphs, which we call GIso-discriminating, and which, as we will see, generalizes naturally to the continuous case.

Definition 1. Let C be a collection of permutation-invariant functions from X^{n×n} to ℝ.
We say C is GIso-discriminating if for all non-isomorphic G_1, G_2 ∈ X^{n×n} (denoted G_1 ≄ G_2), there exists a function h ∈ C such that h(G_1) ≠ h(G_2). This definition is illustrated by Figure 2 in the appendix.

Definition 2. Let C be a collection of permutation-invariant functions from X^{n×n} to ℝ. We say C is universally approximating if for every permutation-invariant function f from X^{n×n} to ℝ and for all ε > 0, there exists h_{f,ε} ∈ C such that ‖f − h_{f,ε}‖_∞ := sup_{G ∈ X^{n×n}} |f(G) − h_{f,ε}(G)| < ε.

3.1 Finite feature space

As a warm-up, we first consider the space of graphs with a finite set of possible features for nodes and edges, X = {1, . . . , M}.

Theorem 1. Universally approximating classes of functions are also GIso-discriminating.

Proof. Given G_1, G_2 ∈ X^{n×n}, we consider the permutation-invariant indicator function 1_{≃G_1} : X^{n×n} → ℝ such that 1_{≃G_1}(G) = 1 if G is isomorphic to G_1 and 0 otherwise. Therefore, it can be approximated with ε = 0.1 by a function h ∈ C. Then h is a function that distinguishes G_1 from G_2, as in Definition 1. Hence C is GIso-discriminating.

To obtain a result in the reverse direction, we first introduce the concept of an augmented collection of functions, which is especially natural when C is a collection of neural networks.

Definition 3. Given C, a collection of functions from X^{n×n} to ℝ, we consider an augmented collection of functions, also from X^{n×n} to ℝ, consisting of functions that map an input graph G to NN([h_1(G), ..., h_d(G)]) for some finite d, where NN is a feed-forward neural network / multi-layer perceptron, and h_1, ..., h_d ∈ C. When NN is restricted to have L layers, we denote this augmented collection by C^{+L}. In this work, we consider ReLU as the nonlinear activation function in the neural networks.

Remark 1.
If C_{L_0} is the collection of feed-forward neural networks with L_0 layers, then C_{L_0}^{+L} represents the collection of feed-forward neural networks with L_0 + L layers.

Remark 2. If C is a collection of permutation-invariant functions, so is C^{+L}.

Theorem 2. If C is GIso-discriminating, then C^{+2} is universally approximating.

The proof is simple and is a consequence of the following lemmas, which we prove in Appendix A.

Lemma 1. If C is GIso-discriminating, then for all G ∈ X^{n×n}, there exists a function h̃_G ∈ C^{+1} such that for all G′, h̃_G(G′) = 0 if and only if G ≃ G′.

Lemma 2. Let C be a class of permutation-invariant functions from X^{n×n} to ℝ satisfying the consequences of Lemma 1. Then C^{+1} is universally approximating.

3.2 Extension to the case of continuous (Euclidean) feature space

Graph isomorphism is an inherently discrete problem, whereas universal approximation is usually more interesting when the input space is continuous. With our Definition 1 of GIso-discriminating, we can achieve a natural generalization of the above results to the scenario of a continuous input space. All proofs for this section can be found in Appendix A.

Let X be a compact subset of ℝ, and consider graphs with n nodes represented by G ∈ K = X^{n×n}; that is, the node features are {G_ii}_{i=1,...,n} and the edge features are {G_ij}_{i,j=1,...,n; i≠j}.

Theorem 3. If C is universally approximating, then it is also GIso-discriminating.

The essence of the proof is similar to that of Theorem 1. The other direction, showing that pairwise discrimination can lead to universal approximation, is less straightforward. As an intermediate step, we make the following definition:

Definition 4. Let C be a class of functions K → ℝ.
We say it is able to locate every isomorphism class if for all G ∈ K and for all ε > 0, there exists h_G ∈ C such that:

• for all G′ ∈ K, h_G(G′) ≥ 0;
• for all G′ ∈ K, if G′ ≃ G, then h_G(G′) = 0; and
• there exists δ_G > 0 such that if h_G(G′) < δ_G, then there exists π ∈ S_n such that d(π(G′), G) < ε, where d is the Euclidean distance defined on ℝ^{n×n}.

Lemma 3. If C, a collection of continuous permutation-invariant functions from K to ℝ, is GIso-discriminating, then C^{+1} is able to locate every isomorphism class.

Heuristically, we can think of the h_G in the definition above as a "loss function" that penalizes the deviation of G′ from the equivalence class of G. In particular, the third condition says that if the loss value is small enough, then we know that G′ has to be close to the equivalence class of G.

Lemma 4. Let C be a class of permutation-invariant functions K → ℝ. If C is able to locate every isomorphism class, then C^{+2} is universally approximating.

Combining the two lemmas above, we arrive at the following theorem:

Theorem 4. If C, a collection of continuous permutation-invariant functions from K to ℝ, is GIso-discriminating, then C^{+3} is universally approximating.

4 A framework of representation power based on sigma-algebra

4.1 Introducing sigma-algebra to this context

Let K = X^{n×n} be a finite input space. Let Q_K := K/≃ be the set of isomorphism classes under the equivalence relation of graph isomorphism. That is, for all τ ∈ Q_K, τ = {π⊤Gπ : π ∈ S_n} for some G ∈ K.

Intuitively, a maximally expressive collection of permutation-invariant functions, C, will allow us to know exactly which isomorphism class τ a given graph G belongs to, by looking at the outputs of certain functions in the collection applied to G.
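The partition Q_K = K/≃ can be enumerated explicitly on a tiny instance of K. Below is a sketch for the simplified case of simple undirected graphs on 3 nodes (symmetric binary matrices with zero diagonal, i.e. no node labels); `iso_class` is a hypothetical canonicalizer, not part of the paper's framework:

```python
from itertools import permutations, product
import numpy as np

def iso_class(G):
    """Canonical representative of the isomorphism class of G:
    the lexicographically smallest relabeling P G P^T over all
    permutation matrices P."""
    n = G.shape[0]
    reps = []
    for perm in permutations(range(n)):
        P = np.eye(n, dtype=int)[list(perm)]
        reps.append(tuple((P @ G @ P.T).flatten()))
    return min(reps)

# K: all simple undirected graphs on 3 nodes (one bit per possible
# edge), a finite feature space with |K| = 2^3 = 8 labeled graphs.
n = 3
K = []
for bits in product([0, 1], repeat=3):
    G = np.zeros((n, n), dtype=int)
    G[0, 1] = G[1, 0] = bits[0]
    G[0, 2] = G[2, 0] = bits[1]
    G[1, 2] = G[2, 1] = bits[2]
    K.append(G)

Q_K = {iso_class(G) for G in K}
print(len(K), len(Q_K))  # 8 4
```

The 8 labeled graphs collapse into 4 isomorphism classes (0, 1, 2, or 3 edges), and σ(Q_K) is then the algebra generated by those 4 cells: the finest partition any permutation-invariant measurement can resolve.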
Heuristically, we can consider each function in C as a "measurement" which partitions the graph space K according to the function value at each point. If C is powerful enough, then as a collection it will partition K as finely as Q_K. If not, the partition is going to be coarser than Q_K. These intuitions motivate us to introduce the language of sigma-algebra.

Recall that an algebra on a set K is a collection of subsets of K that includes K itself, is closed under complement, and is closed under finite union. Because K is finite, an algebra on K is also a sigma-algebra on K, where a sigma-algebra further satisfies the condition of being closed under countable unions. Since Q_K is a set of (non-intersecting) subsets of K, we can consider the algebra generated by Q_K, defined as the smallest algebra that contains Q_K, and use σ(Q_K) to denote the algebra (and sigma-algebra) generated by Q_K.

Observation 1. If f : X^{n×n} → ℝ is a permutation-invariant function, then f is measurable with respect to σ(Q_K), and we denote this by f ∈ M[σ(Q_K)].

Now consider a class of functions C that is permutation-invariant. Then for all f ∈ C, f ∈ M[σ(Q_K)]. We define the sigma-algebra generated by f as the set of all the pre-images of Borel sets on ℝ under f, and denote it by σ(f). It is the smallest sigma-algebra on K that makes f measurable. For a class of functions C, σ(C) is defined as the smallest sigma-algebra on K that makes all functions in C measurable. Because we assume K is finite, it does not matter whether C is a countable collection.

4.2 Reformulating graph isomorphism testing and universal approximation with sigma-algebra

We restrict our attention to the finite feature space case. Given a graph G ∈ X^{n×n}, we use E(G) to denote its isomorphism class, {G′ ∈ X^{n×n} : G′ ≃ G}. We prove the following results in Appendix B.

Theorem 5.
If C is a class of permutation-invariant functions on X^{n×n} and C is GIso-discriminating, then σ(C) = σ(Q_K).

Together with Theorem 1, the following is an immediate consequence:

Corollary 1. If C is a class of permutation-invariant functions on X^{n×n} and C achieves universal approximation, then σ(C) = σ(Q_K).

Theorem 6. Let C be a class of permutation-invariant functions on X^{n×n} with σ(C) = σ(Q_K). Then C is GIso-discriminating.

Thus, this sigma-algebra language is a natural notion for characterizing the power of graph neural networks, because as shown above, generating the finest sigma-algebra σ(Q_K) is equivalent to being GIso-discriminating, and therefore to universal approximation.

Moreover, when C is not GIso-discriminating or universal, we can evaluate its representation power by studying σ(C), which gives a measure for comparing the power of different GNN families. Given two classes of functions C_1, C_2, we have σ(C_1) ⊆ σ(C_2) if and only if M[σ(C_1)] ⊆ M[σ(C_2)], if and only if C_1 is less powerful than C_2 in terms of representation power. In Appendix C, we use this notion to compare the expressive power of different families of GNNs as well as other algorithms like 1-WL, linear programming and semidefinite programming in terms of their ability to distinguish non-isomorphic graphs. We summarize our findings in Figure 1.

[Figure 1: a diagram comparing, under the sigma-algebra framework, the classes sGNN(I, A); LP ≡ 1-WL ≡ GIN; SDP; MPNN*; sGNN(I, D, A, {min{A^t, 1}}_{t=1}^T); order-2 graph G-invariant networks*; spectral methods; the SoS hierarchy; and Ring-GNN.]

Figure 1: Comparison of classes of functions on graphs in terms of their expressive power under the sigma-algebra framework proposed in Section 4. Remarks: (a) GIN being defined in [32] as a form of message passing neural network (MPNN) justifies the inclusion GIN ↪ MPNN.
(b) [18] shows that message passing neural networks can be expressed as a modified form of order-2 graph G-invariant networks (which may not coincide with the definition we consider in this paper). Therefore this branch of the hierarchy has yet to be established rigorously. The rest of the figure is explained in Appendix C.

5 Ring-GNN: a GNN defined on the ring of equivariant functions

5.1 The limitation of order-2 Graph G-invariant Networks

We first investigate the G-invariant networks proposed in [19]. They are constructed by interleaving compositions of equivariant linear layers between tensors of potentially different orders with pointwise nonlinear activation functions. We define their adaptation to graph-structured data in Appendix E, and refer to them as Graph G-invariant Networks. This is a powerful framework that can achieve universal approximation if the order of the tensors can grow polynomially in the number of nodes [19], but less is known about its approximation power when the tensor order is restricted. One particularly interesting subclass of G-invariant networks is the one with maximum tensor order 2 (we will call them order-2 Graph G-invariant Networks), because [18] shows that it can approximate any Message Passing Neural Network (MPNN) [8], and moreover, it would be both mathematically cumbersome and computationally expensive to include linear layers involving tensors of order higher than 2. Our following result shows that the class of order-2 Graph G-invariant Networks is quite restrictive. The proof is given in Appendix E.

Theorem 7. Order-2 Graph G-invariant Networks cannot distinguish between non-isomorphic regular graphs with the same degree.

5.2 Ring-GNN as an extension of order-2 Graph G-invariant Networks

Motivated by this limitation, we propose a GNN architecture that extends the family of order-2 Graph G-invariant Networks without going into higher-order tensors.
In particular, we want the new family to include GNNs that can distinguish some pairs of non-isomorphic regular graphs with the same degree. For instance, take the pair of Circular Skip Link graphs G_{8,2} and G_{8,3}, illustrated in Figure 2. Roughly speaking, if all the nodes in both graphs have the same node feature, then because they all have the same degree, the updates of node states in both graph neural networks based on neighborhood aggregation and the WL test will fail to distinguish the nodes. However, the power graphs¹ of G_{8,2} and G_{8,3} have different degrees. Another important example comes from spectral methods that operate on normalized operators, such as the normalized Laplacian Δ = I − D^{−1/2} A D^{−1/2}, where D is the diagonal degree operator. Such normalization preserves the permutation symmetries and in many clustering applications leads to dramatic improvements [29].

¹ If A is the adjacency matrix of a graph, its power graph has adjacency matrix min(A², 1). The matrix min(A², 1) has been used in [5] in graph neural networks for community detection and in [22] for the quadratic assignment problem, and it leverages multiscale information in the graph. Note that it differs from taking the power of certain matrices, which is exploited in [16] for example.

Figure 2: The Circular Skip Link graphs G_{n,k} are undirected graphs on n nodes q_0, . . . , q_{n−1} such that (i, j) ∈ E if and only if |i − j| ≡ 1 or k (mod n). In this figure we depict (left) G_{8,2} and (right) G_{8,3}. It is easy to check that G_{n,k} and G_{n′,k′} are not isomorphic unless n = n′ and k ≡ ±k′ (mod n). Both 1-WL and G-invariant networks fail to distinguish them.

This motivates us to consider a polynomial ring generated by the matrices that are the outputs of permutation-equivariant linear layers, rather than just the linear space of those outputs.
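The power-graph observation above is easy to verify numerically. Below is a sketch, where `csl_adjacency` is a hypothetical helper encoding the Circular Skip Link definition from Figure 2 (valid for 1 < k < n/2):

```python
import numpy as np

def csl_adjacency(n, k):
    """Adjacency matrix of the Circular Skip Link graph G_{n,k}:
    (i, j) is an edge iff |i - j| is congruent to 1 or k (mod n)."""
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for off in (1, n - 1, k, n - k):
            A[i, (i + off) % n] = 1
    return A

A2 = csl_adjacency(8, 2)
A3 = csl_adjacency(8, 3)

# Both graphs are 4-regular, so degree-based schemes see no difference.
print(A2.sum(axis=1))  # [4 4 4 4 4 4 4 4]
print(A3.sum(axis=1))  # [4 4 4 4 4 4 4 4]

# But the power graphs min(A^2, 1) have different (constant) row sums:
# two steps in G_{8,2} reach every residue mod 8, while two steps in
# G_{8,3} only reach the even residues {0, 2, 4, 6}.
P2 = np.minimum(A2 @ A2, 1)
P3 = np.minimum(A3 @ A3, 1)
print(P2.sum(axis=1))  # constant 8
print(P3.sum(axis=1))  # constant 4
```

So a single matrix multiplication already separates the pair that neighborhood aggregation cannot, which is precisely the operation the ring construction adds.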
Together with pointwise nonlinear activation functions such as ReLU, power graph adjacency matrices like min(A², 1) can be expressed with suitable choices of parameters.

To start with, we revisit the theory of linear equivariant functions developed in [18]. It is shown that any linear equivariant layer from ℝ^{n×n} to ℝ^{n×n} can be represented as L_θ(A) = Σ_{i=1}^{15} θ_i L_i(A) + Σ_{i=16}^{17} θ_i L_i, where {L_i}_{i=1,...,15} is the set of 15 basis functions for all linear equivariant functions from ℝ^{n×n} to ℝ^{n×n}, L_16 and L_17 are the basis for the bias terms, and θ ∈ ℝ^{17} are the parameters that determine L_θ. Generalizing to an equivariant linear layer from ℝ^{n×n×d} to ℝ^{n×n×d′}, it becomes L_θ(A)_{·,·,k′} = Σ_{k=1}^{d} Σ_{i=1}^{15} θ_{k,k′,i} L_i(A_{·,·,k}) + Σ_{i=16}^{17} θ_{k,k′,i} L_i, with θ ∈ ℝ^{d×d′×17}.

With this in mind, we now define a new architecture. Suppose the input is A^{(0)} ∈ ℝ^{n×n×d}, containing data on a graph with n nodes. We fix some integer T, and for t ∈ {0, ..., T − 1} iteratively define

B_1^{(t)} = ρ(L_{α^{(t)}}(A^{(t)}))
B_2^{(t)} = ρ(L_{β^{(t)}}(A^{(t)}) · L_{γ^{(t)}}(A^{(t)}))
A^{(t+1)} = k_1^{(t)} B_1^{(t)} + k_2^{(t)} B_2^{(t)}

where k_1^{(t)}, k_2^{(t)} ∈ ℝ and α^{(t)}, β^{(t)}, γ^{(t)} ∈ ℝ^{d^{(t)} × d′^{(t)} × 17} are learnable parameters, and ρ is a pointwise nonlinear activation function such as ReLU. If a scalar output is desired, then in the final layer we compute θ_S Σ_{i,j} A^{(T)}_{ij} + θ_D Σ_i A^{(T)}_{ii}, where θ_S, θ_D ∈ ℝ are trainable parameters. We call the resulting architecture the Ring-GNN.²

Note that each layer is equivariant, and hence the map from A to the final scalar output is invariant. A Ring-GNN reduces to an order-2 Graph G-invariant Network if k_2^{(t)} = 0.
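A minimal single-channel sketch of one such layer follows. For brevity it uses only 4 of the 15 equivariant basis operations of [18]; the function names and the choice of basis subset are illustrative assumptions, not the released implementation. The check at the end verifies numerically that the layer, including the matrix-product term B_2, is permutation-equivariant:

```python
import numpy as np

def equivariant_linear(A, theta):
    """A linear permutation-equivariant map R^{n x n} -> R^{n x n},
    built from 4 of the 15 basis operations: identity, transpose,
    diagonal extraction, and row-sum broadcast."""
    n = A.shape[0]
    ops = [
        A,                                    # identity
        A.T,                                  # transpose
        np.diag(np.diag(A)),                  # keep only the diagonal
        np.tile(A.sum(axis=1, keepdims=True), (1, n)) / n,  # row sums
    ]
    return sum(t * op for t, op in zip(theta, ops))

def ring_gnn_layer(A, alpha, beta, gamma, k1=1.0, k2=1.0):
    """One Ring-GNN layer (single channel): a linear-space term B1
    plus a matrix-product term B2, combined after a pointwise ReLU."""
    relu = lambda X: np.maximum(X, 0)
    B1 = relu(equivariant_linear(A, alpha))
    B2 = relu(equivariant_linear(A, beta) @ equivariant_linear(A, gamma))
    return k1 * B1 + k2 * B2

# Equivariance check: permuting the input permutes the output identically.
rng = np.random.default_rng(0)
A = rng.random((5, 5))
alpha, beta, gamma = rng.random(4), rng.random(4), rng.random(4)
perm = rng.permutation(5)
P = np.eye(5)[perm]
out_then_perm = P @ ring_gnn_layer(A, alpha, beta, gamma) @ P.T
perm_then_out = ring_gnn_layer(P @ A @ P.T, alpha, beta, gamma)
print(np.allclose(out_then_perm, perm_then_out))  # True
```

The product of two equivariant outputs is again equivariant, since (PXP⊤)(PYP⊤) = P(XY)P⊤, which is why adding B_2 stays within the symmetry constraints while escaping the linear span.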
With J + 1 layers and suitable choices of the parameters, it is possible to obtain min(A^{2^J}, 1) in the (J + 1)-th layer. Therefore, we expect it to succeed in distinguishing certain pairs of regular graphs that order-2 Graph G-invariant Networks fail on, such as the Circular Skip Link graphs. Indeed, this is verified in the synthetic experiment presented in the next section. The normalized Laplacian can also be approximated, since the degree matrix can be inverted by taking the reciprocal on the diagonal, and then entry-wise inversion and square root on the diagonal can be approximated by MLPs.

Computationally, the complexity of running the forward model grows as O(n³), dominated by the matrix multiplications, which are what enable the computation of certain higher-order information as the depth increases. In comparison, a Graph G-invariant Network with maximal tensor order k has complexity at least O(n^k). Therefore, the Ring-GNN is able to explore some higher-order interactions in the graph (which order-2 Graph G-invariant Networks neglect) while remaining computationally tractable. We note also that Ring-GNN can be augmented with matrix inverses or, more generally, with functional calculus on the spectrum of any of the intermediate representations³ while keeping O(n³) computational complexity.

² We call it Ring-GNN since the main object we consider is the ring of matrices, but technically we can express an associative algebra, since our model includes scalar multiplications.

³ When A = A^{(0)} is the adjacency matrix of an undirected graph, one easily verifies that A^{(t)} contains only symmetric matrices for all t.

6 Experiments

The different models and the detailed setup of the experiments are discussed in Appendix F.⁴
All experiments are conducted on GeForce GTX 1080 Ti and RTX 2080 Ti.

6.1 Classifying Circular Skip Links (CSL) graphs

The following experiment on synthetic data demonstrates the connection between function fitting and graph isomorphism testing. The Circular Skip Links graphs^5 are undirected regular graphs with node degree 4 [21], as illustrated in Figure 5.2. Note that two CSL graphs G_{n,k} and G_{n',k'} are not isomorphic unless n = n' and k ≡ ±k' (mod n). In the experiment, which has the same setup as in [21], we fix n = 41 and set k ∈ {2, 3, 4, 5, 6, 9, 11, 12, 13, 16}, where each k corresponds to a distinct isomorphism class. The task is then to classify a graph G_{n,k} by its skip length k.

Note that since the 10 classes have the same size, a naive uniform classifier would obtain 10% accuracy. As we see from Table 1, both GIN and the G-invariant network with tensor order 2 fail to outperform the naive classifier. Their failure on this task is unsurprising: WL tests are proven to fall short of distinguishing such pairs of non-isomorphic regular graphs [4], and hence so does GIN [32]; by the theoretical results from the previous section, order-2 Graph G-invariant networks are unable to distinguish them either. Therefore, their failure as graph isomorphism tests is consistent with their failure in this classification task, which can be understood as trying to approximate the function that maps each graph to its class label.

It should be noted that, since graph isomorphism tests are not entirely well-posed as classification tasks, the performance of GNN models could vary due to randomness. But the fact that Ring-GNNs achieve a relatively high maximum accuracy (compared to RP, for example) demonstrates that as a class of GNNs it is rich enough to contain functions that distinguish the CSL graphs to a large extent.

                            Circular Skip Links      IMDBB           IMDBM
GNN architecture            max    min    std        mean   std     mean   std
RP-GIN †                    53.3   10     12.9       -      -       -      -
GIN † ‡                     10     10     0          75.1   5.1     52.3   2.8
Order-2 Graph G-inv. †      10     10     0          71.3   4.5     48.6   3.9
sGNN-5                      80     80     0          72.8   3.8     49.4   3.2
sGNN-2                      30     30     0          73.1   5.2     49.0   2.1
sGNN-1                      10     10     0          72.7   4.9     49.0   2.1
LGNN [5]                    30     30     0          74.1   4.6     50.9   3.0
Ring-GNN                    80     10     15.7       73.0   5.4     48.2   2.7
Ring-GNN (w/ degree) ‡      -      -      -          73.3   4.9     51.3   4.2

Table 1: (left) Accuracy of different GNNs at classifying CSL graphs (see Section 6.1). We report the best and worst performances among 10 experiments. (right) Accuracy of different GNNs at classifying the real-world datasets IMDBB and IMDBM [34] (see Section 6.2). We report the best performance among all 350 epochs on 10-fold cross-validation, as was done in [32]. †: Performance reported by [21], [32] and [18]. ‡: On the IMDB datasets, unlike the other models, both GIN and the Ring-GNN (w/ degree) in the last row take the node degrees as input node features (see Section 6.2).

6.2 IMDB datasets

We use the two IMDB datasets (IMDBBINARY, IMDBMULTI)^6 [34] to test different models in real-world scenarios. Since our focus is on distinguishing graph structures, these datasets are suitable as they do not contain node features. IMDBBINARY has 1000 graphs, with 19.8 nodes on average, and 2 classes. IMDBMULTI has 1500 graphs, with 13.0 nodes on average, and 3 classes. Both datasets are randomly partitioned 9 : 1 into training/validation. As these two social network datasets have no informative node features, GIN uses one-hot encodings of node degrees as input node features, while the other baseline models treat all nodes as having identical features.
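To make these input-feature conventions concrete, here is a minimal sketch (the function name, signature, and mode labels are ours, not taken from any of the cited implementations) of the three encodings used across the models compared here:

```python
import numpy as np

def node_input_features(A, mode="one_hot"):
    """Build node features from adjacency matrix A when the dataset has none.
    mode="one_hot":  one-hot encoded degrees (GIN's convention).
    mode="scalar":   the raw degree as a single number per node.
    mode="constant": identical features for all nodes (other baselines)."""
    n = A.shape[0]
    deg = A.sum(axis=1).astype(int)
    if mode == "one_hot":
        X = np.zeros((n, deg.max() + 1))
        X[np.arange(n), deg] = 1.0      # row i has a 1 in column deg(i)
        return X
    if mode == "scalar":
        return deg[:, None].astype(float)
    return np.ones((n, 1))              # "constant"

# A 4-node star graph: the hub has degree 3, the leaves degree 1.
A = np.zeros((4, 4), dtype=int)
A[0, 1:] = A[1:, 0] = 1
X = node_input_features(A, mode="one_hot")   # shape (4, 4)
```

The "scalar" mode corresponds to the degree-as-one-integer variant used by Ring-GNN (w/ degree), as described next.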
For a fairer comparison, we apply two versions of Ring-GNN: the first one treats all nodes as having identical input features and has the same depth and widths as the order-2 Graph G-invariant Network [18], denoted "Ring-GNN" in Table 1; the second one uses the node degree as input features (though not as one-hot encodings, due to computational constraints, but simply as one integer per node), denoted "Ring-GNN (w/ degree)" in Table 1. All models are evaluated via 10-fold cross-validation, and the best accuracy is calculated by averaging across folds followed by maximizing along epochs [32]. Table 1 shows that the Ring-GNN models achieve higher or similar performance compared to the order-2 Graph G-invariant networks on both datasets, and slightly worse performance compared to GIN.

^4 The code is available at https://github.com/leichen2018/Ring-GNN.
^5 CSL dataset: https://github.com/PurdueMINDS/RelationalPooling/tree/master/
^6 IMDB datasets: https://github.com/weihua916/powerful-gnns/blob/master/dataset.zip

6.3 Other real-world datasets

We perform further experiments on four other real-world datasets for classification tasks: a social network dataset, COLLAB, and three bioinformatics datasets, MUTAG, PTC and PROTEINS^7 [34]. The experimental setup (10-fold cross-validation, training/validation split) is identical to that of the IMDB datasets, except that all the bioinformatics datasets contain node features; more details of the hyperparameters are included in Appendix F. As shown in Table 2, Ring-GNN outperforms the order-2 Graph G-invariant Network on all four datasets, and outperforms GIN on one of the four datasets.
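The reporting protocol used in both tables (following [32]) can be sketched as follows, with a hypothetical array `acc` of per-fold, per-epoch validation accuracies standing in for actual training runs:

```python
import numpy as np

# acc[f, e]: validation accuracy of fold f at epoch e (hypothetical numbers;
# the real runs use 10 folds and 350 epochs as described above).
rng = np.random.default_rng(0)
acc = rng.uniform(0.4, 0.8, size=(10, 350))

# Average across folds first, then take the best epoch of the averaged curve.
per_epoch_mean = acc.mean(axis=0)        # shape: (350,)
reported = per_epoch_mean.max()
best_epoch = int(per_epoch_mean.argmax())
```

Note that this quantity is at most the best single (fold, epoch) accuracy, since the fold average is taken before the maximum; reporting it for every model keeps the comparison consistent.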
Moreover, we note that the main goal of this part of our work is not necessarily to find the best-performing GNN through hyperparameter optimization, but rather to propose Ring-GNN as an augmented version of order-2 Graph G-invariant Networks and to show experimental results that support the theory.

                            COLLAB      MUTAG        PTC         PROTEINS
Ring-GNN                    80.1±1.4    86.8±6.4     65.7±7.1    75.7±2.9
GIN †                       80.2±1.9    89.4±5.6     64.6±7.0    76.2±2.8
Order-2 Graph G-inv. †      77.9±1.7    84.6±10.0    59.5±7.3    75.2±4.3

Table 2: Accuracy of different GNNs evaluated on several other real-world datasets. We report the best performance among all epochs on 10-fold cross-validation. †: Reported by [32] and [18].

7 Conclusions

In this work we address the important question of organizing the fast-growing zoo of GNN architectures in terms of what functions they can and cannot represent. We follow the approach via the graph isomorphism test, and show that it is equivalent to the other perspective via function approximation. We leverage our graph isomorphism reduction to augment order-2 G-invariant nets with the ring of operators associated with matrix multiplication, which gives provable gains in expressive power with complexity O(n^3), and is amenable to efficiency gains by leveraging sparsity in the graphs.

Our general framework leaves many interesting questions unresolved. First, a more comprehensive analysis of which elements of the algebra are really needed, depending on the application. Next, our current GNN taxonomy is still incomplete, and in particular we believe it is important to further discern the abilities between spectral and neighborhood-aggregation-based architectures.
Finally, and most importantly, our current notion of invariance (based on permutation symmetry) defines a topology in the space of graphs that is too strong; in other words, two graphs are either considered equal (if they are isomorphic) or not. Extending the theory of symmetric universal approximation to take into account a weaker metric on the space of graphs, such as the Gromov-Hausdorff distance, is a natural next step that will better reflect the stability requirements of powerful graph representations under small graph perturbations in real-world applications.

Acknowledgements  We would like to thank Haggai Maron and Thomas Kipf for fruitful discussions and for pointing us towards G-invariant networks as powerful models to study representational power in graphs. We thank Prof. Michael M. Bronstein for supporting this research with computing resources. This work was partially supported by NSF grant RI-IIS 1816753, NSF CAREER CIF 1845360, the Alfred P. Sloan Fellowship, Samsung GRP and Samsung Electronics. SV was partially funded by EOARD FA9550-18-1-7007 and the Simons Collaboration on Algorithms and Geometry.

^7 Real datasets: https://github.com/weihua916/powerful-gnns/blob/master/dataset.zip