{"title": "Approximation Ratios of Graph Neural Networks for Combinatorial Problems", "book": "Advances in Neural Information Processing Systems", "page_first": 4081, "page_last": 4090, "abstract": "In this paper, from a theoretical perspective, we study how powerful graph neural networks (GNNs) can be for learning approximation algorithms for combinatorial problems. \nTo this end, we first establish a new class of GNNs that can solve a strictly wider variety of problems than existing GNNs. Then, we bridge the gap between GNN theory and the theory of distributed local algorithms. We theoretically demonstrate that the most powerful GNN can learn approximation algorithms for the minimum dominating set problem and the minimum vertex cover problem with some approximation ratios with the aid of the theory of distributed local algorithms. We also show that most of the existing GNNs such as GIN, GAT, GCN, and GraphSAGE cannot perform better than with these ratios. This paper is the first to elucidate approximation ratios of GNNs for combinatorial problems. Furthermore, we prove that adding coloring or weak-coloring to each node feature improves these approximation ratios. This indicates that preprocessing and feature engineering theoretically strengthen model capabilities.", "full_text": "Approximation Ratios of Graph Neural Networks for\n\nCombinatorial Problems\n\nRyoma Sato1,2 Makoto Yamada1,2,3 Hisashi Kashima1,2\n\n3JST PRESTO\n\n1Kyoto University\n\n2RIKEN AIP\n\n{r.sato@ml.ist.i, myamada@i, kashima@i}.kyoto-u.ac.jp\n\nAbstract\n\nIn this paper, from a theoretical perspective, we study how powerful graph neural\nnetworks (GNNs) can be for learning approximation algorithms for combinatorial\nproblems. To this end, we \ufb01rst establish a new class of GNNs that can solve\na strictly wider variety of problems than existing GNNs. Then, we bridge the\ngap between GNN theory and the theory of distributed local algorithms. 
We theoretically demonstrate that the most powerful GNN can learn approximation algorithms for the minimum dominating set problem and the minimum vertex cover problem with some approximation ratios, with the aid of the theory of distributed local algorithms. We also show that most of the existing GNNs, such as GIN, GAT, GCN, and GraphSAGE, cannot perform better than these ratios. This paper is the first to elucidate approximation ratios of GNNs for combinatorial problems. Furthermore, we prove that adding coloring or weak coloring to each node feature improves these approximation ratios. This indicates that preprocessing and feature engineering theoretically strengthen model capabilities.

1 Introduction

Graph neural networks (GNNs) [8, 9, 12, 22] are a novel machine learning method for graph structures. GNNs have achieved state-of-the-art performance in various tasks, including chemo-informatics [7], question answering systems [23], and recommendation systems [31], to name a few.

Recently, machine learning methods have been applied to combinatorial problems [4, 11, 16, 27] to automatically obtain novel and efficient algorithms. Xu et al. [30] analyzed the capability of GNNs for solving the graph isomorphism problem, and they found that GNNs cannot solve it and are at most as powerful as the Weisfeiler-Lehman graph isomorphism test.

The minimum dominating set problem, minimum vertex cover problem, and maximum matching problem are examples of important combinatorial problems other than the graph isomorphism problem. These problems are all NP-hard.
Therefore, under the assumption that P ≠ NP, GNNs cannot exactly solve these problems, because GNNs run in polynomial time with respect to the input size. For NP-hard problems, many approximation algorithms have been proposed to obtain sub-optimal solutions in polynomial time [25], and the approximation ratios of these algorithms have been studied to guarantee their performance.

In this paper, we study the approximation ratios of algorithms that GNNs can learn for combinatorial problems. To analyze the approximation ratios of GNNs, we bridge the gap between GNN theory and the theory of distributed local algorithms. Here, distributed local algorithms are distributed algorithms that use only a constant number of synchronous communication rounds [1, 10, 24]. Thanks to their relationship with distributed local algorithms, we can elucidate lower bounds on the approximation ratios of algorithms that GNNs can learn for combinatorial problems. As an example of our results, if the input feature of each node is the node degree alone, no GNN can solve (Δ + 1 − ε)-approximation for the minimum dominating set problem or (2 − ε)-approximation for the minimum vertex cover problem, where ε > 0 is any real number and Δ is the maximum node degree.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

In addition, thanks to this relationship, we find vector-vector consistent GNNs (VVC-GNNs), which are a novel class of GNNs. VVC-GNNs have strictly stronger capability than existing GNNs and have the same capability as a computational model of distributed local algorithms. Based on this key finding, we propose consistent port numbering GNNs (CPNGNNs), which are the most powerful GNN model among VVC-GNNs. That is, for any graph problem that a VVC-GNN can solve, there exists a parameter of CPNGNNs that can also solve it.
Interestingly, CPNGNNs are strictly more powerful than graph isomorphism networks (GIN), which were considered to be the most powerful GNNs [30]. Furthermore, CPNGNNs achieve the optimal approximation ratios among GNNs: CPNGNNs can solve (Δ + 1)-approximation for the minimum dominating set problem and 2-approximation for the minimum vertex cover problem.

However, these approximation ratios are unsatisfactory because they are as high as those of simple greedy algorithms. One of the reasons for these high approximation ratios is that we only use node degrees as node features. We show that adding coloring or weak coloring to each node feature strengthens the capability of GNNs. For example, if we use weak 2-coloring as a node feature in addition to the node degree, CPNGNNs can solve ((Δ + 1)/2)-approximation for the minimum dominating set problem. Considering that any graph has a weak 2-coloring and that we can easily calculate a weak 2-coloring in linear time, it is interesting that such preprocessing and feature engineering can theoretically strengthen the model capability.

The contributions of this paper are summarized as follows:

• We reveal the relationships between the theory of GNNs and distributed local algorithms. Namely, we show that the set of graph problems that GNN classes can solve is the same as the set of graph problems that distributed local algorithm classes can solve.

• We propose CPNGNNs, which are the most powerful GNNs among the proposed GNN class.

• We elucidate the approximation ratios of GNNs for combinatorial problems, including the minimum dominating set problem and the minimum vertex cover problem. This is the first paper to elucidate the approximation ratios of GNNs for combinatorial problems.

2 Related Work

2.1 Graph Neural Networks

GNNs were first introduced by Gori et al. [8] and Scarselli et al. [22].
They obtained node embeddings by recursively applying the propagation function until convergence. Recently, Kipf and Welling [12] proposed graph convolutional networks (GCN), which significantly outperformed existing methods, including non-neural-network-based approaches. Since then, many graph neural networks have been proposed, such as GraphSAGE [9] and graph attention networks (GAT) [26].

Vinyals et al. [27] proposed pointer networks, which can solve combinatorial problems on a plane, such as the convex hull problem and the traveling salesman problem. Bello et al. [4] trained pointer networks using reinforcement learning to automatically obtain novel algorithms for these problems. Note that pointer networks are not GNNs. However, we introduce them here because they were the first to solve combinatorial problems using deep learning. Khalil et al. [11] and Li et al. [16] used GNNs to solve combinatorial problems. They combined GNNs with search methods, whereas we use only GNNs in order to focus on the capability of GNNs.

Xu et al. [30] analyzed the capability of GNNs. They showed that GNNs cannot solve the graph isomorphism problem and that the capability of GNNs is at most the same as that of the Weisfeiler-Lehman graph isomorphism test. They also proposed graph isomorphism networks (GIN), which are as powerful as the Weisfeiler-Lehman graph isomorphism test; GIN is therefore the most powerful among existing GNNs. The motivation of this paper is the same as that of Xu et al.'s work [30], but we consider not only the graph isomorphism problem but also the minimum dominating set problem, minimum vertex cover problem, and maximum matching problem.
Furthermore, we find the approximation ratios for these problems for the first time and propose GNNs more powerful than GIN.

Algorithm 1 Calculating the embedding of a node using GNNs
Require: Graph G = (V, E, X); Parameters θ; Aggregation functions f_θ^(l) (l = 1, . . . , L)
Ensure: Embedding of nodes z ∈ R^(n × d_{L+1})
1: z_v^(1) ← x_v (∀v ∈ V)
2: for l = 1, . . . , L do
3:   for v ∈ V do
4:     z_v^(l+1) ← f_θ^(l)(aggregated information from neighbor nodes of v)
5:   end for
6: end for
7: return z^(L+1)

2.2 Distributed Local Algorithms

A distributed local algorithm is a distributed algorithm that runs in constant time. More specifically, in a distributed local algorithm, we assume each node has infinite computational resources and decides its output within a constant number of communication rounds with neighboring nodes. For example, distributed local algorithms are used for controlling wireless sensor networks [13], constructing self-stabilization algorithms [14, 18], and building sublinear-time algorithms [20].

Distributed local algorithms were first studied by Angluin [1], Linial [17], and Naor and Stockmeyer [18]. Angluin [1] showed that deterministic distributed algorithms cannot find a center of a graph without any unique node identifiers. Linial [17] showed that no distributed local algorithm can solve 3-coloring of cycles, and that distributed algorithms require Ω(log* n) communication rounds to solve this problem. Naor and Stockmeyer [18] showed positive results for distributed local algorithms for the first time. For example, distributed local algorithms can find a weak 2-coloring and solve a variant of the dining philosophers problem.
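A weak 2-coloring, which reappears later in this paper as a node feature, asks only that every node have at least one neighbor of the opposite color; on graphs without isolated nodes it can be found sequentially in linear time by breadth-first search. A minimal sketch (the adjacency-list input format and the function name are illustrative, not the paper's notation):

```python
from collections import deque

def weak_2_coloring(adj):
    """Color nodes 0/1 by BFS-level parity, per connected component.

    adj: dict mapping each node to an iterable of neighbors; the graph is
    assumed to have no isolated nodes (an isolated node has no neighbor,
    so no weak 2-coloring exists for it).
    Every non-root node disagrees with its BFS parent, and every root
    disagrees with its BFS children, so the result is a weak 2-coloring
    even on non-bipartite graphs.
    """
    color = {}
    for root in adj:
        if root in color:
            continue
        color[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in color:
                    color[w] = 1 - color[u]  # child disagrees with parent
                    queue.append(w)
    return color
```

Note that this is the centralized preprocessing view; the point of Naor and Stockmeyer's result is that a weak 2-coloring is also computable by a distributed local algorithm under their assumptions (e.g., odd degrees).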
Later, several non-trivial distributed local algorithms were found, including 2-approximation for the minimum vertex cover problem [2].

There are many computational models of distributed local algorithms. Some computational models use unique node identifiers [18], port numbering [1], or randomness [19, 28], and other models do not [10]. Furthermore, some results use the following assumptions about the input: degrees are bounded [2], degrees are odd [18], graphs are planar [6], or graphs are bipartite [3]. In this paper, we use neither unique identifiers nor randomness, but we do use port numbering, and we assume that degrees are bounded. We describe our assumptions in detail in Section 3.1.

3 Preliminaries

3.1 Problem Setting

Here, we first describe the notation used in this paper, and then we formulate graph problems.

Notation. For a positive integer k ∈ Z+, let [k] be the set {1, 2, . . . , k}. Let G = (V, E, X) be an input graph, where V is a set of nodes, E is a set of edges, and X ∈ R^(|V| × d_0) is a feature matrix. We represent an edge of a graph G = (V, E, X) as an unordered pair {u, v} with u, v ∈ V. We write n = |V| for the number of nodes and m = |E| for the number of edges. The nodes V are considered to be numbered with [n] (i.e., we assume V = [n]). For a node v ∈ V, deg(v) denotes the degree of node v and N(v) denotes the set of neighbors of node v.

A GNN model N_θ(G, v) is a function parameterized by θ that takes a graph G and a node v ∈ V as input and outputs the label y_v ∈ Y of node v, where Y is a set of labels. We study the expressive capability of the function family N_θ for combinatorial graph problems under the following assumptions.

Assumption 1 (Bounded-Degree Graphs).
In this paper, we consider only bounded-degree graphs. In other words, for a fixed (but arbitrary) constant Δ, we assume that the degree of each node of the input graphs is at most Δ. This assumption is natural because there are many bounded-degree graphs in the real world. For example, degrees in molecular graphs are bounded by four, and degrees in computer networks are bounded by the number of LAN ports of routers. Moreover, the bounded-degree assumption is often used in distributed local algorithms [17, 18, 24]. For each positive integer Δ ∈ Z+, let F(Δ) be the set of all graphs with maximum degree at most Δ.

Assumption 2 (Node Features). We do not consider node features other than those that can be derived from the input graph itself, in order to focus on graph-theoretic properties. When no node features are available, the degrees of nodes are sometimes used [9, 21, 30]. Therefore, we use only the degree of a node as the node feature (i.e., z_v^(1) = ONEHOT(deg(v))) unless specified otherwise. Later, we show that using a coloring or weak coloring of the input graph in addition to node degrees as node features makes models theoretically more powerful.

Graph Problems. A graph problem is a function Π that associates a set Π(G) of solutions with each graph G = (V, E). Each solution S ∈ Π(G) is a function S : V → Y, where Y is a finite set that is independent of G. We say a GNN model N_θ solves a graph problem Π if for any Δ ∈ Z+, there exists a parameter θ such that for any graph G ∈ F(Δ), N_θ(G, ·) is in Π(G). For example, let Y be a set of node labels, let L(G) : V → Y be the ground truth of a multi-label classification problem for a graph G (i.e., L(G)(v) denotes the ground truth label of node v ∈ V), and let Π(G) = {f : V → Y | |{v ∈ V | f(v) = L(G)(v)}| ≥ 0.9 · |V|}.
This graph problem Π corresponds to a multi-label classification problem, and a GNN model N_θ solving Π means that there exists a parameter θ with which the model achieves an accuracy of at least 0.9 for this problem. Other examples of graph problems are combinatorial problems. Let C(G) ⊂ V be a minimum vertex cover of a graph G, let Y = {0, 1}, and let Π(G) = {f : V → {0, 1} | D = {v | f(v) = 1} is a vertex cover and |D| ≤ 2 · |C(G)|}. This graph problem Π corresponds to 2-approximation for the minimum vertex cover problem.

3.2 Known Model Classes

We introduce two known classes of GNNs, which include GraphSAGE [9], GCN [12], GAT [26], and GIN [30].

MB-GNNs. A layer of an existing GNN can be written as

z_v^(l+1) = f_θ^(l)(z_v^(l), MULTISET(z_u^(l) | u ∈ N(v))),

where f_θ^(l) is a learnable aggregation function. We call GNNs that can be written in this form multiset-broadcasting GNNs (MB-GNNs): multiset because they aggregate features from neighbors as a multiset, and broadcasting because for any v ∈ N(u), the "message" [7] from u to v is the same (i.e., z_u). GraphSAGE-mean [9] is an example of MB-GNNs because a layer of GraphSAGE-mean is represented by the following equation:

z_v^(l+1) = CONCAT(z_v^(l), (1 / |N(v)|) Σ_{u ∈ N(v)} W^(l) z_u^(l)),

where CONCAT concatenates vectors into one vector. Other examples of MB-GNNs are GCN [12], GAT [26], and GIN [30].

SB-GNNs. The other existing class of GNNs in the literature is set-broadcasting GNNs (SB-GNNs), which can be written in the following form:

z_v^(l+1) = f_θ^(l)(z_v^(l), SET(z_u^(l) | u ∈ N(v))).

GraphSAGE-pool [9] is an example of SB-GNNs because a layer of GraphSAGE-pool is represented by the following equation:

z_v^(l+1) = max({σ(W^(l) z_u^(l) + b^(l)) | u ∈ N(v)}).

Clearly, SB-GNNs are a subclass of MB-GNNs. Xu et al.
[30] discussed the differences in capability between SB-GNNs and MB-GNNs. In this paper, we show in another way that MB-GNNs are strictly stronger than SB-GNNs.

4 Novel Class of GNNs

In this section, we first introduce a GNN class that is more powerful than MB-GNNs and SB-GNNs. To make GNN models more powerful than MB-GNNs, we introduce the concept of port numbering [1, 10] to GNNs.

Port Numbering. A port of a graph G is a pair (v, i), where v ∈ V and i ∈ [deg(v)]. Let P(G) = {(v, i) | v ∈ V, i ∈ [deg(v)]} be the set of all ports of a graph G. A port numbering of a graph G is a function p : P(G) → P(G) such that for any edge {u, v} ∈ E, there exist i ∈ [deg(u)] and j ∈ [deg(v)] such that p(u, i) = (v, j). We say that a port numbering is consistent if p is an involution (i.e., ∀(v, i) ∈ P(G), p(p(v, i)) = (v, i)). We define the functions ptail : V × [Δ] → V ∪ {−} and pn : V × [Δ] → [Δ] ∪ {−} as follows:

ptail(v, i) = the node u ∈ V such that ∃j ∈ [deg(u)] with p(u, j) = (v, i)  (if i ≤ deg(v)), and ptail(v, i) = −  (otherwise);
pn(v, i) = the index j ∈ [deg(ptail(v, i))] such that p(ptail(v, i), j) = (v, i)  (if i ≤ deg(v)), and pn(v, i) = −  (otherwise),

where − is a special symbol that denotes an index being out of range. Note that these functions are well-defined because, if i ≤ deg(v), there always exists exactly one such u ∈ V for ptail and exactly one such j ∈ [deg(ptail(v, i))] for pn. Intuitively, ptail(v, i) is the node that sends messages to port i of node v, and pn(v, i) is the port number of the node ptail(v, i) from which it sends messages to port i of node v.

The GNN class we introduce in the following uses a consistent port numbering to calculate embeddings. Intuitively, SB-GNNs and MB-GNNs send the same message to all neighboring nodes.
GNNs can send different messages to different neighboring nodes by using port numbering, and this strengthens model capability.

VVC-GNNs. Vector-vector consistent GNNs (VVC-GNNs) are a novel class of GNNs that we introduce in this paper. They calculate embeddings with the following formula:

z_v^(l+1) = f_θ^(l)(z_v^(l), z_{ptail(v,1)}^(l), pn(v, 1), z_{ptail(v,2)}^(l), pn(v, 2), . . . , z_{ptail(v,Δ)}^(l), pn(v, Δ)).

If the index of z is the special symbol −, we define the embedding to be the special symbol − as well (i.e., z_− = −). To calculate the embeddings of the nodes of a graph G using a GNN with port numbering, we first calculate one consistent port numbering p of G, and then we input G and p to the GNN. Note that we can calculate a consistent port numbering of a graph in linear time by numbering edges one by one. We say a GNN class N with port numbering solves a graph problem Π if for any Δ ∈ Z+, there exists a GNN N_θ ∈ N and its parameter θ such that for any graph G ∈ F(Δ) and any consistent port numbering p of G, the output N_θ(G, p, ·) is in Π(G). We show that using port numbering theoretically improves model capability in Section 5.2. We propose CPNGNNs, an example of VVC-GNNs, in Section 6.

5 GNNs with Distributed Local Algorithms

In this section, we discuss the relationship between GNNs and distributed local algorithms. Thanks to this relationship, we can elucidate the theoretical properties of GNNs.

5.1 Relationship with Distributed Local Algorithms

A distributed local algorithm is a distributed algorithm that runs in constant time. More specifically, in a distributed local algorithm, we assume each node has infinite computational resources and decides its output within a constant number of communication rounds with neighboring nodes.
In this paper, we show a clear relationship between distributed local algorithms and GNNs for the first time. There are several well-known models of distributed local algorithms [10]. Namely, in this paper, we introduce the SB(1), MB(1), and VVC(1) models. As their names suggest, they correspond to SB-GNNs, MB-GNNs, and VVC-GNNs, respectively.

Assumption 3 (Finite Node Features). The number of possible node features is finite.

Assumption 3 restricts node features to be discrete. However, Assumption 3 still allows the node degree feature (∈ [Δ]) and the node coloring feature (∈ {0, 1}).

Theorem 1. Let L be SB, MB, or VVC. Under Assumption 3, the set of graph problems that at least one L-GNN can solve is the same as the set of graph problems that at least one distributed local algorithm on the L(1) model can solve.

Algorithm 2 CPNGNN: The most powerful VVC-GNN
Require: Graph G = (V, E, X); Maximum degree Δ ∈ Z+; Weight matrices W^(l) ∈ R^(d_{l+1} × (d_l + Δ(d_l + 1))) (l = 1, . . . , L)
Ensure: Output for the graph problem y ∈ Y^n
1: calculate a consistent port numbering p
2: z_v^(1) ← x_v (∀v ∈ V)
3: for l = 1, . . . , L do
4:   for v ∈ V do
5:     z_v^(l+1) ← W^(l) CONCAT(z_v^(l), z_{ptail(v,1)}^(l), pn(v, 1), z_{ptail(v,2)}^(l), pn(v, 2), . . . , z_{ptail(v,Δ)}^(l), pn(v, Δ))
6:     z_v^(l+1) ← RELU(z_v^(l+1))
7:   end for
8: end for
9: for v ∈ V do
10:   z_v ← MULTILAYERPERCEPTRON(z_v^(L+1))   # calculate the final embedding of node v
11:   y_v ← argmax_{i ∈ [d_{L+1}]} z_{v,i}   # output the index of the maximum element
12: end for
13: return y

All proofs are available in the supplementary materials. In fact, the following stronger properties hold: (i) any L-GNN can be simulated by the L(1) model, and (ii) any distributed local algorithm on the L(1) model can be simulated by an L-GNN.
The former is obvious because GNNs communicate with neighboring nodes over L rounds, where L is the number of layers. The latter is natural because the definitions of L-GNNs (Sections 3.2 and 4) are intrinsically the same as the definition of the L(1) model. Thanks to Theorem 1, we can prove which combinatorial problems GNNs can and cannot solve by using theoretical results on distributed local algorithms.

5.2 Hierarchy of GNNs

There are obvious inclusion relations among classes of GNNs. Namely, SB-GNNs are a subclass of MB-GNNs, and MB-GNNs are a subclass of VVC-GNNs. If a model class A is a subset of a model class B, the set of graph problems that A solves is a subset of the set of graph problems that B solves. However, it is not obvious whether proper inclusion holds. Let P_SB-GNNs, P_MB-GNNs, and P_VVC-GNNs be the sets of graph problems that SB-GNNs, MB-GNNs, and VVC-GNNs can solve, respectively, using only the degree features. Thanks to the relationship between GNNs and distributed local algorithms, we can show that proper inclusion holds for these classes.

Theorem 2. P_SB-GNNs ⊊ P_MB-GNNs ⊊ P_VVC-GNNs.

An example graph problem that MB-GNNs cannot solve but VVC-GNNs can solve is the single-leaf problem [10]. The input graphs of the problem are star graphs, and the ground truth contains only a single leaf node. MB-GNNs cannot solve this problem because, in each layer, the embeddings of the leaf nodes are exactly the same, so the GNN cannot distinguish these nodes. Therefore, if a GNN includes one leaf node in the output, the other leaf nodes are also included in the output. On the other hand, VVC-GNNs can distinguish each leaf node using port numbering and can appropriately output only a single node.
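This separation can be made concrete. A consistent port numbering is computable in linear time by scanning the edges once, and on a star graph it gives each leaf a distinct local view; a minimal sketch (function and variable names are illustrative, and simple graphs without self-loops are assumed):

```python
def consistent_port_numbering(edges):
    """Build a consistent port numbering by numbering edges one by one.

    Returns an involution p with p[(v, i)] = (u, j), meaning port i of
    node v is connected to port j of node u (ports are 1-indexed).
    Assumes a simple undirected graph given as a list of edges (u, v).
    """
    degree = {}
    p = {}
    for u, v in edges:
        i = degree[u] = degree.get(u, 0) + 1  # next free port of u
        j = degree[v] = degree.get(v, 0) + 1  # next free port of v
        p[(u, i)] = (v, j)
        p[(v, j)] = (u, i)
    return p

# Star graph: center 0, leaves 1..4. Degree features alone cannot separate
# the leaves, but each leaf's single port is paired with a *different* port
# of the center, so the local rule "output the leaf whose port is paired
# with the center's port 1" selects exactly one leaf.
leaves = [1, 2, 3, 4]
p = consistent_port_numbering([(0, leaf) for leaf in leaves])
chosen = [v for v in leaves if p[(v, 1)][1] == 1]
```

Here exactly one leaf satisfies the rule, mirroring how a VVC-GNN can break the symmetry that defeats MB-GNNs on this instance.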
We confirm this fact through experiments in the supplementary materials.

6 Most Powerful GNN for Combinatorial Problems

6.1 Consistent Port Numbering Graph Neural Networks (CPNGNNs)

In this section, we propose the most powerful VVC-GNNs, CPNGNNs. The algorithm most similar to CPNGNNs is GraphSAGE [9]. The key differences between GraphSAGE and CPNGNNs are as follows: (i) CPNGNNs use port numbering, and (ii) CPNGNNs aggregate the features of neighbors by concatenation. We show pseudo code for CPNGNNs in Algorithm 2. Though CPNGNNs are simple, they are the most powerful among VVC-GNNs. This claim is supported by Theorem 3, where we do not limit node features to the node degree feature.

Theorem 3. Let P_CPNGNNs be the set of graph problems that CPNGNNs can solve and P_VVC-GNNs be the set of graph problems that VVC-GNNs can solve. Then, under Assumption 3, P_CPNGNNs = P_VVC-GNNs.

The advantages of CPNGNNs are twofold: (i) they can solve a strictly wider set of graph problems than existing models (Theorems 2 and 3), and (ii) many distributed local algorithms can be simulated by CPNGNNs, so we can prove that CPNGNNs solve a variety of combinatorial problems (see Section 6.2).

6.2 Combinatorial Problems that CPNGNNs Can/Cannot Solve

In Section 5.2, we found that there exist graph problems that certain GNNs can solve but others cannot. However, a question remains: what kinds of graph problems can and cannot GNNs solve? In this paper, we study combinatorial problems, including the minimum dominating set problem, maximum matching problem, and minimum vertex cover problem. If GNNs can solve combinatorial problems, we may automatically obtain new algorithms for combinatorial problems by simply training GNNs. Note that from Theorems 2 and 3, if CPNGNNs cannot solve a graph problem, other GNNs cannot solve the problem.
Therefore, it is important to investigate the capability of CPNGNNs to study the limitations of GNNs.

Minimum Dominating Set Problem. First, we investigate the minimum dominating set problem.

Theorem 4. The optimal approximation ratio of CPNGNNs for the minimum dominating set problem is (Δ + 1). In other words, CPNGNNs can solve (Δ + 1)-approximation for the minimum dominating set problem, but for any 1 ≤ α < Δ + 1, CPNGNNs cannot solve α-approximation for the minimum dominating set problem.

Here, "CPNGNNs can solve f(Δ)-approximation for the minimum dominating set problem" means that for all Δ ∈ Z+, there exists a parameter θ such that for all inputs G ∈ F(Δ), {v ∈ V | CPNGNN_θ(G, v) = 1} forms an f(Δ)-approximation of the minimum dominating set of G. However, (Δ + 1)-approximation is trivial because it can be achieved by outputting all the nodes. Therefore, Theorem 4 says that any GNN is as bad as the trivial algorithm in the worst case, which is unsatisfactory. This is possibly because we only use the degree information of local nodes, and we may improve the approximation ratio if we use information other than node degrees. Interestingly, we can improve the approximation ratio just by using a weak 2-coloring as a node feature. A weak 2-coloring is a function c : V → {0, 1} such that for any node v ∈ V, there exists a neighbor u ∈ N(v) such that c(v) ≠ c(u). Note that any graph has a weak 2-coloring and that we can calculate a weak 2-coloring in linear time by a breadth-first search. In the theorems below, we use not only the degree deg(v) but also the color c(v) as the feature vector of a node v ∈ V. There may be many weak 2-colorings of a graph G; however, the choice of c is arbitrary.

Theorem 5. If the feature vector of a node consists of the degree and the color of a weak 2-coloring, the optimal approximation ratio of CPNGNNs for the minimum dominating set problem is ((Δ + 1)/2). In other words, CPNGNNs can solve ((Δ + 1)/2)-approximation for the minimum dominating set problem, and for any 1 ≤ α < (Δ + 1)/2, CPNGNNs cannot solve α-approximation for the minimum dominating set problem.

In the minimum dominating set problem, we cannot improve the approximation ratio by using 2-coloring instead of weak 2-coloring.

Theorem 6. Even if the feature vector of a node consists of the degree and the color of a 2-coloring, for any 1 ≤ α < (Δ + 1)/2, CPNGNNs cannot solve α-approximation for the minimum dominating set problem.

Minimum Vertex Cover Problem. Next, we investigate the minimum vertex cover problem.

Theorem 7. The optimal approximation ratio of CPNGNNs for the minimum vertex cover problem is 2. In other words, CPNGNNs can solve 2-approximation for the minimum vertex cover problem, and for any 1 ≤ α < 2, CPNGNNs cannot solve α-approximation for the minimum vertex cover problem.

A simple greedy algorithm can solve 2-approximation for the minimum vertex cover problem. However, this result is not trivial because the algorithm that GNNs learn is not a regular algorithm but a distributed local algorithm. A distributed local algorithm for 2-approximation for the minimum vertex cover problem is known, but it is not so simple [2]. This result also says that if one wants to find an approximation algorithm with performance better than 2-approximation using a machine learning approach, one must use a non-GNN model or combine GNNs with other methods (e.g., a search method).

Maximum Matching Problem. Lastly, we investigate the maximum matching problem. So far, we have only investigated problems on nodes, not edges.
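Before turning to edge problems, the classical centralized 2-approximation for vertex cover mentioned above is worth recording, since it is itself built on a maximal matching: take both endpoints of every edge that is not yet covered. A minimal sketch (centralized, unlike the distributed local algorithm of [2]):

```python
def vertex_cover_2approx(edges):
    """2-approximation for minimum vertex cover.

    Scanning the edges once and taking both endpoints of every uncovered
    edge yields the endpoints of a maximal matching; any vertex cover must
    contain at least one endpoint of each matched edge, so the result is
    at most twice the optimum.
    """
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))  # edge (u, v) joins the maximal matching
    return cover
```

For example, on a star graph the first scanned edge puts the center and one leaf into the cover, after which every remaining edge is already covered.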
We must specify how GNNs output edge labels. Graph edge problems are defined similarly to graph problems, but their solutions are functions E → Y. In this paper, we only consider Y = {0, 1}, and we only use VVC-GNNs for solving graph edge problems. Let G ∈ F(Δ) be a graph and p be a port numbering of G. To solve graph edge problems, GNNs output a vector y(v) ∈ {0, 1}^Δ for each node v ∈ V. For each edge {u, v}, GNNs include the edge {u, v} in the output if and only if y(u)_i = y(v)_j = 1, where p(u, i) = (v, j) and p(v, j) = (u, i). Intuitively, each node outputs "yes" or "no" for each incident edge (i.e., each port), and we include an edge in the output if and only if both ends output "yes" for the edge. As with graph problems, we say a class N of GNNs solves a graph edge problem Π if for any Δ ∈ Z+, there exists a GNN N_θ ∈ N and its parameter θ such that for any graph G ∈ F(Δ) and any consistent port numbering p of G, the output N_θ(G, p) is in Π(G).

We investigate the maximum matching problem in detail. In fact, GNNs cannot solve the maximum matching problem at all.

Theorem 8. For any α ∈ R+, CPNGNNs cannot solve α-approximation for the maximum matching problem.

However, CPNGNNs can approximate the maximum matching problem with the weak 2-coloring feature.

Theorem 9. If the feature vector of a node consists of the degree and the color of a weak 2-coloring, the optimal approximation ratio of CPNGNNs for the maximum matching problem is ((Δ + 1)/2). In other words, CPNGNNs can solve ((Δ + 1)/2)-approximation for the maximum matching problem, and for any 1 ≤ α < (Δ + 1)/2, CPNGNNs cannot solve α-approximation for the maximum matching problem.

Furthermore, if we use 2-coloring instead of weak 2-coloring, we can improve the approximation ratio. In fact, it can achieve any approximation ratio.
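A proper 2-coloring, when it exists, can be found with the same breadth-first-search layering used for weak 2-coloring earlier in this section, and the search fails exactly on non-bipartite inputs; a minimal sketch (adjacency-list input assumed):

```python
from collections import deque

def proper_2_coloring(adj):
    """Return a proper 2-coloring (c[u] != c[v] for every edge {u, v}),
    or None if the graph is not bipartite."""
    color = {}
    for root in adj:
        if root in color:
            continue
        color[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in color:
                    color[w] = 1 - color[u]
                    queue.append(w)
                elif color[w] == color[u]:  # odd cycle: not bipartite
                    return None
    return color
```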
Note that only a bipartite graph has a 2-coloring. Therefore, the graph class is implicitly restricted to bipartite graphs in this case.

Theorem 10. If the feature vector of a node consists of the degree and the color of a 2-coloring, then for any α > 1, CPNGNNs can solve α-approximation for the maximum matching problem.

In this paper, we consider only bounded-degree graphs. This assumption is natural, but it is also important to consider graphs without degree bounds. Dealing with such graphs is difficult because graph problems on them are not of constant size [24]. Note that solving graph problems becomes only more difficult without the bounded-degree assumption. Therefore, GNNs cannot solve (Δ + 1 − ε)-approximation for the minimum dominating set problem or (2 − ε)-approximation for the minimum vertex cover problem in the general case either.

7 Conclusion

In this paper, we introduced VVC-GNNs, which are a new class of GNNs, and CPNGNNs, which are an example of VVC-GNNs. We showed that VVC-GNNs have the same ability to solve graph problems as a computational model of distributed local algorithms. With the aid of the theory of distributed local algorithms, we elucidated the approximation ratios of algorithms that CPNGNNs can learn for combinatorial graph problems such as the minimum dominating set problem and the minimum vertex cover problem. This paper is the first to show the approximation ratios of GNNs for combinatorial problems. Moreover, these ratios are lower bounds on the approximation ratios of all GNNs. We further showed that adding coloring or weak coloring to the node features improves these approximation ratios. This indicates that preprocessing and feature engineering theoretically strengthen model capability.

Acknowledgments

This work was supported by JSPS KAKENHI Grant Number 15H01704. MY is supported by the JST PRESTO program JPMJPR165A.

References

[1] Dana Angluin.
Local and global properties in networks of processors (extended abstract). In Proceedings of the 12th Annual ACM Symposium on Theory of Computing, pages 82–93, 1980.

[2] Matti Åstrand, Patrik Floréen, Valentin Polishchuk, Joel Rybicki, Jukka Suomela, and Jara Uitto. A local 2-approximation algorithm for the vertex cover problem. In Proceedings of the 23rd International Symposium on Distributed Computing, DISC 2009, pages 191–205, 2009.

[3] Matti Åstrand, Valentin Polishchuk, Joel Rybicki, Jukka Suomela, and Jara Uitto. Local algorithms in (weakly) coloured graphs. CoRR, abs/1002.0125, 2010.

[4] Irwan Bello, Hieu Pham, Quoc V. Le, Mohammad Norouzi, and Samy Bengio. Neural combinatorial optimization with reinforcement learning. CoRR, abs/1611.09940, 2016.

[5] George Cybenko. Approximation by superpositions of a sigmoidal function. MCSS, 2(4):303–314, 1989.

[6] Andrzej Czygrinow, Michal Hanckowiak, and Wojciech Wawrzyniak. Fast distributed approximations in planar graphs. In Proceedings of the 22nd International Symposium on Distributed Computing, DISC 2008, pages 78–92, 2008.

[7] Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, pages 1263–1272, 2017.

[8] Marco Gori, Gabriele Monfardini, and Franco Scarselli. A new model for learning in graph domains. In Proceedings of the International Joint Conference on Neural Networks, IJCNN 2005, volume 2, pages 729–734, 2005.

[9] William L. Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, NIPS 2017, pages 1025–1035, 2017.

[10] Lauri Hella, Matti Järvisalo, Antti Kuusisto, Juhana Laurinharju, Tuomo Lempiäinen, Kerkko Luosto, Jukka Suomela, and Jonni Virtema. Weak models of distributed computing, with connections to modal logic. In Proceedings of the ACM Symposium on Principles of Distributed Computing, PODC 2012, pages 185–194, 2012.

[11] Elias B. Khalil, Hanjun Dai, Yuyu Zhang, Bistra Dilkina, and Le Song. Learning combinatorial optimization algorithms over graphs. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, NIPS 2017, pages 6351–6361, 2017.

[12] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. CoRR, abs/1609.02907, 2016.

[13] Martin Kubisch, Holger Karl, Adam Wolisz, Lizhi Charlie Zhong, and Jan M. Rabaey. Distributed algorithms for transmission power control in wireless sensor networks. In Proceedings of the 2003 IEEE Wireless Communications and Networking, WCNC 2003, pages 558–563, 2003.

[14] Christoph Lenzen, Jukka Suomela, and Roger Wattenhofer. Local algorithms: Self-stabilization on speed. In Proceedings of the 11th International Symposium on Stabilization, Safety, and Security of Distributed Systems, SSS 2009, pages 17–34, 2009.

[15] Christoph Lenzen and Roger Wattenhofer. Leveraging Linial's locality limit. In Proceedings of the 22nd International Symposium on Distributed Computing, DISC 2008, pages 394–407, 2008.

[16] Zhuwen Li, Qifeng Chen, and Vladlen Koltun. Combinatorial optimization with graph convolutional networks and guided tree search. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, pages 537–546, 2018.

[17] Nathan Linial. Locality in distributed graph algorithms. SIAM J. Comput., 21(1):193–201, 1992.

[18] Moni Naor and Larry J. Stockmeyer. What can be computed locally? SIAM J. Comput., 24(6):1259–1277, 1995.

[19] Huy N. Nguyen and Krzysztof Onak. Constant-time approximation algorithms via local improvements. In Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2008, pages 327–336, 2008.

[20] Michal Parnas and Dana Ron. Approximating the minimum vertex cover in sublinear time and a connection to distributed algorithms. Theor. Comput. Sci., 381(1-3):183–196, 2007.

[21] Leonardo Filipe Rodrigues Ribeiro, Pedro H. P. Saverese, and Daniel R. Figueiredo. struc2vec: Learning node representations from structural identity. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2017, pages 385–394, 2017.

[22] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Trans. Neural Networks, 20(1):61–80, 2009.

[23] Michael Sejr Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. Modeling relational data with graph convolutional networks. CoRR, abs/1703.06103, 2017.

[24] Jukka Suomela. Survey of local algorithms. ACM Comput. Surv., 45(2):24:1–24:40, 2013.

[25] Vijay V. Vazirani. Approximation Algorithms. Springer, 2001.

[26] Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In Proceedings of the 6th International Conference on Learning Representations, ICLR 2018, 2018.

[27] Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. Pointer networks. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, NIPS 2015, pages 2692–2700, 2015.

[28] Mirjam Wattenhofer and Roger Wattenhofer. Distributed weighted matching. In Proceedings of the 18th International Symposium on Distributed Computing, DISC 2004, pages 335–348, 2004.

[29] Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn., 8(3-4):229–256, 1992.

[30] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? CoRR, abs/1810.00826, 2018.

[31] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L. Hamilton, and Jure Leskovec. Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018, pages 974–983, 2018.