{"title": "Link Prediction Based on Graph Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 5165, "page_last": 5175, "abstract": "Link prediction is a key problem for network-structured data. Link prediction heuristics use some score functions, such as common neighbors and Katz index, to measure the likelihood of links. They have obtained wide practical uses due to their simplicity, interpretability, and for some of them, scalability. However, every heuristic has a strong assumption on when two nodes are likely to link, which limits their effectiveness on networks where these assumptions fail. In this regard, a more reasonable way should be learning a suitable heuristic from a given network instead of using predefined ones. By extracting a local subgraph around each target link, we aim to learn a function mapping the subgraph patterns to link existence, thus automatically learning a ``heuristic'' that suits the current network. In this paper, we study this heuristic learning paradigm for link prediction. First, we develop a novel $\\gamma$-decaying heuristic theory. The theory unifies a wide range of heuristics in a single framework, and proves that all these heuristics can be well approximated from local subgraphs. Our results show that local subgraphs reserve rich information related to link existence. Second, based on the $\\gamma$-decaying theory, we propose a new method to learn heuristics from local subgraphs using a graph neural network (GNN). Its experimental results show unprecedented performance, working consistently well on a wide range of problems.", "full_text": "Link Prediction Based on Graph Neural Networks\n\nMuhan Zhang\n\nDepartment of CSE\n\nWashington University in St. Louis\n\nmuhan@wustl.edu\n\nYixin Chen\n\nDepartment of CSE\n\nWashington University in St. Louis\n\nchen@cse.wustl.edu\n\nAbstract\n\nLink prediction is a key problem for network-structured data. 
Link prediction heuristics use some score functions, such as common neighbors and Katz index, to measure the likelihood of links. They have obtained wide practical uses due to their simplicity, interpretability, and, for some of them, scalability. However, every heuristic makes a strong assumption on when two nodes are likely to link, which limits its effectiveness on networks where the assumption fails. In this regard, a more reasonable way is to learn a suitable heuristic from a given network instead of using predefined ones. By extracting a local subgraph around each target link, we aim to learn a function mapping the subgraph patterns to link existence, thus automatically learning a “heuristic” that suits the current network. In this paper, we study this heuristic learning paradigm for link prediction. First, we develop a novel γ-decaying heuristic theory. The theory unifies a wide range of heuristics in a single framework, and proves that all these heuristics can be well approximated from local subgraphs. Our results show that local subgraphs preserve rich information related to link existence. Second, based on the γ-decaying theory, we propose a new method to learn heuristics from local subgraphs using a graph neural network (GNN). Experimental results show unprecedented performance, working consistently well on a wide range of problems.

1 Introduction

Link prediction is to predict whether two nodes in a network are likely to have a link [1]. Given the ubiquitous existence of networks, it has many applications such as friend recommendation [2], movie recommendation [3], knowledge graph completion [4], and metabolic network reconstruction [5]. One class of simple yet effective approaches for link prediction is called heuristic methods. Heuristic methods compute some heuristic node similarity scores as the likelihood of links [1, 6].
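For instance, several popular neighborhood-based similarity scores can be computed directly from the one- and two-hop neighborhoods of a node pair. The following is a minimal plain-Python sketch (the function and variable names are ours, not from the paper's released code):

```python
# Sketch of common neighborhood-based heuristics (CN, Jaccard, PA, AA, RA),
# computed from an adjacency-set representation of an undirected graph.
import math

def heuristic_scores(adj, x, y):
    """adj: dict mapping each node to the set of its neighbors."""
    common = adj[x] & adj[y]
    union = adj[x] | adj[y]
    return {
        "CN": len(common),                              # common neighbors
        "Jaccard": len(common) / len(union) if union else 0.0,
        "PA": len(adj[x]) * len(adj[y]),                # preferential attachment
        "AA": sum(1.0 / math.log(len(adj[z]))           # Adamic-Adar
                  for z in common if len(adj[z]) > 1),
        "RA": sum(1.0 / len(adj[z]) for z in common),   # resource allocation
    }

# toy graph with edges (0,1), (0,2), (1,2), (1,3), (2,3)
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
print(heuristic_scores(adj, 0, 3))  # CN = 2: nodes 1 and 2 are shared
```

Each score here only needs the immediate (or two-hop) neighborhood of the target pair, which is what makes these heuristics cheap and scalable.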
Existing heuristics can be categorized based on the maximum hop of neighbors needed to calculate the score. For example, common neighbors (CN) and preferential attachment (PA) [7] are first-order heuristics, since they only involve the one-hop neighbors of the two target nodes. Adamic-Adar (AA) and resource allocation (RA) [8] are second-order heuristics, as they are calculated from the up-to-two-hop neighborhood of the target nodes. We define h-order heuristics to be those heuristics which require knowing the up-to-h-hop neighborhood of the target nodes. There are also some high-order heuristics which require knowing the entire network. Examples include Katz, rooted PageRank (PR) [9], and SimRank (SR) [10]. Table 3 in Appendix A summarizes eight popular heuristics.

Although working well in practice, heuristic methods make strong assumptions on when links may exist. For example, the common neighbor heuristic assumes that two nodes are more likely to connect if they have many common neighbors. This assumption may be correct in social networks, but is shown to fail in protein-protein interaction (PPI) networks – two proteins sharing many common neighbors are actually less likely to interact [11].

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

[Figure 1: from a network, enclosing subgraphs are extracted around node pairs (A, B) and (C, D) ("Extract enclosing subgraphs"), fed to a graph neural network to "Learn graph structure features" (illustrated with common neighbors, Jaccard, preferential attachment, and Katz values), and used to "Predict links" (1 = link, 0 = non-link).]

Figure 1: The SEAL framework. For each target link, SEAL extracts a local enclosing subgraph around it, and uses a GNN to learn general graph structure features for link prediction.
Note that the heuristics listed inside the\nbox are just for illustration \u2013 the learned features may be completely different from existing heuristics.\n\nIn fact, the heuristics belong to a more generic class, namely graph structure features. Graph structure\nfeatures are those features located inside the observed node and edge structures of the network, which\ncan be calculated directly from the graph. Since heuristics can be viewed as prede\ufb01ned graph structure\nfeatures, a natural idea is to automatically learn such features from the network. Zhang and Chen\n[12] \ufb01rst studied this problem. They extract local enclosing subgraphs around links as the training\ndata, and use a fully-connected neural network to learn which enclosing subgraphs correspond to\nlink existence. Their method called Weisfeiler-Lehman Neural Machine (WLNM) has achieved\nstate-of-the-art link prediction performance. The enclosing subgraph for a node pair (x, y) is the\nsubgraph induced from the network by the union of x and y\u2019s neighbors up to h hops. Figure 1\nillustrates the 1-hop enclosing subgraphs for (A, B) and (C, D). These enclosing subgraphs are very\ninformative for link prediction \u2013 all \ufb01rst-order heuristics such as common neighbors can be directly\ncalculated from the 1-hop enclosing subgraphs.\nHowever, it is shown that high-order heuristics such as rooted PageRank and Katz often have much\nbetter performance than \ufb01rst and second-order ones [6]. To effectively learn good high-order features,\nit seems that we need a very large hop number h so that the enclosing subgraph becomes the entire\nnetwork. This results in unaffordable time and memory consumption for most practical networks.\nBut do we really need such a large h to learn high-order heuristics?\nFortunately, as our \ufb01rst contribution, we show that we do not necessarily need a very large h to\nlearn high-order graph structure features. 
We dive into the inherent mechanisms of link prediction heuristics, and find that most high-order heuristics can be unified by a γ-decaying theory. We prove that, under mild conditions, any γ-decaying heuristic can be effectively approximated from an h-hop enclosing subgraph, where the approximation error decreases at least exponentially with h. This means that we can safely use even a small h to learn good high-order features. It also implies that the “effective order” of these high-order heuristics is not that high.

Based on our theoretical results, we propose a novel link prediction framework, SEAL, to learn general graph structure features from local enclosing subgraphs. SEAL fixes multiple drawbacks of WLNM. First, a graph neural network (GNN) [13, 14, 15, 16, 17] is used to replace the fully-connected neural network in WLNM, which enables better graph feature learning ability. Second, SEAL permits learning not only from subgraph structures, but also from latent and explicit node features, thus absorbing multiple types of information. We empirically verified its much improved performance.

Our contributions are summarized as follows. 1) We present a new theory for learning link prediction heuristics, justifying learning from local subgraphs instead of entire networks. 2) We propose SEAL, a novel link prediction framework based on GNN (illustrated in Figure 1). SEAL outperforms all heuristic methods, latent feature methods, and recent network embedding methods by large margins. SEAL also outperforms the previous state-of-the-art method, WLNM.

2 Preliminaries

Notations Let G = (V, E) be an undirected graph, where V is the set of vertices and E ⊆ V × V is the set of observed links. Its adjacency matrix is A, where A_{i,j} = 1 if (i, j) ∈ E and A_{i,j} = 0 otherwise. For any nodes x, y ∈ V, let Γ(x) be the 1-hop neighbors of x, and d(x, y) be the shortest path distance between x and y.
A walk w = ⟨v_0, ..., v_k⟩ is a sequence of nodes with (v_i, v_{i+1}) ∈ E. We use |⟨v_0, ..., v_k⟩| to denote the length of the walk w, which is k here.

Latent features and explicit features Besides graph structure features, latent features and explicit features are also studied for link prediction. Latent feature methods [3, 18, 19, 20] factorize some matrix representations of the network to learn a low-dimensional latent representation/embedding for each node. Examples include matrix factorization [3] and the stochastic block model [18]. Recently, a number of network embedding techniques have been proposed, such as DeepWalk [19], LINE [21] and node2vec [20]; these are also latent feature methods, since they implicitly factorize some matrices too [22]. Explicit features are often available in the form of node attributes, describing all kinds of side information about individual nodes. It is shown that combining graph structure features with latent features and explicit features can improve the performance [23, 24].

Graph neural networks The graph neural network (GNN) is a new type of neural network for learning over graphs [13, 14, 15, 16, 25, 26]. Here, we only briefly introduce the components of a GNN, since this paper is not about GNN innovations but is a novel application of GNNs. A GNN usually consists of 1) graph convolution layers, which extract local substructure features for individual nodes, and 2) a graph aggregation layer, which aggregates node-level features into a graph-level feature vector. Many graph convolution layers can be unified into a message passing framework [27].

Supervised heuristic learning There are some previous attempts to learn supervised heuristics for link prediction. The closest work to ours is the Weisfeiler-Lehman Neural Machine (WLNM) [12], which also learns from local subgraphs. However, WLNM has several drawbacks.
Firstly, WLNM trains a fully-connected neural network on the subgraphs' adjacency matrices. Since fully-connected neural networks only accept fixed-size tensors as input, WLNM requires truncating different subgraphs to the same size, which may lose much structural information. Secondly, due to the limitation of adjacency matrix representations, WLNM cannot learn from latent or explicit features. Thirdly, theoretical justifications are also missing. We include more discussion on WLNM in Appendix D. Another related line of research is to train a supervised learning model on combinations of different heuristics. For example, the path ranking algorithm [28] trains logistic regression on different path types' probabilities to predict relations in knowledge graphs. Nickel et al. [23] propose to incorporate heuristic features into tensor factorization models. However, these models still rely on predefined heuristics – they cannot learn general graph structure features.

3 A theory for unifying link prediction heuristics

In this section, we aim to understand more deeply the mechanisms behind various link prediction heuristics, and thus motivate the idea of learning heuristics from local subgraphs. Due to the large number of graph learning techniques, note that we are not concerned with the generalization error of a particular method, but focus on the information preserved in the subgraphs for calculating existing heuristics.

Definition 1. (Enclosing subgraph) For a graph G = (V, E), given two nodes x, y ∈ V, the h-hop enclosing subgraph for (x, y) is the subgraph G^h_{x,y} induced from G by the set of nodes { i | d(i, x) ≤ h or d(i, y) ≤ h }.

The enclosing subgraph describes the “h-hop surrounding environment” of (x, y). Since G^h_{x,y} contains all h-hop neighbors of x and y, we naturally have the following theorem.

Theorem 1.
Any h-order heuristic for (x, y) can be accurately calculated from G^h_{x,y}.

For example, a 2-hop enclosing subgraph will contain all the information needed to calculate any first and second-order heuristics. However, although first and second-order heuristics are well covered by local enclosing subgraphs, an extremely large h seems to be still needed for learning high-order heuristics. Surprisingly, our following analysis shows that learning high-order heuristics is also feasible with a small h. We support this first by defining the γ-decaying heuristic. We will show that under certain conditions, a γ-decaying heuristic can be very well approximated from the h-hop enclosing subgraph. Moreover, we will show that almost all well-known high-order heuristics can be unified into this γ-decaying heuristic framework.

Definition 2. (γ-decaying heuristic) A γ-decaying heuristic for (x, y) has the following form:

H(x, y) = η ∑_{l=1}^{∞} γ^l f(x, y, l),    (1)

where γ is a decaying factor between 0 and 1, η is a positive constant or a positive function of γ that is upper bounded by a constant, and f is a nonnegative function of x, y, l under the given network.

Next, we will show that under certain conditions, a γ-decaying heuristic can be approximated from an h-hop enclosing subgraph, and the approximation error decreases at least exponentially with h.

Theorem 2. Given a γ-decaying heuristic H(x, y) = η ∑_{l=1}^{∞} γ^l f(x, y, l), if f(x, y, l) satisfies:

• (property 1) f(x, y, l) ≤ λ^l where λ < 1/γ; and
• (property 2) f(x, y, l) is calculable from G^h_{x,y} for l = 1, 2, ..., g(h), where g(h) = ah + b with a, b ∈ ℕ and a > 0,

then H(x, y) can be approximated from G^h_{x,y} and the approximation error decreases at least exponentially with h.

Proof.
We can approximate such a γ-decaying heuristic by summing over its first g(h) terms:

H̃(x, y) := η ∑_{l=1}^{g(h)} γ^l f(x, y, l).    (2)

The approximation error can be bounded as follows:

|H(x, y) − H̃(x, y)| = η ∑_{l=g(h)+1}^{∞} γ^l f(x, y, l) ≤ η ∑_{l=ah+b+1}^{∞} γ^l λ^l = η (γλ)^{ah+b+1} (1 − γλ)^{−1}.

In practice, a small γλ and a large a lead to a faster decreasing speed. Next we will prove that three popular high-order heuristics: Katz, rooted PageRank and SimRank, are all γ-decaying heuristics which satisfy the properties in Theorem 2. First, we need the following lemma.

Lemma 1. Any walk between x and y with length l ≤ 2h + 1 is included in G^h_{x,y}.

Proof. Given any walk w = ⟨x, v_1, ..., v_{l−1}, y⟩ with length l, we will show that every node v_i is included in G^h_{x,y}. Consider any v_i. Assume d(v_i, x) ≥ h + 1 and d(v_i, y) ≥ h + 1. Then, 2h + 1 ≥ l = |⟨x, v_1, ..., v_i⟩| + |⟨v_i, ..., v_{l−1}, y⟩| ≥ d(v_i, x) + d(v_i, y) ≥ 2h + 2, a contradiction. Thus, d(v_i, x) ≤ h or d(v_i, y) ≤ h. By the definition of G^h_{x,y}, v_i must be included in G^h_{x,y}.

Next we will analyze Katz, rooted PageRank and SimRank one by one.

3.1 Katz index

The Katz index [29] for (x, y) is defined as

Katz_{x,y} = ∑_{l=1}^{∞} β^l |walks⟨l⟩(x, y)| = ∑_{l=1}^{∞} β^l [A^l]_{x,y},    (3)

where walks⟨l⟩(x, y) is the set of length-l walks between x and y, and A^l is the lth power of the adjacency matrix of the network. The Katz index sums over the collection of all walks between x and y, where a walk of length l is damped by β^l (0 < β < 1), giving more weight to shorter walks.

The Katz index is directly defined in the form of a γ-decaying heuristic with η = 1, γ = β, and f(x, y, l) = |walks⟨l⟩(x, y)|. According to Lemma 1, |walks⟨l⟩(x, y)| is calculable from G^h_{x,y} for l ≤ 2h + 1, thus property 2 in Theorem 2 is satisfied. Now we show when property 1 is satisfied.

Proposition 1.
For any nodes i, j, [A^l]_{i,j} is bounded by d^l, where d is the maximum node degree of the network.

Proof. We prove it by induction. When l = 1, A_{i,j} ≤ d for any (i, j). Thus the base case is correct. Now, assuming by induction that [A^l]_{i,j} ≤ d^l for any (i, j), we have

[A^{l+1}]_{i,j} = ∑_{k=1}^{|V|} [A^l]_{i,k} A_{k,j} ≤ d^l ∑_{k=1}^{|V|} A_{k,j} ≤ d^l · d = d^{l+1}.

Taking λ = d, we can see that whenever d < 1/β, the Katz index will satisfy property 1 in Theorem 2. In practice, the damping factor β is often set to very small values like 5E-4 [1], which implies that Katz can be very well approximated from the h-hop enclosing subgraph.

3.2 PageRank

The rooted PageRank for node x calculates the stationary distribution of a random walker starting at x, who iteratively moves to a random neighbor of its current position with probability α or returns to x with probability 1 − α. Let π_x denote the stationary distribution vector, and let [π_x]_i denote the probability that the random walker is at node i under the stationary distribution.

Let P be the transition matrix with P_{i,j} = 1/|Γ(v_j)| if (i, j) ∈ E and P_{i,j} = 0 otherwise. Let e_x be a vector with the xth element being 1 and others being 0. The stationary distribution satisfies

π_x = αP π_x + (1 − α) e_x.    (4)

When used for link prediction, the score for (x, y) is given by [π_x]_y (or [π_x]_y + [π_y]_x for symmetry). To show that rooted PageRank is a γ-decaying heuristic, we introduce the inverse P-distance theory [30], which states that [π_x]_y can be equivalently written as follows:

[π_x]_y = (1 − α) ∑_{w : x⇝y} P[w] α^{len(w)},    (5)

where the summation is taken over all walks w starting at x and ending at y (possibly touching x and y multiple times). For a walk w = ⟨v_0, v_1, ..., v_k⟩, len(w) := |⟨v_0, v_1, ..., v_k⟩| is the length of the walk.
The term P[w] is defined as ∏_{i=0}^{len(w)−1} 1/|Γ(v_i)|, which can be interpreted as the probability of traveling along w. Now we have the following theorem.

Theorem 3. The rooted PageRank heuristic is a γ-decaying heuristic which satisfies the properties in Theorem 2.

Proof. We first write [π_x]_y in the following form:

[π_x]_y = (1 − α) ∑_{l=1}^{∞} ∑_{w : x⇝y, len(w)=l} P[w] α^l.    (6)

Defining f(x, y, l) := ∑_{w : x⇝y, len(w)=l} P[w] leads to the form of a γ-decaying heuristic, with η = 1 − α and γ = α. Note that f(x, y, l) is the probability that a random walker starting at x stops at y with exactly l steps, which satisfies ∑_{z∈V} f(x, z, l) = 1. Thus, f(x, y, l) ≤ 1 < 1/α (property 1). According to Lemma 1, f(x, y, l) is also calculable from G^h_{x,y} for l ≤ 2h + 1 (property 2).

3.3 SimRank

The SimRank score [10] is motivated by the intuition that two nodes are similar if their neighbors are also similar. It is defined in the following recursive way: if x = y, then s(x, y) := 1; otherwise,

s(x, y) := γ ∑_{a∈Γ(x)} ∑_{b∈Γ(y)} s(a, b) / (|Γ(x)| · |Γ(y)|),    (7)

where γ is a constant between 0 and 1. According to [10], SimRank has an equivalent definition:

s(x, y) = ∑_{w : (x,y)⇝(z,z)} P[w] γ^{len(w)},    (8)

where w : (x, y) ⇝ (z, z) denotes all simultaneous walks such that one walk starts at x, the other walk starts at y, and they first meet at some vertex z. For a simultaneous walk w = ⟨(v_0, u_0), ..., (v_k, u_k)⟩, len(w) = k is the length of the walk. The term P[w] is similarly defined as ∏_{i=0}^{len(w)−1} 1/(|Γ(v_i)| |Γ(u_i)|), describing the probability of this walk. Now we have the following theorem.

Theorem 4. SimRank is a γ-decaying heuristic which satisfies the properties in Theorem 2.

Proof. We write s(x, y) as follows:

s(x, y) = ∑_{l=1}^{∞} ∑_{w : (x,y)⇝(z,z), len(w)=l} P[w] γ^l.    (9)

Defining f(x, y, l) := ∑_{w : (x,y)⇝(z,z), len(w)=l} P[w] reveals that SimRank is a γ-decaying heuristic.
Note that f(x, y, l) ≤ 1 < 1/γ (property 1). It is easy to see that f(x, y, l) is also calculable from G^h_{x,y} for l ≤ h (property 2).

Discussion There exist several other high-order heuristics based on path counting or random walk [6] which can also be incorporated into the γ-decaying heuristic framework. We omit the analysis here. Our results reveal that most high-order heuristics inherently share the same γ-decaying form, and thus can be effectively approximated from an h-hop enclosing subgraph with exponentially small approximation error. We believe the ubiquity of γ-decaying heuristics is not by accident – it implies that a successful link prediction heuristic should place exponentially smaller weight on structures far away from the target, as remote parts of the network intuitively make little contribution to link existence. Our results build the foundation for learning heuristics from local subgraphs, as they imply that local enclosing subgraphs already contain enough information to learn good graph structure features for link prediction, which is much desired considering that learning from the entire network is often infeasible. To summarize, from the small enclosing subgraphs extracted around links, we are able to accurately calculate first and second-order heuristics, and approximate a wide range of high-order heuristics with small errors. Therefore, given adequate feature learning ability of the model used, learning from such enclosing subgraphs is expected to achieve performance at least as good as a wide range of heuristics. There is some related work which empirically verifies that local methods can often estimate PageRank and SimRank well [31, 32].
Another related theoretical work [33] establishes a condition on h to achieve some fixed approximation error for ordinary PageRank.

4 SEAL: An implementation of the theory using GNN

In this section, we describe our SEAL framework for link prediction. SEAL does not restrict the learned features to be in some particular form such as γ-decaying heuristics, but instead learns general graph structure features for link prediction. It contains three steps: 1) enclosing subgraph extraction, 2) node information matrix construction, and 3) GNN learning. Given a network, we aim to automatically learn a “heuristic” that best explains the link formations. Motivated by the theoretical results, this function takes local enclosing subgraphs around links as input, and outputs how likely the links exist. To learn such a function, we train a graph neural network (GNN) over the enclosing subgraphs. Thus, the first step in SEAL is to extract enclosing subgraphs for a set of sampled positive links (observed) and a set of sampled negative links (unobserved) to construct the training data.

A GNN typically takes (A, X) as input, where A (with slight abuse of notation) is the adjacency matrix of the input enclosing subgraph, and X is the node information matrix, each row of which corresponds to a node's feature vector. The second step in SEAL is to construct the node information matrix X for each enclosing subgraph. This step is crucial for training a successful GNN link prediction model. In the following, we discuss this key step. The node information matrix X in SEAL has three components: structural node labels, node embeddings and node attributes.

4.1 Node labeling

The first component in X is each node's structural label. A node labeling is a function f_l : V → ℕ which assigns an integer label f_l(i) to every node i in the enclosing subgraph.
The purpose is to use different labels to mark nodes' different roles in an enclosing subgraph: 1) the center nodes x and y are the target nodes between which the link is located; 2) nodes with different relative positions to the center have different structural importance to the link. A proper node labeling should mark such differences. If we do not mark such differences, GNNs will not be able to tell where the target nodes are, between which a link's existence should be predicted, and will lose structural information.

Our node labeling method is derived from the following criteria: 1) the two target nodes x and y always have the distinctive label “1”; 2) nodes i and j have the same label if d(i, x) = d(j, x) and d(i, y) = d(j, y). The second criterion is because, intuitively, a node i's topological position within an enclosing subgraph can be described by its radius with respect to the two center nodes, namely (d(i, x), d(i, y)). Thus, we let nodes on the same orbit have the same label, so that the node labels can reflect nodes' relative positions and structural importance within subgraphs.

Based on the above criteria, we propose a Double-Radius Node Labeling (DRNL) as follows. First, assign label 1 to x and y. Then, for any node i with (d(i, x), d(i, y)) = (1, 1), assign label f_l(i) = 2. Nodes with radius (1, 2) or (2, 1) get label 3. Nodes with radius (1, 3) or (3, 1) get 4. Nodes with (2, 2) get 5. Nodes with (1, 4) or (4, 1) get 6. Nodes with (2, 3) or (3, 2) get 7. So on and so forth. In other words, we iteratively assign larger labels to nodes with a larger radius w.r.t.
both center nodes, where the label f_l(i) and the double-radius (d(i, x), d(i, y)) satisfy:

1) if d(i, x) + d(i, y) ≠ d(j, x) + d(j, y), then d(i, x) + d(i, y) < d(j, x) + d(j, y) ⇔ f_l(i) < f_l(j);
2) if d(i, x) + d(i, y) = d(j, x) + d(j, y), then d(i, x)·d(i, y) < d(j, x)·d(j, y) ⇔ f_l(i) < f_l(j).

One advantage of DRNL is that it has a perfect hashing function

f_l(i) = 1 + min(d_x, d_y) + (d/2)[(d/2) + (d%2) − 1],    (10)

where d_x := d(i, x), d_y := d(i, y), d := d_x + d_y, and (d/2) and (d%2) are the integer quotient and remainder of d divided by 2, respectively. This perfect hashing allows fast closed-form computations. For nodes with d(i, x) = ∞ or d(i, y) = ∞, we give them a null label 0. Note that DRNL is not the only possible way of node labeling, but we empirically verified its better performance than no labeling and other naive labelings. We discuss more about node labeling in Appendix B. After getting the labels, we use their one-hot encoding vectors to construct X.

4.2 Incorporating latent and explicit features

Other than the structural node labels, the node information matrix X also provides an opportunity to include latent and explicit features. By concatenating each node's embedding/attribute vector to its corresponding row in X, we can make SEAL simultaneously learn from all three types of features.

Generating the node embeddings for SEAL is nontrivial. Suppose we are given the observed network G = (V, E), a set of sampled positive training links E_p ⊆ E, and a set of sampled negative training links E_n with E_n ∩ E = ∅. If we directly generate node embeddings on G, the node embeddings will record the link existence information of the training links (since E_p ⊆ E). We observed that GNNs can quickly find out such link existence information and optimize by only fitting this part of the information. This results in bad generalization performance in our experiments.
Our trick is to temporarily add E_n into E, and generate the embeddings on G' = (V, E ∪ E_n). This way, the positive and negative training links will have the same link existence information recorded in the embeddings, so that the GNN cannot classify links by only fitting this part of the information. We empirically verified the much improved performance this trick brings to SEAL. We name this trick negative injection.

We name our proposed framework SEAL (learning from Subgraphs, Embeddings and Attributes for Link prediction), emphasizing its ability to jointly learn from three types of features.

5 Experimental results

We conduct extensive experiments to evaluate SEAL. Our results show that SEAL is a superb and robust framework for link prediction, achieving unprecedentedly strong performance on various networks. We use AUC and average precision (AP) as evaluation metrics. We run all experiments 10 times and report the average AUC results and standard deviations. We leave the AP and time results to Appendix F. SEAL is flexible with what GNN or node embeddings to use. Thus, we choose a recent architecture, DGCNN [17], as the default GNN, and node2vec [20] as the default embeddings. The code and data are available at https://github.com/muhanzhang/SEAL.

Datasets The eight datasets used are: USAir, NS, PB, Yeast, C.ele, Power, Router, and E.coli (please see Appendix C for details). We randomly remove 10% of existing links from each dataset as positive testing data. Following a standard manner of learning-based link prediction, we randomly sample the same number of nonexistent links (unconnected node pairs) as negative testing data. We use the remaining 90% of existing links as well as the same number of additionally sampled nonexistent links to construct the training data.

Comparison to heuristic methods We first compare SEAL with methods that only use graph structure features.
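The train/test construction described above can be sketched as follows (a minimal illustration with our own function names, not the released SEAL code; here train and test negatives are sampled independently):

```python
# Hold out a fraction of links as positive test data; sample equally many
# unconnected node pairs as negatives for training and testing.
import random

def split_links(nodes, edges, test_frac=0.1, seed=0):
    rng = random.Random(seed)
    edges = list(edges)
    rng.shuffle(edges)
    n_test = int(test_frac * len(edges))
    test_pos, train_pos = edges[:n_test], edges[n_test:]

    edge_set = {frozenset(e) for e in edges}
    def sample_negatives(k):
        negs = set()
        while len(negs) < k:
            u, v = rng.sample(nodes, 2)
            if frozenset((u, v)) not in edge_set:
                negs.add(frozenset((u, v)))
        return [tuple(e) for e in negs]

    return train_pos, test_pos, sample_negatives(len(train_pos)), sample_negatives(n_test)

# toy network: a 20-node ring with extra chords
nodes = list(range(20))
edges = [(i, (i + 1) % 20) for i in range(20)] + [(i, (i + 5) % 20) for i in range(20)]
train_pos, test_pos, train_neg, test_neg = split_links(nodes, edges)
print(len(train_pos), len(test_pos), len(train_neg), len(test_neg))  # 36 4 36 4
```

Enclosing subgraphs are then extracted around each of these positive and negative pairs to form the GNN's training and testing examples.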
We include eight popular heuristics (shown in Appendix A, Table 3): common neighbors (CN), Jaccard, preferential attachment (PA), Adamic-Adar (AA), resource allocation (RA), Katz, PageRank (PR), and SimRank (SR). We additionally include Ensemble (ENS), which trains a logistic regression classifier on the eight heuristic scores. We also include two heuristic learning methods: the Weisfeiler-Lehman graph kernel (WLK) [34] and WLNM [12], which also learn from (truncated) enclosing subgraphs. We omit path ranking methods [28] as well as other recent methods which are specifically designed for knowledge graphs or recommender systems [23, 35]. As all the baselines only use graph structure features, we restrict SEAL to not include any latent or explicit features. In SEAL, the hop number h is an important hyperparameter. Here, we select h only from {1, 2}: on one hand, we empirically verified that the performance typically does not increase for h ≥ 3, which validates our theoretical results that the most useful information is within local structures. On the other hand, even h = 3 sometimes results in very large subgraphs if a hub node is included. This raises the idea of sampling nodes in subgraphs, which we leave to future work. The selection principle is very simple: if the second-order heuristic AA outperforms the first-order heuristic CN on 10% validation data, then we choose h = 2; otherwise we choose h = 1. For datasets PB and E.coli, we consistently use h = 1 to fit into memory.
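The hop-selection rule above amounts to a one-line decision; in the sketch below, `validation_auc` is a hypothetical scoring callback of ours (heuristic name → validation AUC), not a SEAL API:

```python
# Choose h = 2 only when the second-order heuristic (AA) beats the
# first-order one (CN) on held-out validation links.
def select_h(validation_auc):
    return 2 if validation_auc("AA") > validation_auc("CN") else 1

# toy usage with made-up validation AUCs
print(select_h({"CN": 0.90, "AA": 0.93}.get))  # -> 2 (AA wins, go second-order)
print(select_h({"CN": 0.95, "AA": 0.93}.get))  # -> 1 (CN wins, stay first-order)
```

The intuition is that when a second-order heuristic already beats a first-order one, the extra hop of structure is informative and worth the larger subgraphs.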
We include more details about the baselines and hyperparameters in Appendix D.

Table 1: Comparison with heuristic methods (AUC).

Data    CN          Jaccard     PA          AA          RA          Katz
USAir   93.80±1.22  89.79±1.61  88.84±1.45  95.06±1.03  95.77±0.92  92.88±1.42
NS      94.42±0.95  94.43±0.93  68.65±2.03  94.45±0.93  94.45±0.93  94.85±1.10
PB      92.04±0.35  87.41±0.39  90.14±0.45  92.36±0.34  92.46±0.37  92.92±0.35
Yeast   89.37±0.61  89.32±0.60  82.20±1.02  89.43±0.62  89.45±0.62  92.24±0.61
C.ele   85.13±1.61  80.19±1.64  74.79±2.04  86.95±1.40  87.49±1.41  86.34±1.89
Power   58.80±0.88  58.79±0.88  44.33±1.02  58.79±0.88  58.79±0.88  65.39±1.59
Router  56.43±0.52  56.40±0.52  47.58±1.47  56.43±0.51  56.43±0.51  38.62±1.35
E.coli  93.71±0.39  81.31±0.61  91.82±0.58  95.36±0.34  95.95±0.35  93.50±0.44

Data    PR          SR          ENS         WLK         WLNM        SEAL
USAir   94.67±1.08  78.89±2.31  88.96±1.44  96.63±0.73  95.95±1.10  96.62±0.72
NS      94.89±1.08  94.79±1.08  97.64±0.25  98.57±0.51  98.61±0.49  98.85±0.47
PB      93.54±0.41  77.08±0.80  90.15±0.45  93.83±0.59  93.49±0.47  94.72±0.46
Yeast   92.76±0.55  91.49±0.57  82.36±1.02  95.86±0.54  95.62±0.52  97.91±0.52
C.ele   90.32±1.49  77.07±2.00  74.94±2.04  89.72±1.67  86.18±1.72  90.30±1.35
Power   66.00±1.59  76.15±1.06  79.52±1.78  82.41±3.43  84.76±0.98  87.61±1.57
Router  38.76±1.39  37.40±1.27  47.58±1.48  87.42±2.08  94.41±0.88  96.38±1.45
E.coli  95.57±0.44  62.49±1.43  91.89±0.58  96.94±0.29  97.21±0.27  97.64±0.22

Table 1 shows the results.
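For reference, the evaluation protocol behind the heuristic columns can be sketched as follows. This is an illustrative simplification, not the paper's pipeline: `evaluate_heuristic` is a hypothetical helper, and the reported numbers additionally average over 10 runs:

```python
import random
import networkx as nx
from sklearn.metrics import roc_auc_score

def evaluate_heuristic(G, score_fn, test_frac=0.1, seed=0):
    """Hide test_frac of the links, sample equally many non-edges as
    negatives, rank all test pairs by score_fn, and report AUC."""
    rng = random.Random(seed)
    edges = list(G.edges())
    rng.shuffle(edges)
    n_test = max(1, int(test_frac * len(edges)))
    pos = edges[:n_test]                      # positive testing links
    G_train = G.copy()
    G_train.remove_edges_from(pos)            # scores must not see them
    nodes = list(G)
    neg = []
    while len(neg) < n_test:                  # sample non-edges as negatives
        u, v = rng.sample(nodes, 2)
        if not G.has_edge(u, v):
            neg.append((u, v))
    pairs = pos + neg
    labels = [1] * n_test + [0] * n_test
    scores = [score_fn(G_train, u, v) for u, v in pairs]
    return roc_auc_score(labels, scores)

cn = lambda g, u, v: len(list(nx.common_neighbors(g, u, v)))
auc = evaluate_heuristic(nx.karate_club_graph(), cn)
```

Any heuristic from the table can be plugged in as `score_fn`, since they all map a node pair to a scalar likelihood score.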
Firstly, we observe that methods which learn from enclosing subgraphs (WLK, WLNM and SEAL) generally perform much better than predefined heuristics. This indicates that the learned "heuristics" are better at capturing the network properties than manually designed ones. Among learning-based methods, SEAL has the best performance, demonstrating GNN's superior graph feature learning ability over graph kernels and fully-connected neural networks. From the results on Power and Router, we can see that although existing heuristics perform similarly to random guess, learning-based methods still maintain high performance. This suggests that we can even discover new "heuristics" for networks where no existing heuristics work.

Comparison to latent feature methods Next we compare SEAL with six state-of-the-art latent feature methods: matrix factorization (MF), stochastic block model (SBM) [18], node2vec (N2V) [20], LINE [21], spectral clustering (SPC), and variational graph auto-encoder (VGAE) [36]. Among them, VGAE uses a GNN too. Please note the difference between VGAE and SEAL: VGAE uses a node-level GNN to learn node embeddings that best reconstruct the network, while SEAL uses a graph-level GNN to classify enclosing subgraphs. Therefore, VGAE still belongs to latent feature methods. For SEAL, we additionally include the 128-dimensional node2vec embeddings in the node information matrix X. Since the datasets do not have node attributes, explicit features are not included.

Table 2: Comparison with latent feature methods (AUC).

Data    MF          SBM         N2V         LINE         SPC         VGAE        SEAL
USAir   94.08±0.80  94.85±1.14  91.44±1.78  81.47±10.71  74.22±3.11  89.28±1.99  97.09±0.70
NS      74.55±4.34  92.30±2.26  91.52±1.28  80.63±1.90   89.94±2.39  94.04±1.64  97.71±0.93
PB      94.30±0.53  93.90±0.42  85.79±0.78  76.95±2.76   83.96±0.86  90.70±0.53  95.01±0.34
Yeast   90.28±0.69  91.41±0.60  93.67±0.46  87.45±3.33   93.25±0.40  93.88±0.21  97.20±0.64
C.ele   85.90±1.74  86.48±2.60  84.11±1.27  69.21±3.14   51.90±2.57  81.80±2.18  89.54±2.04
Power   50.63±1.10  66.57±2.05  76.22±0.92  55.63±1.47   91.78±0.61  71.20±1.65  84.18±1.82
Router  78.03±1.63  85.65±1.93  65.46±0.86  67.15±2.10   68.79±2.42  61.51±1.22  95.68±1.22
E.coli  93.76±0.56  93.82±0.41  90.82±1.49  82.38±2.19   94.92±0.32  90.81±0.63  97.22±0.28

Table 2 shows the results. As we can see, SEAL shows significant improvement over latent feature methods. One reason is that SEAL learns from both graph structures and latent features simultaneously, thus augmenting those methods that only use latent features. We observe that SEAL with node2vec embeddings outperforms pure node2vec by large margins. This implies that network embeddings alone may not be able to capture the most useful link prediction information located in the local structures. It is also interesting that compared to SEAL without node2vec embeddings (Table 1), joint learning does not always improve the performance. More experiments and discussion are included in Appendix F.

6 Conclusions

Learning link prediction heuristics automatically is a new field. In this paper, we presented theoretical justifications for learning from local enclosing subgraphs. In particular, we proposed a γ-decaying theory to unify a wide range of high-order heuristics and prove their approximability from local subgraphs. Motivated by the theory, we proposed a novel link prediction framework, SEAL, to simultaneously learn from local enclosing subgraphs, embeddings and attributes based on graph neural networks.
Experimentally we showed that SEAL achieved unprecedentedly strong performance in comparison to various heuristics, latent feature methods, and network embedding algorithms. We hope SEAL can not only inspire link prediction research, but also open up new directions for other relational machine learning problems such as knowledge graph completion and recommender systems.

Acknowledgments

The work is supported in part by the III-1526012 and SCH-1622678 grants from the National Science Foundation and grant 1R21HS024581 from the National Institutes of Health.

References

[1] David Liben-Nowell and Jon Kleinberg. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7):1019–1031, 2007.
[2] Lada A Adamic and Eytan Adar. Friends and neighbors on the web. Social Networks, 25(3):211–230, 2003.
[3] Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. Computer, (8):30–37, 2009.
[4] Maximilian Nickel, Kevin Murphy, Volker Tresp, and Evgeniy Gabrilovich. A review of relational machine learning for knowledge graphs. Proceedings of the IEEE, 104(1):11–33, 2016.
[5] Tolutola Oyetunde, Muhan Zhang, Yixin Chen, Yinjie Tang, and Cynthia Lo. Boostgapfill: Improving the fidelity of metabolic network reconstructions through integrated constraint and pattern-based methods. Bioinformatics, 2016.
[6] Linyuan Lü and Tao Zhou. Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications, 390(6):1150–1170, 2011.
[7] Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
[8] Tao Zhou, Linyuan Lü, and Yi-Cheng Zhang. Predicting missing links via local information. The European Physical Journal B, 71(4):623–630, 2009.
[9] Sergey Brin and Lawrence Page. Reprint of: The anatomy of a large-scale hypertextual web search engine. Computer Networks, 56(18):3825–3833, 2012.
[10] Glen Jeh and Jennifer Widom. SimRank: a measure of structural-context similarity. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 538–543. ACM, 2002.
[11] István A Kovács, Katja Luck, Kerstin Spirohn, Yang Wang, Carl Pollis, Sadie Schlabach, Wenting Bian, Dae-Kyum Kim, Nishka Kishore, Tong Hao, et al. Network-based prediction of protein interactions. bioRxiv, page 275529, 2018.
[12] Muhan Zhang and Yixin Chen. Weisfeiler-Lehman neural machine for link prediction. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 575–583. ACM, 2017.
[13] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203, 2013.
[14] David K Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell, Timothy Hirzel, Alán Aspuru-Guzik, and Ryan P Adams. Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems, pages 2224–2232, 2015.
[15] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
[16] Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. Learning convolutional neural networks for graphs. In International Conference on Machine Learning, pages 2014–2023, 2016.
[17] Muhan Zhang, Zhicheng Cui, Marion Neumann, and Yixin Chen. An end-to-end deep learning architecture for graph classification. In AAAI, pages 4438–4445, 2018.
[18] Edoardo M Airoldi, David M Blei, Stephen E Fienberg, and Eric P Xing. Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9(Sep):1981–2014, 2008.
[19] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 701–710. ACM, 2014.
[20] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pages 855–864. ACM, 2016.
[21] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. LINE: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, pages 1067–1077. International World Wide Web Conferences Steering Committee, 2015.
[22] Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec. arXiv preprint arXiv:1710.02971, 2017.
[23] Maximilian Nickel, Xueyan Jiang, and Volker Tresp. Reducing the rank in relational factorization models by including observable patterns. In Advances in Neural Information Processing Systems, pages 1179–1187, 2014.
[24] He Zhao, Lan Du, and Wray Buntine. Leveraging node attributes for incomplete relational data. In International Conference on Machine Learning, pages 4072–4081, 2017.
[25] Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493, 2015.
[26] Hanjun Dai, Bo Dai, and Le Song. Discriminative embeddings of latent variable models for structured data. In Proceedings of The 33rd International Conference on Machine Learning, pages 2702–2711, 2016.
[27] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. arXiv preprint arXiv:1704.01212, 2017.
[28] Ni Lao and William W Cohen. Relational retrieval using a combination of path-constrained random walks. Machine Learning, 81(1):53–67, 2010.
[29] Leo Katz. A new status index derived from sociometric analysis. Psychometrika, 18(1):39–43, 1953.
[30] Glen Jeh and Jennifer Widom. Scaling personalized web search. In Proceedings of the 12th international conference on World Wide Web, pages 271–279. ACM, 2003.
[31] Yen-Yu Chen, Qingqing Gan, and Torsten Suel. Local methods for estimating PageRank values. In Proceedings of the thirteenth ACM international conference on Information and knowledge management, pages 381–389. ACM, 2004.
[32] Xu Jia, Hongyan Liu, Li Zou, Jun He, Xiaoyong Du, and Yuanzhe Cai. Local methods for estimating SimRank score. In Web Conference (APWEB), 2010 12th International Asia-Pacific, pages 157–163. IEEE, 2010.
[33] Ziv Bar-Yossef and Li-Tal Mashiach. Local approximation of PageRank and reverse PageRank. In Proceedings of the 17th ACM conference on Information and knowledge management, pages 279–288. ACM, 2008.
[34] Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, and Karsten M Borgwardt. Weisfeiler-Lehman graph kernels. Journal of Machine Learning Research, 12(Sep):2539–2561, 2011.
[35] Federico Monti, Michael Bronstein, and Xavier Bresson. Geometric matrix completion with recurrent multi-graph neural networks. In Advances in Neural Information Processing Systems, pages 3700–3710, 2017.
[36] Thomas N Kipf and Max Welling. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308, 2016.
[37] Ulrike V Luxburg, Agnes Radl, and Matthias Hein. Getting lost in space: Large sample analysis of the resistance distance. In Advances in Neural Information Processing Systems, pages 2622–2630, 2010.
[38] Leonardo FR Ribeiro, Pedro HP Saverese, and Daniel R Figueiredo. struc2vec: Learning node representations from structural identity. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 385–394. ACM, 2017.
[39] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pages 1025–1035, 2017.
[40] Yi-An Lai, Chin-Chi Hsu, Wen Hao Chen, Mi-Yen Yeh, and Shou-De Lin. PRUNE: Preserving proximity and global ranking for network embedding. In Advances in Neural Information Processing Systems, pages 5263–5272, 2017.
[41] Alberto Garcia Duran and Mathias Niepert. Learning graph representations with embedding propagation. In Advances in Neural Information Processing Systems, pages 5125–5136, 2017.
[42] Yehuda Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 426–434. ACM, 2008.
[43] Steffen Rendle. Factorization machines. In 10th IEEE International Conference on Data Mining (ICDM), pages 995–1000. IEEE, 2010.
[44] Vladimir Batagelj and Andrej Mrvar. http://vlado.fmf.uni-lj.si/pub/networks/data/, 2006.
[45] Mark EJ Newman. Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 74(3):036104, 2006.
[46] Robert Ackland et al. Mapping the US political blogosphere: Are conservative bloggers more prominent? In BlogTalk Downunder 2005 Conference, Sydney, 2005.
[47] Christian Von Mering, Roland Krause, Berend Snel, Michael Cornell, Stephen G Oliver, Stanley Fields, and Peer Bork. Comparative assessment of large-scale data sets of protein–protein interactions. Nature, 417(6887):399–403, 2002.
[48] Duncan J Watts and Steven H Strogatz. Collective dynamics of 'small-world' networks. Nature, 393(6684):440–442, 1998.
[49] Neil Spring, Ratul Mahajan, David Wetherall, and Thomas Anderson. Measuring ISP topologies with Rocketfuel. IEEE/ACM Transactions on Networking, 12(1):2–16, 2004.
[50] Muhan Zhang, Zhicheng Cui, Shali Jiang, and Yixin Chen. Beyond link prediction: Predicting hyperlinks in adjacency space. In AAAI, pages 4430–4437, 2018.
[51] Christopher Aicher, Abigail Z Jacobs, and Aaron Clauset. Learning latent block structure in weighted networks. Journal of Complex Networks, 3(2):221–248, 2015.
[52] Steffen Rendle. Factorization machines with libFM. ACM Transactions on Intelligent Systems and Technology (TIST), 3(3):57, 2012.
[53] Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9(Aug):1871–1874, 2008.
[54] S Vichy N Vishwanathan, Nicol N Schraudolph, Risi Kondor, and Karsten M Borgwardt. Graph kernels. Journal of Machine Learning Research, 11(Apr):1201–1242, 2010.
[55] Mahito Sugiyama and Karsten Borgwardt. Halting in random walk kernels. In Advances in Neural Information Processing Systems, pages 1639–1647, 2015.
[56] Fabrizio Costa and Kurt De Grave. Fast neighborhood subgraph pairwise distance kernel. In Proceedings of the 26th International Conference on Machine Learning, pages 255–262. Omnipress, 2010.
[57] Nils Kriege and Petra Mutzel. Subgraph matching kernels for attributed graphs. In Proceedings of the 29th International Conference on Machine Learning (ICML-12), pages 1015–1022, 2012.
[58] Karsten M Borgwardt and Hans-Peter Kriegel. Shortest-path kernels on graphs. In Data Mining, Fifth IEEE International Conference on, pages 8–pp. IEEE, 2005.
[59] Marion Neumann, Roman Garnett, Christian Bauckhage, and Kristian Kersting. Propagation kernels: efficient graph kernels from propagated information. Machine Learning, 102(2):209–245, 2016.
[60] Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection. 2015.
[61] Reza Zafarani and Huan Liu. Social computing data repository at ASU, 2009. URL http://socialcomputing.asu.edu.
[62] Matt Mahoney. Large text compression benchmark, 2011.
[63] Chris Stark, Bobby-Joe Breitkreutz, Teresa Reguly, Lorrie Boucher, Ashton Breitkreutz, and Mike Tyers. BioGRID: a general repository for interaction datasets. Nucleic Acids Research, 34(suppl_1):D535–D539, 2006.