{"title": "Hyperbolic Graph Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 8230, "page_last": 8241, "abstract": "Learning from graph-structured data is an important task in machine learning and artificial intelligence, for which Graph Neural Networks (GNNs) have shown great promise. Motivated by recent advances in geometric representation learning, we propose a novel GNN architecture for learning representations on Riemannian manifolds with differentiable exponential and logarithmic maps. We develop a scalable algorithm for modeling the structural properties of graphs, comparing Euclidean and hyperbolic geometry. In our experiments, we show that hyperbolic GNNs can lead to substantial improvements on various benchmark datasets.", "full_text": "Hyperbolic Graph Neural Networks\n\nQi Liu\u2217, Maximilian Nickel and Douwe Kiela\n\nFacebook AI Research\n\n{qiliu,maxn,dkiela}@fb.com\n\nAbstract\n\nLearning from graph-structured data is an important task in machine learning and\narti\ufb01cial intelligence, for which Graph Neural Networks (GNNs) have shown great\npromise. Motivated by recent advances in geometric representation learning, we\npropose a novel GNN architecture for learning representations on Riemannian\nmanifolds with differentiable exponential and logarithmic maps. We develop a\nscalable algorithm for modeling the structural properties of graphs, comparing\nEuclidean and hyperbolic geometry. In our experiments, we show that hyperbolic\nGNNs can lead to substantial improvements on various benchmark datasets.\n\n1\n\nIntroduction\n\nWe study the problem of supervised learning on entire graphs. Neural methods have been applied\nwith great success to (semi) supervised node and edge classi\ufb01cation [26, 51]. They have also shown\npromise for the classi\ufb01cation of graphs based on their structural properties [e.g., 18]. 
By being\ninvariant to node and edge permutations [3], GNNs can exploit symmetries in graph-structured data,\nwhich makes them well-suited for a wide range of problems, ranging from quantum chemistry [18] to\nmodelling social and interaction graphs [50].\nIn this work, we are concerned with the representational geometry of GNNs. Results in network\nscience have shown that hyperbolic geometry in particular is well-suited for modeling complex\nnetworks. Typical properties such as heterogeneous degree distributions and strong clustering\ncan often be explained by assuming an underlying hierarchy which is well captured in hyperbolic\nspace [28, 35]. These insights led, for instance, to hyperbolic geometric graph models, which allow\nfor the generation of random graphs with real-world properties by sampling nodes uniformly in\nhyperbolic space [1]. Moreover, it has recently been shown that hyperbolic geometry lends itself\nparticularly well for learning hierarchical representations of symbolic data and can lead to substantial\ngains in representational ef\ufb01ciency and generalization performance [32].\nMotivated by these results, we examine if graph neural networks may be equipped with geometrically\nappropriate inductive biases for capturing structural properties, e.g., information about which nodes\nare highly connected (and hence more central) or the overall degree distribution in a graph. For this\npurpose, we extend graph neural networks to operate on Riemannian manifolds with differentiable\nexponential and logarithmic maps. This allows us to investigate non-Euclidean geometries within a\ngeneral framework for supervised learning on graphs \u2013 independently of the underlying space and\nits curvature. 
Here, we compare standard graph convolutional networks [26] that work in Euclidean space with different hyperbolic graph neural networks (HGNNs): one that operates on the Poincaré ball as in [32] and one that operates on the Lorentz model of hyperbolic geometry as in [33]. We focus specifically on the ability of hyperbolic graph neural networks to capture structural properties.
Our contributions are as follows: we generalize graph neural networks to be manifold-agnostic, and show that hyperbolic graph neural networks can provide substantial improvements for full-graph classification. Furthermore, we show that HGNNs are more efficient at capturing structural properties of synthetic data than their Euclidean counterpart; that they can more accurately predict the chemical properties of molecules; and that they can predict extraneous properties of large-scale networks, in this case price fluctuations of a blockchain transaction graph, by making use of the hierarchical structure present in the data. Code and data are available at https://github.com/facebookresearch/hgnn.

∗Work done as an AI Resident.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

2 Related Work

Graph neural networks (GNNs) have received increased attention in machine learning and artificial intelligence due to their attractive properties for learning from graph-structured data [7]. Originally proposed by [19, 41] as a method for learning node representations on graphs using neural networks, this idea was extended to convolutional neural networks using spectral methods [9, 13] and the iterative aggregation of neighbor representations [26, 34, 45]. [22] showed that graph neural networks can be scaled to large-scale graphs. 
Due to their ability to learn inductive models of graphs, GNNs have found promising applications in molecular fingerprinting [14] and quantum chemistry [18].
There has been an increased interest in hyperbolic embeddings due to their ability to model data with latent hierarchies. [32] proposed Poincaré embeddings for learning hierarchical representations of symbolic data. Furthermore, [33] showed that the Lorentz model of hyperbolic geometry has attractive properties for stochastic optimization and leads to substantially improved embeddings, especially in low dimensions. [16] extended Poincaré embeddings to directed graphs using hyperbolic entailment cones. The representation trade-offs for hyperbolic embeddings were analyzed in [12], which also proposed a combinatorial algorithm to compute embeddings.
Ganea et al. [17] and Gulcehre et al. [21] proposed hyperbolic neural networks and hyperbolic attention networks, respectively, with the aim of extending deep learning methods to hyperbolic space. Our formalism is related to the former in that layer transformations are performed in the tangent space. We propose a model that is applicable to any Riemannian manifold with differentiable log/exp maps, which also allows us to easily extend GNNs to the Lorentz model2. Our formalism is related to the latter in that we perform message passing in hyperbolic space, but instead of using the Einstein midpoint, we generalize to any Riemannian manifold via mapping to and from the tangent space.
Hyperbolic geometry has also shown great promise in network science: [28] showed that typical properties of complex networks such as heterogeneous degree distributions and strong clustering can be explained by assuming an underlying hyperbolic geometry, and used these insights to develop a geometric graph model for real-world networks [1]. 
Furthermore, [27, 6, 28] exploited the ability of hyperbolic embeddings to embed tree-like graphs with low distortion for greedy-path routing in large-scale communication networks.
Concurrently with this work, Chami et al. [10] also proposed an extension of graph neural networks to hyperbolic geometry. The main difference lies in their attention-based architecture for neighborhood aggregation, which also elegantly supports having trainable curvature parameters at each layer. They show strong performance on link prediction and node classification tasks, and provide an insightful analysis in terms of a graph's δ-hyperbolicity.

3 Hyperbolic Graph Neural Networks

Graph neural networks can be interpreted as performing message passing between nodes [18]. We base our framework on graph convolutional networks as proposed in [26], where node representations are computed by aggregating messages from direct neighbors over multiple steps. That is, the message from node v to its receiving neighbor u is computed as m^{k+1}_v = W^k \tilde{A}_{uv} h^k_v. Here h^k_v is the representation of node v at step k, W^k ∈ R^{h×h} constitutes the trainable parameters for step k (i.e., the k-th layer), and \tilde{A} = D^{-1/2}(A + I)D^{-1/2} captures the connectivity of the graph. To get \tilde{A}, the identity matrix I is added to the adjacency matrix A to obtain self-loops for each node, and the resultant matrix is normalized using the diagonal degree matrix (D_{ii} = \sum_j (A_{ij} + I_{ij})). We then obtain a new representation of u at step k + 1 by summing up all the messages from its neighbors before applying the activation function σ: h^{k+1}_u = σ(\sum_{v ∈ I(u)} m^{k+1}_v), where I(u) is the set of in-neighbors of u, i.e. v ∈ I(u) if and only if v has an edge pointing to u. 

2Other Riemannian manifolds such as spherical space are beyond the scope of this work but might be interesting to study in future work.
Thus, in a more compact notation, the information propagates on the graph as:

h^{k+1}_u = σ(\sum_{v ∈ I(u)} \tilde{A}_{uv} W^k h^k_v)   (1)

3.1 Graph Neural Networks on Riemannian Manifolds

A graph neural network comprises a series of basic operations, i.e. message passing via linear maps and pointwise non-linearities, on a set of nodes that live in a given space. While such operations are well-understood in Euclidean space, their counterparts in non-Euclidean space (which we are interested in here) are non-trivial. We generalize the notion of a graph convolutional network such that the network operates on Riemannian manifolds and becomes agnostic to the underlying space. Since the tangent space of a point on a Riemannian manifold always is Euclidean (or a subset of Euclidean space), functions with trainable parameters are executed there. The propagation rule for each node u ∈ V is calculated as:

h^{k+1}_u = σ(\exp_{x'}(\sum_{v ∈ I(u)} \tilde{A}_{uv} W^k \log_{x'}(h^k_v)))   (2)

At layer k, we map each node representation h^k_v ∈ M, where v ∈ I(u) is a neighbor of u, to the tangent space of a chosen point x' ∈ M using the logarithmic map \log_{x'}. Here \tilde{A} and W^k are the normalized adjacency matrix and the trainable parameters, respectively, as in Equation 1. An exponential map \exp_{x'} is applied afterwards to map the linearly transformed tangent vector back to the manifold.
The activation σ is applied after the exponential map to prevent model collapse: if the activation were applied before the exponential map, i.e. h^{k+1}_u = \exp_{x'}(σ(\sum_{v ∈ I(u)} \tilde{A}_{uv} W^k \log_{x'}(h^k_v))), the exponential map \exp_{x'} at step k would be cancelled by the logarithmic map \log_{x'} at step k + 1, as \log_{x'}(\exp_{x'}(h)) = h. 
Hence, any such model would collapse to a vanilla Euclidean GCN with a logarithmic map taking the input features of the GCN and an exponential map taking its outputs. An alternative way to prevent such collapse would be to introduce bias terms as in [17]. Importantly, when applying the non-linearity directly on a manifold M, we need to ensure that its application is manifold preserving, i.e., that σ : M → M. We will propose possible choices for non-linearities in the discussion of the respective manifolds.

3.2 Riemannian Manifolds

A Riemannian manifold (M, g) is a real and smooth manifold equipped with an inner product g_x : T_xM × T_xM → R at each point x ∈ M, which is called a Riemannian metric and allows us to define the geometric properties of a space such as angles and the length of a curve.
We experiment with Euclidean space and compare it to two different hyperbolic manifolds (note that there exist multiple equivalent models of hyperbolic space, such as the Poincaré ball and the Lorentz model, between which there exist isometric transformations that preserve all geometric properties).

Euclidean Space  The Euclidean manifold is a manifold with zero curvature. The metric tensor is defined as g^E = diag([1, 1, . . . , 1]). The closed-form distance, i.e. 
the length of the geodesic (which is a straight line in Euclidean space) between two points, is given as:

d(x, y) = \sqrt{\sum_i (x_i − y_i)^2}   (3)

The exponential map of the Euclidean manifold is defined as:

\exp_x(v) = x + v   (4)

The logarithmic map is given as:

\log_x(y) = y − x   (5)

In order to make sure that the Euclidean manifold formulation is equivalent to the vanilla GCN model described in Equation 1, as well as for reasons of computational efficiency, we choose x' = x_0 (i.e., the origin) as the fixed point on the manifold in whose tangent space we operate.

Poincaré Ball Model  The Poincaré ball model with constant negative curvature corresponds to the Riemannian manifold (B, g^B_x), where B = {x ∈ R^n : ‖x‖ < 1} is an open unit ball. Its metric tensor is g^B_x = λ_x^2 g^E, where λ_x = 2 / (1 − ‖x‖^2) is the conformal factor and g^E is the Euclidean metric tensor (see above). The distance between two points x, y ∈ B is given as:

d_B(x, y) = arcosh(1 + 2 ‖x − y‖^2 / ((1 − ‖x‖^2)(1 − ‖y‖^2)))   (6)

For any point x ∈ B, the exponential map \exp_x : T_xB → B and the logarithmic map \log_x : B → T_xB are defined for the tangent vector v ≠ 0 and the point y ≠ 0, respectively, as:

\exp_x(v) = x ⊕ (tanh(λ_x ‖v‖ / 2) v / ‖v‖),   \log_x(y) = (2 / λ_x) arctanh(‖−x ⊕ y‖) (−x ⊕ y) / ‖−x ⊕ y‖   (7)

where ⊕ is the Möbius addition for any x, y ∈ B:

x ⊕ y = ((1 + 2⟨x, y⟩ + ‖y‖^2) x + (1 − ‖x‖^2) y) / (1 + 2⟨x, y⟩ + ‖x‖^2 ‖y‖^2)   (8)

Similar to the Euclidean case, and following [17], we use x' = x_0. 
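To make these operations concrete, here is a minimal NumPy sketch of the Poincaré-ball primitives (an illustrative sketch, not the released implementation; `mobius_add`, `poincare_dist`, `exp0` and `log0` are hypothetical helper names, and the `exp0`/`log0` forms are Equation 7 specialized to the base point x' = 0, where λ_0 = 2):

```python
import numpy as np

def mobius_add(x, y):
    """Möbius addition on the open unit ball (Equation 8)."""
    xy, nx2, ny2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    return ((1 + 2 * xy + ny2) * x + (1 - nx2) * y) / (1 + 2 * xy + nx2 * ny2)

def poincare_dist(x, y):
    """Geodesic distance on the Poincaré ball (Equation 6)."""
    num = 2 * np.dot(x - y, x - y)
    den = (1 - np.dot(x, x)) * (1 - np.dot(y, y))
    return np.arccosh(1 + num / den)

def exp0(v):
    """Exponential map at the origin: exp_0(v) = tanh(||v||) v / ||v||."""
    n = np.linalg.norm(v)
    return np.tanh(n) * v / n if n > 0 else v

def log0(y):
    """Logarithmic map at the origin: log_0(y) = arctanh(||y||) y / ||y||."""
    n = np.linalg.norm(y)
    return np.arctanh(n) * y / n if n > 0 else y
```

At the origin the distance simplifies to d_B(0, y) = 2 arctanh ‖y‖, which gives a quick consistency check that `exp0` and `log0` are mutual inverses.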
On the Poincaré ball, we employ pointwise non-linearities which are norm decreasing, i.e., where |σ(x)| ≤ |x| (which is true for e.g. ReLU and leaky ReLU). This ensures that σ : B → B, since ‖σ(x)‖ ≤ ‖x‖.

Lorentz Model  The Lorentz model avoids numerical instabilities that may arise with the Poincaré distance (mostly due to the division) [33]. Its stability is particularly useful for our architecture, since we have to apply multiple sequential exponential and logarithmic maps in deep GNNs, which would normally compound numerical issues, but which the Lorentz model avoids. Let x, y ∈ R^{n+1}; then the Lorentzian scalar product is defined as:

⟨x, y⟩_L = −x_0 y_0 + \sum_{i=1}^{n} x_i y_i   (9)

The Lorentz model of n-dimensional hyperbolic space is then defined as the Riemannian manifold (L, g^L_x), where L = {x ∈ R^{n+1} : ⟨x, x⟩_L = −1, x_0 > 0} and where g^L = diag([−1, 1, . . . , 1]). The induced distance function is given as:

d_L(x, y) = arcosh(−⟨x, y⟩_L)   (10)

The exponential map \exp_x : T_xL → L and the logarithmic map \log_x : L → T_xL are defined as:

\exp_x(v) = cosh(‖v‖_L) x + sinh(‖v‖_L) v / ‖v‖_L,   \log_x(y) = (arcosh(−⟨x, y⟩_L) / \sqrt{⟨x, y⟩_L^2 − 1}) (y + ⟨x, y⟩_L x)   (11)

where ‖v‖_L = \sqrt{⟨v, v⟩_L}.
The origin, i.e., the zero vector in Euclidean space and the Poincaré ball, is equivalent to (1, 0, ..., 0) in the Lorentz model, which we use as x'. 
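The Lorentz-model operations admit an equally small sketch (again illustrative, with hypothetical helper names; it assumes points on the hyperboloid and tangent vectors v at x satisfying ⟨x, v⟩_L = 0, which at the base point (1, 0, ..., 0) simply means a zero first coordinate):

```python
import numpy as np

def lorentz_inner(x, y):
    """Lorentzian scalar product <x, y>_L (Equation 9)."""
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def lorentz_dist(x, y):
    """Geodesic distance d_L(x, y) = arcosh(-<x, y>_L) (Equation 10)."""
    return np.arccosh(np.clip(-lorentz_inner(x, y), 1.0, None))

def exp_map(x, v):
    """Exponential map at x applied to a tangent vector v."""
    n = np.sqrt(max(lorentz_inner(v, v), 0.0))  # ||v||_L
    return x if n == 0 else np.cosh(n) * x + np.sinh(n) * v / n

def log_map(x, y):
    """Logarithmic map at x: the tangent vector pointing from x to y."""
    a = -lorentz_inner(x, y)  # a >= 1 for points on the hyperboloid
    if a <= 1.0:
        return np.zeros_like(x)
    return np.arccosh(a) / np.sqrt(a * a - 1) * (y - a * x)
```

A useful sanity check is that `exp_map` keeps points on the hyperboloid (⟨y, y⟩_L = −1), that `log_map` inverts it, and that the geodesic distance from x to exp_x(v) equals ‖v‖_L.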
Since activation functions such as ReLU and leaky ReLU are not manifold-preserving in the Lorentz model, we first use Equation 12 to map the point from the Lorentz to the Poincaré model and apply the activation σ, before mapping it back using Equation 13:

p_{L→B}(x_0, x_1, ..., x_n) = (x_1, ..., x_n) / (x_0 + 1)   (12)

p_{B→L}(x_1, ..., x_n) = (1 + ‖x‖^2, 2x_1, ..., 2x_n) / (1 − ‖x‖^2)   (13)

Dimensionality |      3      |      5      |     10      |     20      |    256
Euclidean      | 77.2 ± 0.12 | 90.0 ± 0.21 | 90.6 ± 0.17 | 94.8 ± 0.25 | 95.3 ± 0.17
Poincare       | 93.0 ± 0.05 | 95.6 ± 0.14 | 95.9 ± 0.14 | 96.2 ± 0.06 | 93.7 ± 0.05
Lorentz        | 94.1 ± 0.03 | 95.1 ± 0.25 | 96.4 ± 0.23 | 96.6 ± 0.22 | 95.3 ± 0.28

Table 1: F1 (macro) score and standard deviation of classifying synthetically generated graphs according to the underlying graph generation algorithm (high is good).

3.3 Centroid-Based Regression and Classification

The output of a hyperbolic graph neural network with K steps consists of a set of node representations {h^K_1, ..., h^K_{|V|}}, where each h^K_i ∈ M. Standard parametric classification and regression methods in Euclidean space are not generally applicable in the hyperbolic case. Hence, we propose an extension of the underlying idea of radial basis function networks [8, 36] to Riemannian manifolds. The key idea is to use a differentiable function ψ : M → R^d that can be used to summarize the structure of the node embeddings. More specifically, we first introduce a list of centroids C = [c_1, c_2, ..., c_{|C|}], where each c_i ∈ M. The centroids are learned jointly with the GNN using backpropagation. The pairwise distance between c_i and h^K_j is calculated as ψ_{ij} = d(c_i, h^K_j). Next, we concatenate all distances (ψ_{1j}, ..., ψ_{|C|j}) ∈ R^{|C|} to summarize the position of h^K_j relative to the centroids. 
For node-level regression,

ŷ = w_o^T (ψ_{1j}, ..., ψ_{|C|j}),   (14)

where w_o ∈ R^{|C|}, and for node-level classification,

p(y_j) = softmax(W_o (ψ_{1j}, ..., ψ_{|C|j})),   (15)

where W_o ∈ R^{c×|C|} and c denotes the number of classes.
For graph-level predictions, we first use average pooling to combine the distances of different nodes, obtaining (ψ_1, ..., ψ_{|C|}), where ψ_i = \sum_{j=1}^{|V|} ψ_{ij} / |V|, before feeding (ψ_1, ..., ψ_{|C|}) into fully connected networks. Standard cross entropy and mean square error are used as loss functions for classification and regression, respectively.

3.4 Other details

The input features of neural networks are typically embeddings or features that live in Euclidean space. For Euclidean features x_E, we first apply \exp_{x'}(x_E) to map them onto the Riemannian manifold. To initialize embeddings E within the Riemannian manifold, we first uniformly sample from a range (e.g. [−0.01, 0.01]) to obtain Euclidean embeddings, before normalizing the embeddings to ensure that each embedding e_i ∈ M. The Euclidean manifold is normalized into a unit ball to make sure we compare fairly with the Poincaré ball and the Lorentz model. This normalization causes minor differences with respect to the vanilla GCN model of Kipf & Welling [26], but as we show in the appendix, in practice this does not cause any significant dissimilarities. We use leaky ReLU as the activation function σ with negative slope 0.5. 
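Putting the pieces together, a single propagation step of Equation 2 can be sketched as follows (a simplified sketch using the Poincaré maps at the origin; `normalized_adjacency` and `hgnn_layer` are our illustrative names, not taken from the released code):

```python
import numpy as np

def exp0(V):
    """Row-wise Poincaré exponential map at the origin."""
    n = np.linalg.norm(V, axis=-1, keepdims=True)
    return np.tanh(n) * V / np.maximum(n, 1e-12)

def log0(H):
    """Row-wise Poincaré logarithmic map at the origin."""
    n = np.linalg.norm(H, axis=-1, keepdims=True)
    return np.arctanh(np.clip(n, 0.0, 1 - 1e-7)) * H / np.maximum(n, 1e-12)

def normalized_adjacency(A):
    """A_tilde = D^(-1/2) (A + I) D^(-1/2), as in Equation 1."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def hgnn_layer(H, A_tilde, W):
    """One step of Equation 2: log-map node states to the tangent space at
    the origin, aggregate linearly, exp-map back, then apply leaky ReLU
    (slope 0.5), which is norm decreasing and so keeps points in the ball."""
    Z = exp0(A_tilde @ log0(H) @ W.T)
    return np.where(Z > 0, Z, 0.5 * Z)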
We use RAMSGrad [4] and AMSGrad for hyperbolic parameters and Euclidean parameters, respectively.

4 Experiments

In the following experiments, we compare the performance of models using different Riemannian manifolds, comparing the canonical Euclidean version to Hyperbolic Graph Neural Networks using either the Poincaré or the Lorentz manifold.

4.1 Synthetic Structures

First, we attempt to corroborate the hypothesis that hyperbolic graph neural networks are better at capturing structural information of graphs than their Euclidean counterpart. To that end, we design a synthetic experiment, such that we have full control over the amount of structural information that is required for the classification decision. Specifically, our task is to classify synthetically generated graphs according to the underlying generation algorithm. We choose 3 distinct graph generation algorithms: Erdős-Rényi [15], Barabási-Albert [2] and Watts-Strogatz [46] (see Figure 1).

Figure 1: (a) Barabási-Albert, (b) Watts-Strogatz, (c) Erdős-Rényi.

The graphs are constructed as follows. For each graph generation algorithm we uniformly sample a number of nodes between 100 and 500 and subsequently employ the graph generation algorithm on the nodes. For Barabási-Albert graphs, we set the number of edges to attach from a new node to existing nodes to a random number between 1 and 100. For Erdős-Rényi, the probability for edge creation is set to a value between 0.1 and 1. For Watts-Strogatz, each node is connected to between 1 and 100 nearest neighbors in the ring topology, and the probability of rewiring each edge is set to a value between 0.1 and 1.
Table 1 shows the results of classifying the graph generation algorithm (as measured by macro F1 score over the three classes). For our comparison, we follow [32] and show results for different numbers of dimensions. 
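The sampling procedure above can be sketched with networkx (a sketch under the stated parameter ranges; `sample_graph` is our illustrative helper, and the clamping of the attachment and neighbor counts to n − 1 is our addition, required for the networkx generators to accept the arguments):

```python
import random
import networkx as nx

GENERATORS = ["erdos_renyi", "barabasi_albert", "watts_strogatz"]

def sample_graph(cls, rng):
    """Sample one synthetic graph with 100-500 nodes and the parameter
    ranges described in the text."""
    n = rng.randint(100, 500)
    if cls == "erdos_renyi":
        # edge-creation probability in [0.1, 1]
        return nx.erdos_renyi_graph(n, rng.uniform(0.1, 1.0))
    if cls == "barabasi_albert":
        # edges attached from each new node, clamped to m < n
        return nx.barabasi_albert_graph(n, min(rng.randint(1, 100), n - 1))
    if cls == "watts_strogatz":
        # k ring neighbors (k < n), rewiring probability in [0.1, 1]
        return nx.watts_strogatz_graph(n, min(rng.randint(1, 100), n - 1),
                                       rng.uniform(0.1, 1.0))
    raise ValueError(f"unknown generator: {cls}")
```

Labeling each sampled graph with the index of its generator then yields the three-class classification dataset.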
We observe that our hyperbolic methods outperform the Euclidean alternative by a large margin. Owing to the representational efficiency of hyperbolic methods, the difference is particularly big for low-dimensional cases. The Lorentz model does better than the Poincaré one in all but one case. The differences become smaller with higher dimensionality, as we should expect, but hyperbolic methods still do better at the relatively high dimensionality of 256. We speculate that this is due to their having better inductive biases for capturing the structural properties of the graphs, which is extremely important for solving this particular task.

4.2 Molecular Structures

Graphs are ubiquitous as data structures, but one domain where neural networks for graph data have been particularly impactful is in modeling chemical problems. Applications include molecular design [31], fingerprinting [14] and polypharmacy side-effect modeling [52].
Molecular property prediction has received attention as a reasonable benchmark for supervised learning on molecules [18]. A popular choice for this purpose is the QM9 dataset [37]. Unfortunately, it is hard to compare to previous work on this dataset, as the original splits from [18] are no longer available (per personal correspondence). One characteristic of QM9 is that the molecules are relatively small (around 10 nodes per graph) and that there is high variance in the results. Hence, we instead use the much larger ZINC dataset [44, 24, 23], which has been used widely in graph generation for molecules using machine learning methods [25, 31]. However, see the appendix for results on QM8 [40, 39] and QM9 [40, 38].
ZINC is a large dataset of commercially available drug-like chemical compounds. For ZINC, the input consists of embedding representations of atoms together with an adjacency matrix, without any additional handcrafted features. 
A master node [18] is added to the adjacency matrix to speed up message passing. The dataset consists of 250k examples in total, out of which we randomly sample 25k each for the validation and test sets. On average, these molecules are bigger (23 heavy atoms on average) and structurally more complex than the molecules in QM9. ZINC is multi-relational: there are four types of relations for molecules, i.e. single bond, double bond, triple bond and aromatic bond.

logP
Dimensionality |      3      |      5      |     10      |     20      |    256
Euclidean      | 6.7 ± 0.07  | 4.7 ± 0.03  | 4.7 ± 0.02  | 3.6 ± 0.00  | 3.3 ± 0.00
Poincare       | 5.7 ± 0.00  | 4.6 ± 0.03  | 3.6 ± 0.02  | 3.2 ± 0.01  | 3.1 ± 0.01
Lorentz        | 5.5 ± 0.02  | 4.5 ± 0.03  | 3.3 ± 0.03  | 2.9 ± 0.01  | 2.4 ± 0.02

QED
Dimensionality |      3       |      5       |     10       |     20       |    256
Euclidean      | 22.4 ± 0.21  | 15.9 ± 0.14  | 14.5 ± 0.09  | 10.2 ± 0.08  | 6.4 ± 0.06
Poincare       | 22.1 ± 0.01  | 14.9 ± 0.13  | 10.2 ± 0.02  |  6.9 ± 0.02  | 6.0 ± 0.04
Lorentz        | 21.9 ± 0.12  | 14.3 ± 0.12  |  8.7 ± 0.04  |  6.7 ± 0.06  | 4.7 ± 0.00

SAS
Dimensionality |      3       |      5       |     10       |     20      |    256
Euclidean      | 20.5 ± 0.04  | 16.8 ± 0.07  | 14.5 ± 0.11  | 9.6 ± 0.05  | 9.2 ± 0.08
Poincare       | 18.8 ± 0.03  | 16.1 ± 0.08  | 12.9 ± 0.04  | 9.3 ± 0.07  | 8.6 ± 0.02
Lorentz        | 18.0 ± 0.15  | 16.0 ± 0.15  | 12.5 ± 0.07  | 9.1 ± 0.08  | 7.7 ± 0.06

Table 2: Mean absolute error of predicting molecular properties: the water-octanol partition coefficient (logP); qualitative estimate of drug-likeness (QED); and synthetic accessibility score (SAS). 
Scaled by 100 for table formatting (low is good).

            |    logP     |    QED      |    SAS
DTNN [43]   | 4.0 ± 0.03  | 8.1 ± 0.04  | 9.9 ± 0.06
MPNN [18]   | 4.1 ± 0.02  | 8.4 ± 0.05  | 9.2 ± 0.07
GGNN [29]   | 3.2 ± 0.20  | 6.4 ± 0.20  | 9.1 ± 0.10
Euclidean   | 3.3 ± 0.00  | 6.4 ± 0.06  | 9.2 ± 0.08
Poincare    | 3.1 ± 0.01  | 6.0 ± 0.04  | 8.6 ± 0.02
Lorentz     | 2.4 ± 0.02  | 4.7 ± 0.00  | 7.7 ± 0.06

Table 3: Mean absolute error of predicting the molecular properties logP, QED and SAS, as compared to current state-of-the-art deep learning methods. Scaled by 100 for table formatting (low is good).

First, in order to enable our graph neural networks to handle multi-relational data, we follow [42] and extend Equation 2 to incorporate a relation-specific weight matrix W^k_r for each r ∈ R, where R is the set of relations, and sum over the relations. As before, we compare the various methods for different dimensionalities. The results can be found in Table 2.
We also compare to three strong baselines on exactly the same data splits of the ZINC dataset: graph-gated neural networks [GGNN; 29], deep tensor neural networks [DTNN; 43] and message-passing neural networks [MPNN; 18]. GGNN adds a GRU-like update [11] that incorporates information from neighbors and previous timesteps in order to update node representations. DTNN takes as input a fully connected weighted graph and aggregates node representations using a deep tensor neural network. For DTNN and MPNN we use the implementations in DeepChem3, a well-known open-source toolchain for deep learning in drug discovery, materials science, quantum chemistry, and biology. For GGNN, we use the publicly available open-source implementation4. A comparison of our proposed approach to these methods can be found in Table 3.
We find that the Lorentz model outperforms the Poincaré ball on all properties, which illustrates the benefits of its improved numerical stability. 
The Euclidean manifold performs worse than the hyperbolic versions, confirming the effectiveness of hyperbolic models for modeling complex structures in graph data. GGNN obtains results comparable to the Euclidean GNN. DTNN performs worse than the other models, as it relies on distance matrices, ignoring multi-relational information during message passing. Furthermore, as can be seen in the appendix, the computational overhead of using non-Euclidean manifolds is relatively minor.

3https://deepchem.io/
4https://github.com/microsoft/gated-graph-neural-network-samples

           |     Dev       |    Test
Node2vec   | 54.10 ± 1.63  | 52.44 ± 1.10
ARIMA      | 54.50 ± 0.16  | 53.07 ± 0.06
Euclidean  | 56.15 ± 0.30  | 53.95 ± 0.20
Poincare   | 57.03 ± 0.28  | 54.41 ± 0.24
Lorentz    | 57.52 ± 0.35  | 55.51 ± 0.37

Table 4: Accuracy of predicting price fluctuations (up-down) for the Ether/USDT market rate based on graph dynamics.

      | "Whale" nodes | All nodes
Norm  | 0.20129       | 0.33178

Table 5: Average norm of influential "whale" nodes. Whales are significantly closer to the origin than average, indicating their importance.

4.3 Blockchain Transaction Graphs

In terms of publicly accessible graph-structured data, blockchain networks like Ethereum constitute some of the largest sources of graph data in the world. Interestingly, financial transaction networks such as the blockchain have a strongly hierarchical nature: the blockchain ecosystem has even invented its own terminology for this, e.g., the market has long been speculated to be manipulated by "whales". A whale is a user (address) with enough financial resources to move the market in their favored direction. The structure of the blockchain graph and its dynamics over time have been used as a way of quantifying the "true value" of a network [49, 47]. 
Blockchain networks have uncharacteristic dynamics [30], but the distribution of wealth on the blockchain follows a power-law distribution that is arguably (even) more skewed than in traditional markets [5]. This means that the behavior of important "whale" nodes in the graph might be more predictive of fluctuations (up or down) in the market price of the underlying asset, which should be easier to capture using hyperbolic graph neural networks.
Here, we study the problem of predicting price fluctuations for the underlying asset of the Ethereum blockchain [48], based on the large-scale behavior of nodes in transaction graphs (see the appendix for more details). Each node (i.e., address in the transaction graph) is associated with the same embedding over all timepoints. Models are provided with node embeddings and the transaction graph for a given time frame, together with the Ether/USDT market rate for the given time period. The transaction graph is a directed multi-graph where edge weights correspond to the transaction amount. To encourage message passing on the graphs, we enhance the transaction graphs with inverse edges u → v for each edge v → u. As a result, the propagation rule in Equation 2 is extended to the bidirectional case:

h^{k+1}_u = σ(\exp_{x'}(\sum_{v ∈ I(u)} \tilde{A}_{uv} W^k \log_{x'}(h^k_v) + \sum_{v ∈ O(u)} \tilde{A}_{uv} \tilde{W}^k \log_{x'}(h^k_v)))   (16)

where O(u) is the set of out-neighbors of u, i.e. v ∈ O(u) if and only if u has an edge pointing to v. We use the mean candlestick price over a period of 8 hours in total as additional input to the network.
Table 4 shows the results. We compare against a baseline of inputting averaged 128-dimensional node2vec [20] features for the same time frame to an MLP classifier. 
We found that it helped if we only used the node2vec features for the top k nodes ordered by degree, for which we report results here (and which seems to confirm our suspicion that the transaction graph is strongly hierarchical). In addition, we compare against the autoregressive integrated moving average (ARIMA), a common baseline for time series prediction. As before, we find that Lorentz performs significantly better than Poincaré, which in turn outperforms the Euclidean manifold.
One of the benefits of using hyperbolic representations is that we can inspect the hierarchy that the network has learned. We use this property to sanity check our proposed architecture: if the hyperbolic networks do model the latent hierarchy, then objectively influential "whale" nodes would have to be closer to the origin. Table 5 shows the average norm of whale nodes compared to the overall average. For our list of whale nodes, we obtain the top 10000 addresses according to Etherscan5, compared to the total average over more than 2 million addresses. Top whale nodes include exchanges, initial coin offerings (ICOs) and original developers of Ethereum. We observe a lower norm for whale addresses, reflecting their importance in the hierarchy and influence on price fluctuations, which the hyperbolic graph neural networks are able to pick up on.

5 Conclusion

We described a method for generalizing graph neural networks to Riemannian manifolds, making them agnostic to the underlying space. Within this framework, we harnessed the power of hyperbolic geometry for full graph classification. Hyperbolic representations are well-suited for capturing high-level structural information, even in low dimensions. 
We \ufb01rst showed that hyperbolic methods are\nmuch better at classifying graphs according to their structure by using synthetic data, where the task\nwas to distinguish randomly generated graphs based on the underlying graph generation algorithm.\nWe then applied our method to molecular property prediction on the ZINC dataset, and showed\nthat hyperbolic methods again outperformed their Euclidean counterpart, as well as state-of-the-art\nmodels developed by the wider community. Lastly, we showed that a large-scale hierarchical graph,\nsuch as the transaction graph of a blockchain network, can successfully be modeled for extraneous\nprediction of price \ufb02uctuations. We showed that the proposed architecture successfully made use of\nits geometrical properties in order to capture the hierarchical nature of the data.\n\nReferences\n[1] Rodrigo Aldecoa, Chiara Orsini, and Dmitri Krioukov. Hyperbolic graph generator. Computer\n\nPhysics Communications, 196:492\u2013496, 2015.\n\n[2] Albert-L\u00e1szl\u00f3 Barab\u00e1si and R\u00e9ka Albert. Emergence of scaling in random networks. Science,\n\n286(5439):509\u2013512, 1999.\n\n[3] Peter W Battaglia, Jessica B Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius\nZambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan\nFaulkner, et al. Relational inductive biases, deep learning, and graph networks. arXiv preprint\narXiv:1806.01261, 2018.\n\n[4] Gary B\u00e9cigneul and Octavian-Eugen Ganea. Riemannian adaptive optimization methods. In\n7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA,\nMay 6-9, 2019, 2019.\n\n[5] Stjepan Begu\u0161i\u00b4c, Zvonko Kostanj\u02c7car, H Eugene Stanley, and Boris Podobnik. Scaling properties\nof extreme price \ufb02uctuations in bitcoin markets. Physica A: Statistical Mechanics and its\nApplications, 510:400\u2013406, 2018.\n\n[6] M Bogu\u00f1\u00e1, F Papadopoulos, and D Krioukov. 
Sustaining the internet with hyperbolic mapping. Nature Communications, 1:62, 2010.

[7] Michael M. Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst. Geometric deep learning: Going beyond Euclidean data. IEEE Signal Process. Mag., 34(4):18–42, 2017.

[8] D.S. Broomhead and D Lowe. Multi-variable functional interpolation and adaptive networks. Complex Systems, 2:321–355, 1988.

[9] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, 2014.

[10] Ines Chami, Rex Ying, Christopher Ré, and Jure Leskovec. Hyperbolic graph convolutional neural networks. The Thirty-third Conference on Neural Information Processing Systems, 2019.

5 https://etherscan.io

[11] Kyunghyun Cho, Bart van Merrienboer, Çaglar Gülçehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, pages 1724–1734, 2014.

[12] Christopher De Sa, Albert Gu, Christopher Ré, and Frederic Sala. Representation tradeoffs for hyperbolic embeddings. arXiv preprint arXiv:1804.03329, 2018.

[13] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pages 3837–3845, 2016.

[14] David K.
Duvenaud, Dougal Maclaurin, Jorge Aguilera-Iparraguirre, Rafael Bombarell, Timothy Hirzel, Alán Aspuru-Guzik, and Ryan P. Adams. Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 2224–2232, 2015.

[15] Paul Erdős and Alfréd Rényi. On random graphs I. Publicationes Mathematicae (Debrecen), 6:290–297, 1959.

[16] O.-E. Ganea, G. Becigneul, and T. Hofmann. Hyperbolic entailment cones for learning hierarchical embeddings. In Proceedings of the 35th International Conference on Machine Learning (ICML), volume 80 of Proceedings of Machine Learning Research, pages 1646–1655. PMLR, July 2018.

[17] Octavian-Eugen Ganea, Gary Bécigneul, and Thomas Hofmann. Hyperbolic neural networks. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pages 5350–5360, 2018.

[18] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1263–1272. JMLR.org, 2017.

[19] Marco Gori, Gabriele Monfardini, and Franco Scarselli. A new model for learning in graph domains. In Neural Networks, 2005. IJCNN'05. Proceedings. 2005 IEEE International Joint Conference on, volume 2, pages 729–734. IEEE, 2005.

[20] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 855–864.
ACM, 2016.

[21] Çaglar Gülçehre, Misha Denil, Mateusz Malinowski, Ali Razavi, Razvan Pascanu, Karl Moritz Hermann, Peter W. Battaglia, Victor Bapst, David Raposo, Adam Santoro, and Nando de Freitas. Hyperbolic attention networks. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, 2019.

[22] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pages 1024–1034, 2017.

[23] John J Irwin and Brian K Shoichet. ZINC - a free database of commercially available compounds for virtual screening. Journal of Chemical Information and Modeling, 45(1):177–182, 2005.

[24] John J Irwin, Teague Sterling, Michael M Mysinger, Erin S Bolstad, and Ryan G Coleman. ZINC: a free tool to discover chemistry for biology. Journal of Chemical Information and Modeling, 52(7):1757–1768, 2012.

[25] Wengong Jin, Regina Barzilay, and Tommi S. Jaakkola. Junction tree variational autoencoder for molecular graph generation. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pages 2328–2337, 2018.

[26] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017.

[27] Robert Kleinberg. Geographic routing using hyperbolic space. In INFOCOM 2007. 26th IEEE International Conference on Computer Communications, pages 1902–1909. IEEE, 2007.

[28] Dmitri Krioukov, Fragkiskos Papadopoulos, Maksim Kitsak, Amin Vahdat, and Marián Boguñá. Hyperbolic geometry of complex networks. Physical Review E, 82(3):036106, 2010.

[29] Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard S.
Zemel. Gated graph sequence neural networks. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, 2016.

[30] Jiaqi Liang, Linjing Li, and Daniel Zeng. Evolutionary dynamics of cryptocurrency transaction networks: An empirical study. PloS One, 13(8):e0202202, 2018.

[31] Qi Liu, Miltiadis Allamanis, Marc Brockschmidt, and Alexander Gaunt. Constrained graph variational autoencoders for molecule design. In Advances in Neural Information Processing Systems, pages 7795–7804, 2018.

[32] Maximilian Nickel and Douwe Kiela. Poincaré embeddings for learning hierarchical representations. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 6338–6347. Curran Associates, Inc., 2017.

[33] Maximilian Nickel and Douwe Kiela. Learning continuous hierarchies in the Lorentz model of hyperbolic geometry. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pages 3776–3785, 2018.

[34] Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. Learning convolutional neural networks for graphs. In Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, pages 2014–2023, 2016.

[35] Fragkiskos Papadopoulos, Maksim Kitsak, M Ángeles Serrano, Marián Boguñá, and Dmitri Krioukov. Popularity versus similarity in growing networks. Nature, 489(7417):537, 2012.

[36] Tomaso Poggio and Federico Girosi. Networks for approximation and learning. Proceedings of the IEEE, 78(9):1481–1497, 1990.

[37] Raghunathan Ramakrishnan, Pavlo O Dral, Matthias Rupp, and O Anatole Von Lilienfeld. Quantum chemistry structures and properties of 134 kilo molecules.
Scientific Data, 1:140022, 2014.

[38] Raghunathan Ramakrishnan, Pavlo O Dral, Matthias Rupp, and O Anatole Von Lilienfeld. Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data, 1:140022, 2014.

[39] Raghunathan Ramakrishnan, Mia Hartmann, Enrico Tapavicza, and O Anatole Von Lilienfeld. Electronic spectra from TDDFT and machine learning in chemical space. The Journal of Chemical Physics, 143(8):084111, 2015.

[40] Lars Ruddigkeit, Ruud Van Deursen, Lorenz C Blum, and Jean-Louis Reymond. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. Journal of Chemical Information and Modeling, 52(11):2864–2875, 2012.

[41] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Trans. Neural Networks, 20(1):61–80, 2009.

[42] Michael Sejr Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. Modeling relational data with graph convolutional networks. In The Semantic Web - 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3-7, 2018, Proceedings, pages 593–607, 2018.

[43] Kristof T Schütt, Farhad Arbabzadah, Stefan Chmiela, Klaus R Müller, and Alexandre Tkatchenko. Quantum-chemical insights from deep tensor neural networks. Nature Communications, 8:13890, 2017.

[44] Teague Sterling and John J Irwin. ZINC 15 - ligand discovery for everyone. Journal of Chemical Information and Modeling, 55(11):2324–2337, 2015.

[45] Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, 2018.

[46] Duncan J Watts and Steven H Strogatz.
Collective dynamics of 'small-world' networks. Nature, 393(6684):440, 1998.

[47] Spencer Wheatley, Didier Sornette, M Reppen, T Huber, and RN Gantner. Are bitcoin bubbles predictable? Combining a generalised Metcalfe's law and the LPPLS model. Swiss Finance Institute Research Paper, (18-22), 2018.

[48] Gavin Wood et al. Ethereum: A secure decentralised generalised transaction ledger. Ethereum project yellow paper, 151:1–32, 2014.

[49] Ke Wu, Spencer Wheatley, and Didier Sornette. Classification of crypto-coins and tokens from the dynamics of their power law capitalisation distributions. Technical report, Royal Society Open Science, 2018.

[50] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 974–983. ACM, 2018.

[51] Muhan Zhang and Yixin Chen. Link prediction based on graph neural networks. In Advances in Neural Information Processing Systems, pages 5165–5175, 2018.

[52] Marinka Zitnik, Monica Agrawal, and Jure Leskovec. Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics, 34(13):i457–i466, 2018.