{"title": "Hyperbolic Graph Convolutional Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 4868, "page_last": 4879, "abstract": "Graph convolutional neural networks (GCNs) embed nodes in a graph into Euclidean space, which has been shown to incur a large distortion when embedding real-world graphs with scale-free or hierarchical structure. Hyperbolic geometry offers an exciting alternative, as it enables embeddings with much smaller distortion. However, extending GCNs to hyperbolic geometry presents several unique challenges because it is not clear how to define neural network operations, such as feature transformation and aggregation, in hyperbolic space. Furthermore, since input features are often Euclidean, it is unclear how to transform the features into hyperbolic embeddings with the right amount of curvature. Here we propose Hyperbolic Graph Convolutional Neural Network (HGCN), the first inductive hyperbolic GCN that leverages both the expressiveness of GCNs and hyperbolic geometry to learn inductive node representations for hierarchical and scale-free\ngraphs. We derive GCNs operations in the hyperboloid model of hyperbolic space and map Euclidean input features to embeddings in hyperbolic spaces with different trainable curvature at each layer. 
Experiments demonstrate that HGCN learns embeddings that preserve hierarchical structure, and leads to improved performance when compared to Euclidean analogs, even with very low dimensional embeddings: compared to state-of-the-art GCNs, HGCN achieves an error reduction of up to 63.1% in ROC AUC for link prediction and of up to 47.5% in F1 score for node classification, also improving state-of-the art on the Pubmed dataset.", "full_text": "Hyperbolic Graph Convolutional Neural Networks\n\nInes Chami\u2217\u2021\n\nRex Ying\u2217\u2020\n\nChristopher R\u00b4e\u2020\n\nJure Leskovec\u2020\n\n\u2020Department of Computer Science, Stanford University\n\n\u2021Institute for Computational and Mathematical Engineering, Stanford University\n\n{chami, rexying, chrismre, jure}@cs.stanford.edu\n\nAbstract\n\nGraph convolutional neural networks (GCNs) embed nodes in a graph into Eu-\nclidean space, which has been shown to incur a large distortion when embedding\nreal-world graphs with scale-free or hierarchical structure. Hyperbolic geome-\ntry offers an exciting alternative, as it enables embeddings with much smaller\ndistortion. However, extending GCNs to hyperbolic geometry presents several\nunique challenges because it is not clear how to de\ufb01ne neural network operations,\nsuch as feature transformation and aggregation, in hyperbolic space. Furthermore,\nsince input features are often Euclidean, it is unclear how to transform the features\ninto hyperbolic embeddings with the right amount of curvature. Here we propose\nHyperbolic Graph Convolutional Neural Network (HGCN), the \ufb01rst inductive\nhyperbolic GCN that leverages both the expressiveness of GCNs and hyperbolic\ngeometry to learn inductive node representations for hierarchical and scale-free\ngraphs. We derive GCNs operations in the hyperboloid model of hyperbolic space\nand map Euclidean input features to embeddings in hyperbolic spaces with different\ntrainable curvature at each layer. 
Experiments demonstrate that HGCN learns\nembeddings that preserve hierarchical structure, and leads to improved performance\nwhen compared to Euclidean analogs, even with very low dimensional embeddings:\ncompared to state-of-the-art GCNs, HGCN achieves an error reduction of up to\n63.1% in ROC AUC for link prediction and of up to 47.5% in F1 score for node\nclassi\ufb01cation, also improving state-of-the art on the Pubmed dataset.\n\n1\n\nIntroduction\n\nGraph Convolutional Neural Networks (GCNs) are state-of-the-art models for representation learning\nin graphs, where nodes of the graph are embedded into points in Euclidean space [15, 21, 41, 45].\nHowever, many real-world graphs, such as protein interaction networks and social networks, often\nexhibit scale-free or hierarchical structure [7, 50] and Euclidean embeddings, used by existing GCNs,\nhave a high distortion when embedding such graphs [6, 32]. In particular, scale-free graphs have\ntree-like structure and in such graphs the graph volume, de\ufb01ned as the number of nodes within some\nradius to a center node, grows exponentially as a function of radius. However, the volume of balls in\nEuclidean space only grows polynomially with respect to the radius, which leads to high distortion\nembeddings [34, 35], while in hyperbolic space, this volume grows exponentially.\nHyperbolic geometry offers an exciting alternative as it enables embeddings with much smaller\ndistortion when embedding scale-free and hierarchical graphs. However, current hyperbolic embed-\nding techniques only account for the graph structure and do not leverage rich node features. For\ninstance, Poincar\u00b4e embeddings [29] capture the hyperbolic properties of real graphs by learning\nshallow embeddings with hyperbolic distance metric and Riemannian optimization. 
∗Equal contribution

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Figure 1: Left: Poincaré disk geodesics (shortest paths) connecting x and y for different curvatures. As curvature (−1/K) decreases, the distance between x and y increases, and the geodesic lines get closer to the origin. Center: hyperbolic distance vs. curvature. Right: Poincaré geodesic lines.

Compared to deep alternatives such as GCNs, shallow embeddings do not take into account features of nodes, lack scalability, and lack inductive capability. Furthermore, in practice, optimization in hyperbolic space is challenging.
While extending GCNs to hyperbolic geometry has the potential to lead to more faithful embeddings and accurate models, it also poses several hard challenges: (1) input node features are usually Euclidean, and it is not clear how to optimally use them as inputs to hyperbolic neural networks; (2) it is not clear how to perform set aggregation, a key step in message passing, in hyperbolic space; and (3) one needs to choose hyperbolic spaces with the right curvature at every layer of the GCN.
Here we solve the above challenges and propose Hyperbolic Graph Convolutional Networks (HGCN)², a class of graph representation learning models that combines the expressiveness of GCNs and hyperbolic geometry to learn improved representations for real-world hierarchical and scale-free graphs in inductive settings: (1) we derive the core operations of GCNs in the hyperboloid model of hyperbolic space to transform input features, which lie in Euclidean space, into hyperbolic embeddings; (2) we introduce a hyperbolic attention-based aggregation scheme that captures the hierarchical structure of networks; (3) at different layers of HGCN we apply feature transformations in hyperbolic spaces of different trainable curvatures to learn low-distortion hyperbolic embeddings. The transformation between different
hyperbolic spaces at different layers allows HGCN to \ufb01nd the\nbest geometry of hidden layers to achieve low distortion and high separation of class labels. Our\napproach jointly trains the weights for hyperbolic graph convolution operators, layer-wise curvatures\nand hyperbolic attention to learn inductive embeddings that re\ufb02ect hierarchies in graphs.\nCompared to Euclidean GCNs, HGCN offers improved expressiveness for hierarchical graph data.\nWe demonstrate the ef\ufb01cacy of HGCN in link prediction and node classi\ufb01cation tasks on a wide\nrange of open graph datasets which exhibit different extent of hierarchical structure. Experiments\nshow that HGCN signi\ufb01cantly outperforms Euclidean-based state-of-the-art graph neural networks\non scale-free graphs and reduces error from 11.5% up to 47.5% on node classi\ufb01cation tasks and\nfrom 28.2% up to 63.1% on link prediction tasks. Furthermore, HGCN achieves new state-of-the-art\nresults on the standard PUBMED benchmark. Finally, we analyze the notion of hierarchy learned by\nHGCN and show how the embedding geometry transforms from Euclidean features to hyperbolic\nembeddings.\n\n2 Related Work\n\nThe problem of graph representation learning belongs to the \ufb01eld of geometric deep learning. There\nexist two major types of approaches: transductive shallow embeddings and inductive GCNs.\nTransductive, shallow embeddings. The \ufb01rst type of approach attempts to optimize node embed-\ndings as parameters by minimizing a reconstruction error. In other words, the mapping from nodes\nin a graph to embeddings is an embedding look-up. Examples include matrix factorization [3, 24]\nand random walk methods [12, 31]. 
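The look-up view above can be made concrete with a minimal sketch: a toy graph, an embedding table with one trainable row per node, and a plain squared-distance reconstruction objective. The graph, loss, and training loop here are illustrative simplifications, not the cited methods:

```python
import numpy as np

# Hypothetical toy setup: the "model" is only an embedding look-up table Z,
# one trainable row per node, fit by pulling connected nodes together.
rng = np.random.default_rng(0)
num_nodes, dim = 5, 2
Z = rng.normal(scale=0.1, size=(num_nodes, dim))   # one parameter vector per node
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]           # a small path graph

lr = 0.5
for _ in range(200):
    for i, j in edges:
        diff = Z[i] - Z[j]          # gradient of the loss 0.5 * ||z_i - z_j||^2
        Z[i] = Z[i] - lr * diff
        Z[j] = Z[j] + lr * diff

# Connected nodes end up close together. The transductive downsides are visible:
# the table grows linearly with |V|, and a node unseen at training time has no row.
print(float(np.linalg.norm(Z[0] - Z[1])))
```

Real shallow methods also use negative samples or richer reconstruction targets so that embeddings do not collapse; the sketch only illustrates the parameter-per-node structure.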
Shallow embedding methods have also been developed in hyperbolic geometry [29, 30] for reconstructing trees [35] and graphs [5, 13, 22], or embedding text [39]. However, shallow (Euclidean and hyperbolic) embedding methods have three major downsides: (1) they fail to leverage rich node feature information, which can be crucial in tasks such as node classification; (2) they are transductive, and therefore cannot be used for inference on unseen graphs; and (3) they scale poorly, as the number of model parameters grows linearly with the number of nodes.
(Euclidean) Graph Neural Networks. Instead of learning shallow embeddings, an alternative approach is to learn a mapping from the input graph structure and node features to embeddings, parameterized by neural networks [15, 21, 25, 41, 45, 47]. While various Graph Neural Network architectures resolve the disadvantages of shallow embeddings, they generally embed nodes into a Euclidean space, which leads to a large distortion when embedding real-world graphs with scale-free or hierarchical structure. Our work builds on GNNs and extends them to hyperbolic geometry.
Hyperbolic Neural Networks. Hyperbolic geometry has been applied to neural networks for problems in computer vision and natural language processing [8, 14, 18, 38]. More recently, hyperbolic neural networks [10] were proposed, where core neural network operations are carried out in hyperbolic space. In contrast to previous work, we derive the core neural network operations in a more stable model of hyperbolic space, and propose new operations for set aggregation, which enables HGCN to perform graph convolutions with attention in hyperbolic space with trainable curvature.

²Project website with code and data: http://snap.stanford.edu/hgcn
After NeurIPS 2019 announced accepted papers, we also became aware of the concurrently developed HGNN model [26] for learning GNNs in hyperbolic space. The main difference with our work is how HGCN defines the architecture for neighborhood aggregation and its use of a learnable curvature. Additionally, while [26] demonstrates strong performance on graph classification tasks and provides an elegant extension to dynamic graph embeddings, we focus on link prediction and node classification.

3 Background

Problem setting. Without loss of generality, we describe graph representation learning on a single graph. Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ be a graph with vertex set $\mathcal{V}$ and edge set $\mathcal{E}$, and let $(x_i^{0,E})_{i \in \mathcal{V}}$ be $d$-dimensional input node features, where 0 indicates the first layer. We use the superscript $E$ to indicate that node features lie in a Euclidean space and $H$ to denote hyperbolic features. The goal in graph representation learning is to learn a mapping $f$ which maps nodes to embedding vectors:

$$f : (\mathcal{V}, \mathcal{E}, (x_i^{0,E})_{i \in \mathcal{V}}) \rightarrow Z \in \mathbb{R}^{|\mathcal{V}| \times d'},$$

where $d' \ll |\mathcal{V}|$. These embeddings should capture both structural and semantic information and can then be used as input for downstream tasks such as node classification and link prediction.
Graph Convolutional Neural Networks (GCNs). Let $\mathcal{N}(i) = \{j : (i, j) \in \mathcal{E}\}$ denote the set of neighbors of $i \in \mathcal{V}$, let $(W^\ell, b^\ell)$ be the weight and bias parameters for layer $\ell$, and let $\sigma(\cdot)$ be a non-linear activation function. The general GCN message passing rule at layer $\ell$ for node $i$ consists of:

$$h_i^{\ell,E} = W^\ell x_i^{\ell-1,E} + b^\ell \qquad \text{(feature transform)} \quad (1)$$
$$x_i^{\ell,E} = \sigma\Big(h_i^{\ell,E} + \sum_{j \in \mathcal{N}(i)} w_{ij} h_j^{\ell,E}\Big) \qquad \text{(neighborhood aggregation)} \quad (2)$$

where the aggregation weights $w_{ij}$ can be computed using different mechanisms [15, 21, 41].
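As a point of reference for the hyperbolic analogs derived later, Equations 1 and 2 can be sketched with dense numpy operations and uniform mean-aggregation weights $w_{ij} = 1/|\mathcal{N}(i)|$. The toy graph and shapes are illustrative; GCN variants differ in how $w_{ij}$ is chosen:

```python
import numpy as np

def gcn_layer(X, adj, W, b):
    """One Euclidean GCN layer: Eq. 1 (transform), then Eq. 2 (aggregate + ReLU)."""
    H = X @ W.T + b                        # Eq. 1: h_i = W x_i + b
    deg = adj.sum(axis=1, keepdims=True)   # |N(i)| for each node
    agg = (adj @ H) / np.maximum(deg, 1)   # sum_j w_ij h_j with w_ij = 1/|N(i)|
    return np.maximum(H + agg, 0.0)        # Eq. 2 with sigma = ReLU

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1],                 # toy 3-node graph (star around node 0)
                [1, 0, 0],
                [1, 0, 0]], dtype=float)
X = rng.normal(size=(3, 4))                # 4-dimensional input features
W, b = rng.normal(size=(2, 4)), np.zeros(2)
out = gcn_layer(X, adj, W, b)
print(out.shape)  # (3, 2)
```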
Message passing is then performed for multiple layers to propagate messages over network neighborhoods. Unlike shallow methods, GCNs leverage node features and can be applied to unseen nodes/graphs in inductive settings.
The hyperboloid model of hyperbolic space. We review basic concepts of hyperbolic geometry that serve as building blocks for HGCN. Hyperbolic geometry is a non-Euclidean geometry with a constant negative curvature, where curvature measures how a geometric object deviates from a flat plane (cf. [33] for an introduction to differential geometry). Here, we work with the hyperboloid model for its simplicity and its numerical stability [30]. We review results for any constant negative curvature, as this allows us to learn curvature as a model parameter, leading to more stable optimization (cf. Section 4.5 for more details).

Figure 2: HGCN neighborhood aggregation (Eq. 9) first maps messages/embeddings to the tangent space, performs the aggregation in the tangent space, and then maps back to the hyperbolic space.

Hyperboloid manifold. We first introduce our notation for the hyperboloid model of hyperbolic space. Let $\langle \cdot, \cdot \rangle_{\mathcal{L}} : \mathbb{R}^{d+1} \times \mathbb{R}^{d+1} \rightarrow \mathbb{R}$ denote the Minkowski inner product, $\langle x, y \rangle_{\mathcal{L}} := -x_0 y_0 + x_1 y_1 + \ldots + x_d y_d$. We denote by $\mathbb{H}^{d,K}$ the hyperboloid manifold in $d$ dimensions with constant negative curvature $-1/K$ ($K > 0$), and by $\mathcal{T}_x \mathbb{H}^{d,K}$ the (Euclidean) tangent space centered at the point $x$:

$$\mathbb{H}^{d,K} := \{x \in \mathbb{R}^{d+1} : \langle x, x \rangle_{\mathcal{L}} = -K,\ x_0 > 0\}, \qquad \mathcal{T}_x \mathbb{H}^{d,K} := \{v \in \mathbb{R}^{d+1} : \langle v, x \rangle_{\mathcal{L}} = 0\}. \quad (3)$$

Now for $v$ and $w$ in $\mathcal{T}_x \mathbb{H}^{d,K}$, $g_x^K(v, w) := \langle v, w \rangle_{\mathcal{L}}$ is a Riemannian metric tensor [33], and $(\mathbb{H}^{d,K}, g_x^K)$ is a Riemannian manifold with negative curvature $-1/K$. $\mathcal{T}_x \mathbb{H}^{d,K}$ is a local, first-order approximation of the hyperbolic manifold at $x$, and the restriction of the Minkowski inner product to $\mathcal{T}_x \mathbb{H}^{d,K}$ is positive definite, so we denote by $||v||_{\mathcal{L}} = \sqrt{\langle v, v \rangle_{\mathcal{L}}}$ the norm of $v \in \mathcal{T}_x \mathbb{H}^{d,K}$. $\mathcal{T}_x \mathbb{H}^{d,K}$ is useful to perform Euclidean operations undefined in hyperbolic space.
Geodesics and induced distances. Next, we introduce the notion of geodesics and distances in manifolds, which generalize shortest paths in graphs and straight lines in Euclidean geometry (Figure 1). Geodesics and distance functions are particularly important in graph embedding algorithms, as a common optimization objective is to minimize geodesic distances between connected nodes. Let $x \in \mathbb{H}^{d,K}$ and $u \in \mathcal{T}_x \mathbb{H}^{d,K}$, and assume that $u$ is unit-speed, i.e. $\langle u, u \rangle_{\mathcal{L}} = 1$; then we have the following result:
Proposition 3.1. Let $x \in \mathbb{H}^{d,K}$ and $u \in \mathcal{T}_x \mathbb{H}^{d,K}$ be unit-speed. The unique unit-speed geodesic $\gamma_{x \rightarrow u}(\cdot)$ such that $\gamma_{x \rightarrow u}(0) = x$ and $\dot{\gamma}_{x \rightarrow u}(0) = u$ is $\gamma_{x \rightarrow u}^K(t) = \cosh\big(\frac{t}{\sqrt{K}}\big) x + \sqrt{K} \sinh\big(\frac{t}{\sqrt{K}}\big) u$, and the intrinsic distance function between two points $x, y \in \mathbb{H}^{d,K}$ is then:

$$d_{\mathcal{L}}^K(x, y) = \sqrt{K}\,\mathrm{arcosh}(-\langle x, y \rangle_{\mathcal{L}} / K). \quad (4)$$

Exponential and logarithmic maps. Mapping between tangent space and hyperbolic space is done by the exponential and logarithmic maps. Given $x \in \mathbb{H}^{d,K}$ and a tangent vector $v \in \mathcal{T}_x \mathbb{H}^{d,K}$, the exponential map $\exp_x^K : \mathcal{T}_x \mathbb{H}^{d,K} \rightarrow \mathbb{H}^{d,K}$ assigns to $v$ the point $\exp_x^K(v) := \gamma(1)$, where $\gamma$ is the unique geodesic satisfying $\gamma(0) = x$ and $\dot{\gamma}(0) = v$. The logarithmic map is the reverse map back to the tangent space at $x$, such that $\log_x^K(\exp_x^K(v)) = v$. In general Riemannian manifolds these operations are only defined locally, but in hyperbolic space they form a bijection between the hyperbolic space and the tangent space at a point.
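These primitives admit a short numerical sketch (numpy, curvature $-1/K$ with $K = 1$; all variable names are illustrative): the Minkowski inner product, the induced distance of Eq. 4, and the exp/log maps of Proposition 3.2. The checks at the end verify that the exp map lands on the manifold and that the log map inverts it:

```python
import numpy as np

def minkowski(x, y):
    """Minkowski inner product <x, y>_L = -x0*y0 + x1*y1 + ... + xd*yd."""
    return -x[0] * y[0] + x[1:] @ y[1:]

def dist(x, y, K):
    """Hyperbolic distance of Eq. 4: sqrt(K) * arcosh(-<x, y>_L / K)."""
    return np.sqrt(K) * np.arccosh(np.clip(-minkowski(x, y) / K, 1.0, None))

def expmap(x, v, K):
    """Exponential map of Proposition 3.2 (v is a tangent vector at x)."""
    n = np.sqrt(np.maximum(minkowski(v, v), 1e-15))   # ||v||_L
    return np.cosh(n / np.sqrt(K)) * x + np.sqrt(K) * np.sinh(n / np.sqrt(K)) * v / n

def logmap(x, y, K):
    """Logarithmic map of Proposition 3.2 (maps y back to the tangent space at x)."""
    u = y + (minkowski(x, y) / K) * x
    return dist(x, y, K) * u / np.sqrt(np.maximum(minkowski(u, u), 1e-15))

K = 1.0
o = np.array([np.sqrt(K), 0.0, 0.0])   # north pole of H^{2,K}
v = np.array([0.0, 0.3, -0.2])         # tangent at o, since <v, o>_L = 0
y = expmap(o, v, K)
assert abs(minkowski(y, y) + K) < 1e-9     # y satisfies <y, y>_L = -K
assert np.allclose(logmap(o, y, K), v)     # log inverts exp
```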
We have the following direct expressions of the exponential and the logarithmic maps, which allow us to perform operations on points on the hyperboloid manifold by mapping them to tangent spaces and vice-versa:
Proposition 3.2. For $x \in \mathbb{H}^{d,K}$, $v \in \mathcal{T}_x \mathbb{H}^{d,K}$ and $y \in \mathbb{H}^{d,K}$ such that $v \neq 0$ and $y \neq x$, the exponential and logarithmic maps of the hyperboloid model are given by:

$$\exp_x^K(v) = \cosh\Big(\frac{||v||_{\mathcal{L}}}{\sqrt{K}}\Big)\, x + \sqrt{K} \sinh\Big(\frac{||v||_{\mathcal{L}}}{\sqrt{K}}\Big) \frac{v}{||v||_{\mathcal{L}}}, \qquad \log_x^K(y) = d_{\mathcal{L}}^K(x, y)\, \frac{y + \frac{1}{K}\langle x, y \rangle_{\mathcal{L}}\, x}{||y + \frac{1}{K}\langle x, y \rangle_{\mathcal{L}}\, x||_{\mathcal{L}}}.$$

4 Hyperbolic Graph Convolutional Networks

Here we introduce HGCN, a generalization of inductive GCNs in hyperbolic geometry that benefits from the expressiveness of both graph neural networks and hyperbolic embeddings. First, since input features are often Euclidean, we derive a mapping from Euclidean features to hyperbolic space. Next, we derive two components of graph convolution: the analogs of Euclidean feature transformation and feature aggregation (Equations 1, 2) in the hyperboloid model. Finally, we introduce the HGCN algorithm with trainable curvature.

4.1 Mapping from Euclidean to hyperbolic spaces

HGCN first maps input features to the hyperboloid manifold via the exp map. Let $x^{0,E} \in \mathbb{R}^d$ denote input Euclidean features. For instance, these features could be produced by pre-trained Euclidean neural networks. Let $o := (\sqrt{K}, 0, \ldots, 0) \in \mathbb{H}^{d,K}$ denote the north pole (origin) in $\mathbb{H}^{d,K}$, which we use as a reference point to perform tangent space operations. We have $\langle (0, x^{0,E}), o \rangle = 0$.
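A sketch of this input mapping, assuming $K = 1$ and illustrative names: the feature vector is zero-padded into the origin's tangent space and pushed onto the manifold with $\exp_o^K$, which yields the closed form stated in Eq. 5:

```python
import numpy as np

# Hypothetical sketch: lift a Euclidean feature vector x^{0,E} onto H^{d,K}
# via the exponential map at the north pole o (cf. Eq. 5), with K = 1.
def lift_to_hyperboloid(x_e, K):
    n = np.linalg.norm(x_e)                             # ||x^{0,E}||_2
    first = np.sqrt(K) * np.cosh(n / np.sqrt(K))        # time-like coordinate
    rest = np.sqrt(K) * np.sinh(n / np.sqrt(K)) * x_e / max(n, 1e-15)
    return np.concatenate(([first], rest))

K = 1.0
x_h = lift_to_hyperboloid(np.array([0.5, -1.0, 2.0]), K)
mink = -x_h[0] ** 2 + x_h[1:] @ x_h[1:]
assert abs(mink + K) < 1e-8   # the lifted point satisfies <x, x>_L = -K
```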
Therefore, we interpret $(0, x^{0,E})$ as a point in $\mathcal{T}_o \mathbb{H}^{d,K}$ and use Proposition 3.2 to map it to $\mathbb{H}^{d,K}$ with:

$$x^{0,H} = \exp_o^K((0, x^{0,E})) = \Big(\sqrt{K} \cosh\Big(\frac{||x^{0,E}||_2}{\sqrt{K}}\Big),\ \sqrt{K} \sinh\Big(\frac{||x^{0,E}||_2}{\sqrt{K}}\Big) \frac{x^{0,E}}{||x^{0,E}||_2}\Big). \quad (5)$$

4.2 Feature transform in hyperbolic space

The feature transform in Equation 1 is used in GCNs to map the embedding space of one layer to that of the next and to capture large neighborhood structures. We now want to learn transformations of points on the hyperboloid manifold. However, there is no notion of vector space structure in hyperbolic space. We build upon Hyperbolic Neural Networks (HNN) [10] and derive transformations in the hyperboloid model. The main idea is to leverage the exp and log maps of Proposition 3.2 so that we can use the tangent space $\mathcal{T}_o \mathbb{H}^{d,K}$ to perform Euclidean transformations.
Hyperboloid linear transform. A linear transformation requires multiplication of the embedding vector by a weight matrix, followed by bias translation. To compute matrix-vector multiplication, we first use the logarithmic map to project hyperbolic points $x^H$ to $\mathcal{T}_o \mathbb{H}^{d,K}$. The matrix representing the transform is thus defined on the tangent space, which is Euclidean and isomorphic to $\mathbb{R}^d$. We then project the vector in the tangent space back to the manifold using the exponential map. Let $W$ be a $d' \times d$ weight matrix. We define the hyperboloid matrix multiplication as:

$$W \otimes^K x^H := \exp_o^K(W \log_o^K(x^H)), \quad (6)$$

where $\log_o^K(\cdot)$ is applied on $\mathbb{H}^{d,K}$ and $\exp_o^K(\cdot)$ maps to $\mathbb{H}^{d',K}$. In order to perform bias addition, we use a result from the HNN model and define $b$ as a Euclidean vector located in $\mathcal{T}_o \mathbb{H}^{d',K}$. We then parallel transport $b$ to the tangent space of the hyperbolic point of interest and map it to the manifold. If $P_{o \rightarrow x^H}^K(\cdot)$ is the parallel transport from $\mathcal{T}_o \mathbb{H}^{d',K}$ to $\mathcal{T}_{x^H} \mathbb{H}^{d',K}$ (cf.
Appendix A for details), the hyperboloid bias addition is then defined as:

$$x^H \oplus^K b := \exp_{x^H}^K(P_{o \rightarrow x^H}^K(b)). \quad (7)$$

4.3 Neighborhood aggregation on the hyperboloid manifold

Aggregation (Equation 2) is a crucial step in GCNs, as it captures neighborhood structures and features. Suppose that $x_i$ aggregates information from its neighbors $(x_j)_{j \in \mathcal{N}(i)}$ with weights $(w_j)_{j \in \mathcal{N}(i)}$. Mean aggregation in Euclidean GCNs computes the weighted average $\sum_{j \in \mathcal{N}(i)} w_j x_j$. An analog of mean aggregation in hyperbolic space is the Fréchet mean [9], which, however, has no closed-form solution. Instead, we propose to perform aggregation in tangent spaces using hyperbolic attention.
Attention-based aggregation. Attention in GCNs learns a notion of neighbors' importance and aggregates neighbors' messages according to their importance to the center node. However, attention on Euclidean embeddings does not take into account the hierarchical nature of many real-world networks. Thus, we further propose hyperbolic attention-based aggregation. Given hyperbolic embeddings $x_i^H$ and $x_j^H$, we first map them to the tangent space of the origin to compute attention weights $w_{ij}$ with concatenation and a Euclidean Multi-Layer Perceptron (MLP). We then propose a hyperbolic aggregation to average nodes' representations:

$$w_{ij} = \mathrm{SOFTMAX}_{j \in \mathcal{N}(i)}\big(\mathrm{MLP}(\log_o^K(x_i^H)\,||\,\log_o^K(x_j^H))\big) \quad (8)$$
$$\mathrm{AGG}^K(x^H)_i = \exp_{x_i^H}^K\Big(\sum_{j \in \mathcal{N}(i)} w_{ij} \log_{x_i^H}^K(x_j^H)\Big). \quad (9)$$

(a) GCN layers. (b) HGCN layers. (c) GCN (left), HGCN (right).
Figure 3: Visualization of embeddings for LP on DISEASE and NC on CORA (visualization on the Poincaré disk for HGCN). (a) GCN embeddings in first and last layers for DISEASE LP hardly capture hierarchy (depth indicated by color).
(b) In contrast, HGCN preserves node hierarchies. (c) On CORA NC, HGCN leads to better class separation (indicated by different colors).

Note that our proposed aggregation is performed directly in the tangent space of each center point $x_i^H$, as this is where the Euclidean approximation is best (cf. Figure 2). We show in our ablation experiments (cf. Table 2) that this local aggregation outperforms aggregation in the tangent space at the origin ($\mathrm{AGG}^o$), due to the fact that relative distances have lower distortion in our approach.
Non-linear activation with different curvatures. Analogous to Euclidean aggregation (Equation 2), HGCN uses a non-linear activation function $\sigma(\cdot)$, with $\sigma(0) = 0$, to learn non-linear transformations. Given hyperbolic curvatures $-1/K_{\ell-1}$ and $-1/K_\ell$ at layers $\ell - 1$ and $\ell$ respectively, we introduce a hyperbolic non-linear activation $\sigma^{\otimes K_{\ell-1}, K_\ell}$ with different curvatures. This step is crucial, as it allows us to smoothly vary curvature at each layer. More concretely, HGCN applies the Euclidean non-linear activation in $\mathcal{T}_o \mathbb{H}^{d,K_{\ell-1}}$ and then maps back to $\mathbb{H}^{d,K_\ell}$:

$$\sigma^{\otimes K_{\ell-1}, K_\ell}(x^H) = \exp_o^{K_\ell}(\sigma(\log_o^{K_{\ell-1}}(x^H))). \quad (10)$$

Note that in order to apply the exponential map, points must be located in the tangent space at the north pole. Fortunately, tangent spaces of the north pole are shared across hyperboloid manifolds of the same dimension that have different curvatures, making Equation 10 mathematically correct.

4.4 HGCN architecture

Having introduced all the building blocks of HGCN, we now summarize the model architecture. Given a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ and input Euclidean features $(x^{0,E})_{i \in \mathcal{V}}$, the first layer of HGCN maps from Euclidean to hyperbolic space as detailed in Section 4.1. HGCN then stacks multiple hyperbolic graph convolution layers.
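The attention-based aggregation at the heart of each such layer (Eqs. 8-9) can be sketched in numpy as follows, assuming $K = 1$. A single linear scoring vector stands in for the MLP of Eq. 8, and all names are illustrative; the key point is that messages are averaged in the tangent space of the center node and then mapped back:

```python
import numpy as np

def minkowski(x, y):
    return -x[0] * y[0] + x[1:] @ y[1:]

def expmap(x, v, K):
    n = np.sqrt(np.maximum(minkowski(v, v), 1e-15))
    return np.cosh(n / np.sqrt(K)) * x + np.sqrt(K) * np.sinh(n / np.sqrt(K)) * v / n

def logmap(x, y, K):
    d = np.sqrt(K) * np.arccosh(np.clip(-minkowski(x, y) / K, 1.0, None))
    u = y + (minkowski(x, y) / K) * x
    return d * u / np.sqrt(np.maximum(minkowski(u, u), 1e-15))

def aggregate(x_i, neighbors, a, K):
    o = np.zeros_like(x_i); o[0] = np.sqrt(K)          # north pole
    # Eq. 8: score concatenated origin-tangent representations
    # (a linear scorer "a" stands in for the MLP).
    scores = np.array([a @ np.concatenate((logmap(o, x_i, K), logmap(o, x_j, K)))
                       for x_j in neighbors])
    w = np.exp(scores - scores.max()); w = w / w.sum() # softmax over N(i)
    # Eq. 9: weighted sum in the tangent space at x_i, then map back.
    tangent = sum(wj * logmap(x_i, x_j, K) for wj, x_j in zip(w, neighbors))
    return expmap(x_i, tangent, K)

K = 1.0
to_h = lambda e: np.concatenate(([np.sqrt(K + e @ e)], e))   # place a point on H^{2,K}
x_i = to_h(np.array([0.1, 0.2]))
nbrs = [to_h(np.array([0.3, 0.0])), to_h(np.array([-0.2, 0.4]))]
out = aggregate(x_i, nbrs, np.random.default_rng(0).normal(size=6), K)
assert abs(minkowski(out, out) + K) < 1e-6   # the aggregate stays on the manifold
```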
At each layer, HGCN transforms and aggregates neighbors' embeddings in the tangent space of the center node and projects the result to a hyperbolic space with different curvature. The message passing in an HGCN layer is hence:

$$h_i^{\ell,H} = (W^\ell \otimes^{K_{\ell-1}} x_i^{\ell-1,H}) \oplus^{K_{\ell-1}} b^\ell \qquad \text{(hyperbolic feature transform)} \quad (11)$$
$$y_i^{\ell,H} = \mathrm{AGG}^{K_{\ell-1}}(h^{\ell,H})_i \qquad \text{(attention-based neighborhood aggregation)} \quad (12)$$
$$x_i^{\ell,H} = \sigma^{\otimes K_{\ell-1}, K_\ell}(y_i^{\ell,H}) \qquad \text{(non-linear activation with different curvatures)} \quad (13)$$

where $-1/K_{\ell-1}$ and $-1/K_\ell$ are the hyperbolic curvatures at layers $\ell - 1$ and $\ell$ respectively. The hyperbolic embeddings $(x^{L,H})_{i \in \mathcal{V}}$ at the last layer can then be used to predict node attributes or links.
For link prediction, we use the Fermi-Dirac decoder [23, 29], a generalization of the sigmoid, to compute probability scores for edges:

$$p((i, j) \in \mathcal{E}\,|\,x_i^{L,H}, x_j^{L,H}) = \Big[e^{(d_{\mathcal{L}}^{K_L}(x_i^{L,H}, x_j^{L,H})^2 - r)/t} + 1\Big]^{-1}, \quad (14)$$

where $d_{\mathcal{L}}^{K_L}(\cdot, \cdot)$ is the hyperbolic distance and $r$ and $t$ are hyper-parameters. We then train HGCN by minimizing the cross-entropy loss with negative sampling.
For node classification, we map the output of the last HGCN layer to the tangent space of the origin with the logarithmic map $\log_o^{K_L}(\cdot)$ and then perform Euclidean multinomial logistic regression. Note that another possibility is to directly classify points on the hyperboloid manifold using the hyperbolic multinomial logistic loss [10]. This method performs similarly to Euclidean classification (cf. [10] for an empirical comparison).
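The Fermi-Dirac decoder of Eq. 14 can be sketched as follows (numpy, $K = 1$; the values of $r$ and $t$ are arbitrary illustrations of the hyper-parameters). The edge probability decays monotonically with squared hyperbolic distance:

```python
import numpy as np

def minkowski(x, y):
    return -x[0] * y[0] + x[1:] @ y[1:]

def hyp_dist(x, y, K):
    """Hyperbolic distance on the hyperboloid (Eq. 4)."""
    return np.sqrt(K) * np.arccosh(np.clip(-minkowski(x, y) / K, 1.0, None))

def fermi_dirac(x, y, K, r=2.0, t=1.0):
    """Eq. 14: edge probability from squared hyperbolic distance."""
    return 1.0 / (np.exp((hyp_dist(x, y, K) ** 2 - r) / t) + 1.0)

K = 1.0
to_h = lambda e: np.concatenate(([np.sqrt(K + e @ e)], e))   # place points on H^{2,K}
a, b = to_h(np.array([0.1, 0.0])), to_h(np.array([0.2, 0.1]))
c = to_h(np.array([3.0, -2.0]))
assert 0.0 < fermi_dirac(a, b, K) < 1.0
assert fermi_dirac(a, b, K) > fermi_dirac(a, c, K)   # nearby pair scores higher
```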
Finally, we also add a link prediction regularization objective in node classification tasks, to encourage embeddings at the last layer to preserve the graph structure.

Table 1: ROC AUC for Link Prediction (LP) and F1 score for Node Classification (NC) tasks. For inductive datasets, we only evaluate inductive methods, since shallow methods cannot generalize to unseen nodes/graphs. We report graph hyperbolicity values δ (lower is more hyperbolic).

| Method | DISEASE (δ=0) LP | DISEASE NC | AIRPORT (δ=1) LP | AIRPORT NC | PUBMED (δ=3.5) LP | PUBMED NC | CORA (δ=11) LP | CORA NC |
|---|---|---|---|---|---|---|---|---|
| EUC | 59.8 ± 2.0 | 32.5 ± 1.1 | 92.0 ± 0.0 | 60.9 ± 3.4 | 83.3 ± 0.1 | 48.2 ± 0.7 | 82.5 ± 0.3 | 23.8 ± 0.7 |
| HYP [29] | 63.5 ± 0.6 | 45.5 ± 3.3 | 94.5 ± 0.0 | 70.2 ± 0.1 | 87.5 ± 0.1 | 68.5 ± 0.3 | 87.6 ± 0.2 | 22.0 ± 1.5 |
| EUC-MIXED | 49.6 ± 1.1 | 35.2 ± 3.4 | 91.5 ± 0.1 | 68.3 ± 2.3 | 86.0 ± 1.3 | 63.0 ± 0.3 | 84.4 ± 0.2 | 46.1 ± 0.4 |
| HYP-MIXED | 55.1 ± 1.3 | 56.9 ± 1.5 | 93.3 ± 0.0 | 69.6 ± 0.1 | 83.8 ± 0.3 | 73.9 ± 0.2 | 85.6 ± 0.5 | 45.9 ± 0.3 |
| MLP | 72.6 ± 0.6 | 28.8 ± 2.5 | 89.8 ± 0.5 | 68.6 ± 0.6 | 84.1 ± 0.9 | 72.4 ± 0.2 | 83.1 ± 0.5 | 51.5 ± 1.0 |
| HNN [10] | 75.1 ± 0.3 | 41.0 ± 1.8 | 90.8 ± 0.2 | 80.5 ± 0.5 | 94.9 ± 0.1 | 69.8 ± 0.4 | 89.0 ± 0.1 | 54.6 ± 0.4 |
| GCN [21] | 64.7 ± 0.5 | 69.7 ± 0.4 | 89.3 ± 0.4 | 81.4 ± 0.6 | 91.1 ± 0.5 | 78.1 ± 0.2 | 90.4 ± 0.2 | 81.3 ± 0.3 |
| GAT [41] | 69.8 ± 0.3 | 70.4 ± 0.4 | 90.5 ± 0.3 | 81.5 ± 0.3 | 91.2 ± 0.1 | 79.0 ± 0.3 | 93.7 ± 0.1 | 83.0 ± 0.7 |
| SAGE [15] | 65.9 ± 0.3 | 69.1 ± 0.6 | 90.4 ± 0.5 | 82.1 ± 0.5 | 86.2 ± 1.0 | 77.4 ± 2.2 | 85.5 ± 0.6 | 77.9 ± 2.4 |
| SGC [44] | 65.1 ± 0.2 | 69.5 ± 0.2 | 89.8 ± 0.3 | 80.6 ± 0.1 | 94.1 ± 0.0 | 78.9 ± 0.0 | 91.5 ± 0.1 | 81.0 ± 0.1 |
| HGCN (ours) | 90.8 ± 0.3 | 74.5 ± 0.9 | 96.4 ± 0.1 | 90.6 ± 0.2 | 96.3 ± 0.0 | 80.3 ± 0.3 | 92.9 ± 0.1 | 79.9 ± 0.2 |
| ERR RED (%) | -63.1% | -13.8% | -60.9% | -47.5% | -27.5% | -6.2% | +12.7% | +18.2% |

| Method | DISEASE-M (δ=0) LP | DISEASE-M NC | HUMAN PPI (δ=1) LP | HUMAN PPI NC |
|---|---|---|---|---|
| MLP | 55.3 ± 0.5 | 55.3 ± 0.4 | 67.8 ± 0.2 | 55.9 ± 0.3 |
| HNN [10] | 60.9 ± 0.4 | 59.3 ± 0.4 | 72.9 ± 0.3 | 56.2 ± 0.3 |
| GCN [21] | 66.0 ± 0.8 | 69.7 ± 0.3 | 77.0 ± 0.5 | 59.4 ± 3.4 |
| GAT [41] | 69.5 ± 0.4 | 70.5 ± 0.4 | 76.8 ± 0.4 | 62.5 ± 0.7 |
| SAGE [15] | 67.4 ± 0.5 | 69.1 ± 0.3 | 78.1 ± 0.6 | 61.3 ± 0.4 |
| SGC [44] | 66.2 ± 0.2 | 71.3 ± 0.1 | 76.1 ± 0.2 | 60.5 ± 0.3 |
| HGCN (ours) | 78.1 ± 0.4 | 74.6 ± 0.3 | 84.5 ± 0.4 | 72.2 ± 0.5 |
| ERR RED (%) | -28.2% | -11.5% | -29.2% | -25.9% |

4.5 Trainable curvature

We further analyze the effect of trainable curvatures in HGCN. Theorem 4.1 (proof in Appendix B) shows that, assuming infinite precision, for the link prediction task we can achieve the same performance for varying curvatures with an affine-invariant decoder by scaling embeddings.
Theorem 4.1.
For any hyperbolic curvatures $-1/K, -1/K' < 0$ and for any node embeddings $H = \{h_i\} \subset \mathbb{H}^{d,K}$ of a graph $\mathcal{G}$, we can find $H' = \{h_i'\} \subset \mathbb{H}^{d,K'}$, with $h_i' = \sqrt{K'/K}\, h_i$, such that the reconstructed graph from $H'$ via the Fermi-Dirac decoder is the same as the reconstructed graph from $H$, with different decoder parameters $(r, t)$ and $(r', t')$.

However, despite the same expressive power, adjusting curvature at every layer is important for good performance in practice, due to factors of limited machine precision and normalization. First, with very low or very high curvatures, the scaling factor $\sqrt{K'/K}$ in Theorem 4.1 becomes close to 0 or very large, and limited machine precision results in large error due to rounding. This is supported by Figure 4 and Table 2, where adjusting and training the curvature leads to significant performance gains. Second, the norms of hidden layers that achieve the same local minimum in training also vary by a factor of $\sqrt{K}$. In practice, however, optimization is much more stable when the values are normalized [16]. In the context of HGCN, trainable curvature provides a natural way to learn embeddings of the right scale at each layer, improving optimization. Figure 4 shows the effect of decreasing curvature ($K = +\infty$ is the Euclidean case) on link prediction performance.

5 Experiments

We comprehensively evaluate our method on a variety of networks, on both node classification (NC) and link prediction (LP) tasks, in transductive and inductive settings. We compare the performance of HGCN against a variety of shallow and GNN-based baselines. We further use visualizations to investigate the expressiveness of HGCN in link prediction tasks, and also demonstrate its ability to learn embeddings that capture the hierarchical structure of many real-world networks.

5.1 Experimental setup

Datasets.
We use a variety of open transductive and inductive datasets that we detail below (more details in the Appendix). We compute Gromov's δ-hyperbolicity [1, 28, 17], a notion from group theory that measures how tree-like a graph is. The lower δ, the more hyperbolic the graph, and δ = 0 for trees. We conjecture that HGCN works better on graphs with small δ-hyperbolicity.
1. Citation networks. CORA [36] and PUBMED [27] are standard benchmarks describing citation networks, where nodes represent scientific papers, edges are citations between them, and node labels are academic (sub)areas. CORA contains 2,708 machine learning papers divided into 7 classes, while PUBMED has 19,717 publications in the area of medicine grouped into 3 classes.
2. Disease propagation tree. We simulate the SIR disease spreading model [2], where the label of a node is whether the node was infected or not. Based on the model, we build tree networks, where node features indicate susceptibility to the disease. We build transductive and inductive variants of this dataset, namely DISEASE and DISEASE-M (which contains multiple tree components).

Figure 4: Decreasing curvature (−1/K) improves link prediction performance on DISEASE.

Table 2: ROC AUC for link prediction on AIRPORT and DISEASE datasets.

| Method | DISEASE | AIRPORT |
|---|---|---|
| HGCN | 78.4 ± 0.3 | 91.8 ± 0.3 |
| HGCN-ATTo | 80.9 ± 0.4 | 92.3 ± 0.3 |
| HGCN-ATT | 82.0 ± 0.2 | 92.5 ± 0.2 |
| HGCN-C | 89.1 ± 0.2 | 94.9 ± 0.3 |
| HGCN-ATT-C | 90.8 ± 0.3 | 96.4 ± 0.1 |

3. Protein-protein interaction (PPI) networks. PPI is a dataset of human PPI networks [37]. Each human tissue has a PPI network, and the dataset is a union of PPI networks for human tissues. Each protein has a label indicating the stem cell growth rate after 19 days [40], which we use for the node classification task.
The 16-dimensional feature for each node represents the RNA expression levels of the corresponding protein, and we log-transform the features.

4. Flight networks. AIRPORT is a transductive dataset where nodes represent airports and edges represent airline routes, obtained from OpenFlights.org. Compared to previous compilations [49], our dataset is larger (2,236 nodes). We also augment the graph with geographic information (longitude, latitude, and altitude) and with the GDP of the country the airport belongs to. We use the population of that country as the label for node classification.

Baselines. For shallow methods, we consider Euclidean embeddings (EUC) and Poincaré embeddings (HYP) [29]. We conjecture that HYP will outperform EUC on hierarchical graphs. For a fair comparison with HGCN, which leverages node features, we also consider EUC-MIXED and HYP-MIXED baselines, where we concatenate the corresponding shallow embeddings with node features, followed by an MLP to predict node labels or links. For state-of-the-art Euclidean GNN models, we consider GCN [21], GraphSAGE (SAGE) [15], Graph Attention Networks (GAT) [41], and Simplified Graph Convolution (SGC) [44]3. We also consider feature-based approaches: MLP and its hyperbolic variant (HNN) [10], which do not utilize the graph structure.

Training. For all methods, we perform a hyper-parameter search on a validation set over the initial learning rate, weight decay, dropout4, number of layers, and activation functions. We measure performance on the final test set over 10 random parameter initializations. For fairness, we also fix the number of dimensions to be the same (16) for all methods. We optimize all models with Adam [19], except Poincaré embeddings, which are optimized with RiemannianSGD [4, 48]. Further details can be found in the Appendix. We open-source our implementation5 of HGCN and the baselines.

Evaluation metric.
In transductive LP tasks, we randomly split edges into 85/5/10% for training, validation, and test sets. For transductive NC, we use 70/15/15% splits for AIRPORT, 30/10/60% splits for DISEASE, and standard splits [21, 46] with 20 training examples per class for CORA and PUBMED. One of the main advantages of HGCN over related hyperbolic graph embedding methods is its inductive capability. For inductive tasks, the split is performed across graphs: all nodes/edges in training graphs form the training set, and the model is asked to predict node classes or unseen links in test graphs. Following previous work, we evaluate link prediction by measuring area under the ROC curve on the test set, and we evaluate node classification by measuring F1 score, except for CORA and PUBMED, where we report accuracy, as is standard in the literature.

5.2 Results

Table 1 reports the performance of HGCN in comparison to baseline methods. HGCN works best in inductive scenarios, where both node features and network topology play an important role.

3The equivalent of GCN in link prediction is GAE [20]. We did not compare against link prediction GNNs based on shallow embeddings, such as [49], since they are not inductive.

4HGCN uses DropConnect [42], as described in Appendix C.

5Code available at http://snap.stanford.edu/hgcn. We provide HGCN implementations for the hyperboloid and Poincaré models. Empirically, both models give similar performance, but the hyperboloid model offers more stable optimization, because the Poincaré distance is numerically unstable [30].

Figure 5: Attention: Euclidean GAT (left), HGCN (right). Each graph represents a 2-hop neighborhood of the DISEASE-M dataset.

The performance gain of HGCN with respect to Euclidean GNN models is correlated with graph hyperbolicity.
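To make this correlation concrete, δ-hyperbolicity can be computed directly from the four-point condition mentioned in Section 5.1. The sketch below is our own illustration, not the authors' implementation; the toy graphs and helper names are hypothetical, and the brute-force enumeration of quadruples is only practical for small graphs:

```python
from collections import deque
from itertools import combinations

def bfs_dists(adj, src):
    """Unweighted shortest-path lengths from src via breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def delta_hyperbolicity(adj):
    """Gromov's delta via the four-point condition: for every quadruple,
    sort the three pairwise distance sums; delta is half the largest gap
    between the two biggest sums, maximized over all quadruples."""
    d = {u: bfs_dists(adj, u) for u in adj}
    delta = 0.0
    for x, y, u, v in combinations(adj, 4):
        sums = sorted(
            (d[x][y] + d[u][v], d[x][u] + d[y][v], d[x][v] + d[y][u]),
            reverse=True,
        )
        delta = max(delta, (sums[0] - sums[1]) / 2)
    return delta

# Toy graphs as adjacency lists: a 5-node tree and a 4-cycle.
tree = {0: [1, 2], 1: [0, 3, 4], 2: [0], 3: [1], 4: [1]}
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(delta_hyperbolicity(tree))   # 0.0: trees are 0-hyperbolic
print(delta_hyperbolicity(cycle))  # 1.0: the 4-cycle is not tree-like
```

As expected, any tree yields δ = 0 under this condition, while even a 4-cycle already has δ = 1, matching the intuition that lower δ means a more tree-like graph.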
HGCN achieves an average of 45.4% (LP) and 12.3% (NC) error reduction compared to the best deep baselines for graphs with high hyperbolicity (low δ), suggesting that GNNs can significantly benefit from hyperbolic geometry, especially in link prediction tasks. Furthermore, the performance gap between HGCN and HNN suggests that neighborhood aggregation is effective for learning node representations in graphs. For example, in disease spread datasets, both Euclidean attention and hyperbolic geometry lead to significant improvements of HGCN over other baselines. This can be explained by the fact that in disease spread trees, parent nodes contaminate their children; HGCN successfully models these asymmetric, hierarchical relationships with hyperbolic attention and improves performance over all baselines.

On the CORA dataset, which has low hyperbolicity, HGCN does not outperform Euclidean GNNs, suggesting that Euclidean geometry is a better fit for its underlying graph structure. However, for small dimensions, HGCN is still significantly more effective than GCN, even on CORA. Figure 3c shows 2-dimensional HGCN and GCN embeddings trained with the LP objective, where colors denote the label class; HGCN achieves much better label class separation.

5.3 Analysis

Ablations. We further analyze the effect of the proposed components of HGCN, namely hyperbolic attention (ATT) and trainable curvature (C), on the AIRPORT and DISEASE datasets in Table 2. We observe that both attention and trainable curvature lead to performance gains over HGCN with fixed curvature and no attention. Furthermore, our attention model ATT outperforms ATTo (aggregation in the tangent space at o), and we conjecture that this is because the local Euclidean average is a better approximation near the aggregation center than near o. Finally, adding both ATT and C improves performance even further, suggesting that both components are important in HGCN.

Visualizations.
We \ufb01rst visualize the GCN and HGCN embeddings at the \ufb01rst and last layers in\nFigure 3. We train HGCN with 3-dimensional hyperbolic embeddings and map them to the Poincar\u00b4e\ndisk which is better for visualization. In contrast to GCN, tree structure is preserved in HGCN,\nwhere nodes close to the center are higher in the hierarchy of the tree. This way HGCN smoothly\ntransforms Euclidean features to Hyperbolic embeddings that preserve node hierarchy.\nFigure 5 shows the attention weights in the 2-hop neighborhood of a center node (red) for the\nDISEASE dataset. The red node is the node where we compute attention. The darkness of the color\nfor other nodes denotes their hierarchy. The attention weights for nodes in the neighborhood are\nvisualized by the intensity of edges. We observe that in HGCN the center node pays more attention\nto its (grand)parent. In contrast to Euclidean GAT, our aggregation with attention in hyperbolic\nspace allows us to pay more attention to nodes with high hierarchy. Such attention is crucial to good\nperformance in DISEASE, because only sick parents will propagate the disease to their children.\n\n6 Conclusion\n\nWe introduced HGCN, a novel architecture that learns hyperbolic embeddings using graph convolu-\ntional networks. In HGCN, the Euclidean input features are successively mapped to embeddings in\nhyperbolic spaces with trainable curvatures at every layer. HGCN achieves new state-of-the-art in\nlearning embeddings for real-world hierarchical and scale-free graphs.\n\n9\n\n\fAcknowledgments\n\nJure Leskovec is a Chan Zuckerberg Biohub investigator. This research has been supported in part by\nDARPA under FA865018C7880 (ASED), (MSC); NIH under No. U54EB020405 (Mobilize); ARO\nunder MURI; IARPA under No. 2017-17071900005 (HFC), NSF under No. OAC-1835598 (CINES);\nStanford Data Science Initiative, Chan Zuckerberg Biohub, JD.com, Amazon, Boeing, Docomo,\nHuawei, Hitachi, Observe, Siemens, and UST Global. 
We gratefully acknowledge the support of\nDARPA under Nos. FA87501720095 (D3M), FA86501827865 (SDH), and FA86501827882 (ASED);\nNIH under No. U54EB020405 (Mobilize), NSF under Nos. CCF1763315 (Beyond Sparsity),\nCCF1563078 (Volume to Velocity), and 1937301 (RTML); ONR under No. N000141712266\n(Unifying Weak Supervision); the Moore Foundation, NXP, Xilinx, LETI-CEA, Intel, IBM, Microsoft,\nNEC, Toshiba, TSMC, ARM, Hitachi, BASF, Accenture, Ericsson, Qualcomm, Analog Devices, the\nOkawa Foundation, American Family Insurance, Google Cloud, Swiss Re, TOTAL, and members\nof the Stanford DAWN project: Teradata, Facebook, Google, Ant Financial, NEC, VMWare, and\nInfosys. The U.S. Government is authorized to reproduce and distribute reprints for Governmental\npurposes notwithstanding any copyright notation thereon. Any opinions, \ufb01ndings, and conclusions or\nrecommendations expressed in this material are those of the authors and do not necessarily re\ufb02ect the\nviews, policies, or endorsements, either expressed or implied, of DARPA, NIH, ONR, or the U.S.\nGovernment.\n\nReferences\n[1] Aaron B Adcock, Blair D Sullivan, and Michael W Mahoney. Tree-like structure in large social\nand information networks. In 2013 IEEE 13th International Conference on Data Mining, pages\n1\u201310. IEEE, 2013.\n\n[2] Roy M Anderson and Robert M May. Infectious diseases of humans: dynamics and control.\n\nOxford university press, 1992.\n\n[3] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps and spectral techniques for embedding\nand clustering. In Advances in neural information processing systems, pages 585\u2013591, 2002.\n[4] Silvere Bonnabel. Stochastic gradient descent on riemannian manifolds. IEEE Transactions on\n\nAutomatic Control, 2013.\n\n[5] Benjamin Paul Chamberlain, James Clough, and Marc Peter Deisenroth. Neural embeddings of\n\ngraphs in hyperbolic space. arXiv preprint arXiv:1705.10359, 2017.\n\n[6] Wei Chen, Wenjie Fang, Guangda Hu, and Michael W Mahoney. 
On the hyperbolicity of small-world and treelike random graphs. Internet Mathematics, 9(4):434–491, 2013.

[7] Aaron Clauset, Cristopher Moore, and Mark EJ Newman. Hierarchical structure and the prediction of missing links in networks. Nature, 453(7191):98, 2008.

[8] Bhuwan Dhingra, Christopher J Shallue, Mohammad Norouzi, Andrew M Dai, and George E Dahl. Embedding text in hyperbolic spaces. NAACL HLT, 2018.

[9] Maurice Fréchet. Les éléments aléatoires de nature quelconque dans un espace distancié. In Annales de l'institut Henri Poincaré, 1948.

[10] Octavian Ganea, Gary Bécigneul, and Thomas Hofmann. Hyperbolic neural networks. In Advances in Neural Information Processing Systems, pages 5345–5355, 2018.

[11] Octavian-Eugen Ganea, Gary Becigneul, and Thomas Hofmann. Hyperbolic entailment cones for learning hierarchical embeddings. In International Conference on Machine Learning, pages 1632–1641, 2018.

[12] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 855–864. ACM, 2016.

[13] Albert Gu, Frederic Sala, Beliz Gunel, and Christopher Ré. Learning mixed-curvature representations in product spaces. In International Conference on Learning Representations, 2019.

[14] Caglar Gulcehre, Misha Denil, Mateusz Malinowski, Ali Razavi, Razvan Pascanu, Karl Moritz Hermann, Peter Battaglia, Victor Bapst, David Raposo, Adam Santoro, et al. Hyperbolic attention networks. In International Conference on Learning Representations, 2019.

[15] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pages 1024–1034, 2017.

[16] Sergey Ioffe and Christian Szegedy.
Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, pages 448–456, 2015.

[17] Edmond Jonckheere, Poonsuk Lohsoonthorn, and Francis Bonahon. Scaled Gromov hyperbolic graphs. Journal of Graph Theory, 2008.

[18] Valentin Khrulkov, Leyla Mirvakhabova, Evgeniya Ustinova, Ivan Oseledets, and Victor Lempitsky. Hyperbolic image embeddings. arXiv preprint arXiv:1904.02239, 2019.

[19] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.

[20] Thomas N Kipf and Max Welling. Variational graph auto-encoders. NeurIPS Workshop on Bayesian Deep Learning, 2016.

[21] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations, 2017.

[22] Robert Kleinberg. Geographic routing using hyperbolic space. In IEEE International Conference on Computer Communications, 2007.

[23] Dmitri Krioukov, Fragkiskos Papadopoulos, Maksim Kitsak, Amin Vahdat, and Marián Boguñá. Hyperbolic geometry of complex networks. Physical Review E, 2010.

[24] Joseph B Kruskal. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 1964.

[25] Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. Gated graph sequence neural networks. In International Conference on Learning Representations, 2016.

[26] Qi Liu, Maximilian Nickel, and Douwe Kiela. Hyperbolic graph neural networks. In Advances in Neural Information Processing Systems, 2019.

[27] Galileo Namata, Ben London, Lise Getoor, and Bert Huang. Query-driven active surveying for collective classification. 2012.

[28] Onuttom Narayan and Iraj Saniee. Large-scale curvature of networks.
Physical Review E, 84(6):066108, 2011.

[29] Maximillian Nickel and Douwe Kiela. Poincaré embeddings for learning hierarchical representations. In Advances in Neural Information Processing Systems, pages 6338–6347, 2017.

[30] Maximillian Nickel and Douwe Kiela. Learning continuous hierarchies in the Lorentz model of hyperbolic geometry. In International Conference on Machine Learning, pages 3776–3785, 2018.

[31] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 701–710. ACM, 2014.

[32] Erzsébet Ravasz and Albert-László Barabási. Hierarchical organization in complex networks. Physical Review E, 2003.

[33] Joel W Robbin and Dietmar A Salamon. Introduction to differential geometry. ETH, Lecture Notes, preliminary version, 2011.

[34] Frederic Sala, Chris De Sa, Albert Gu, and Christopher Ré. Representation tradeoffs for hyperbolic embeddings. In International Conference on Machine Learning, pages 4457–4466, 2018.

[35] Rik Sarkar. Low distortion Delaunay embedding of trees in hyperbolic plane. In International Symposium on Graph Drawing, 2011.

[36] Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Galligher, and Tina Eliassi-Rad. Collective classification in network data. AI Magazine, 2008.

[37] Damian Szklarczyk, John H Morris, Helen Cook, Michael Kuhn, Stefan Wyder, Milan Simonovic, Alberto Santos, Nadezhda T Doncheva, Alexander Roth, Peer Bork, et al. The STRING database in 2017: quality-controlled protein–protein association networks, made broadly accessible. Nucleic Acids Research, 2016.

[38] Yi Tay, Luu Anh Tuan, and Siu Cheung Hui. Hyperbolic representation learning for fast and efficient neural question answering.
In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pages 583–591. ACM, 2018.

[39] Alexandru Tifrea, Gary Becigneul, and Octavian-Eugen Ganea. Poincaré GloVe: Hyperbolic word embeddings. In International Conference on Learning Representations, 2019.

[40] Joyce van de Leemput, Nathan C Boles, Thomas R Kiehl, Barbara Corneo, Patty Lederman, Vilas Menon, Changkyu Lee, Refugio A Martinez, Boaz P Levi, Carol L Thompson, et al. CORTECON: a temporal transcriptome analysis of in vitro human cerebral cortex development from human embryonic stem cells. Neuron, 2014.

[41] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. In International Conference on Learning Representations, 2018.

[42] Li Wan, Matthew Zeiler, Sixin Zhang, Yann Le Cun, and Rob Fergus. Regularization of neural networks using DropConnect. In International Conference on Machine Learning, pages 1058–1066, 2013.

[43] Richard C Wilson, Edwin R Hancock, Elżbieta Pekalska, and Robert PW Duin. Spherical and hyperbolic embeddings of data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(11):2255–2269, 2014.

[44] Felix Wu, Amauri Souza, Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Weinberger. Simplifying graph convolutional networks. In International Conference on Machine Learning, pages 6861–6871, 2019.

[45] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In International Conference on Learning Representations, 2019.

[46] Zhilin Yang, William Cohen, and Ruslan Salakhudinov. Revisiting semi-supervised learning with graph embeddings. In International Conference on Machine Learning, pages 40–48, 2016.

[47] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 974–983. ACM, 2018.

[48] Hongyi Zhang, Sashank J Reddi, and Suvrit Sra. Riemannian SVRG: Fast stochastic optimization on Riemannian manifolds. In Advances in Neural Information Processing Systems, pages 4592–4600, 2016.

[49] Muhan Zhang and Yixin Chen. Link prediction based on graph neural networks. In Advances in Neural Information Processing Systems, pages 5165–5175, 2018.

[50] Marinka Zitnik, Marcus W Feldman, Jure Leskovec, et al. Evolution of resilience in protein interactomes across the tree of life. Proceedings of the National Academy of Sciences, 116(10):4426–4433, 2019.