{"title": "HyperGCN: A New Method For Training Graph Convolutional Networks on Hypergraphs", "book": "Advances in Neural Information Processing Systems", "page_first": 1511, "page_last": 1522, "abstract": "In many real-world network datasets such as co-authorship, co-citation, email communication, etc., relationships are complex and go beyond pairwise. Hypergraphs provide a flexible and natural modeling tool to model such complex relationships. The obvious existence of such complex relationships in many real-world networks naturaly motivates the problem of learning with hypergraphs. A popular learning paradigm is hypergraph-based semi-supervised learning (SSL) where the goal is to assign labels to initially unlabeled vertices in a hypergraph. Motivated by the fact that a graph convolutional network (GCN) has been effective for graph-based SSL, we propose HyperGCN, a novel GCN for SSL on attributed hypergraphs. Additionally, we show how HyperGCN can be used as a learning-based approach for combinatorial optimisation on NP-hard hypergraph problems. We demonstrate HyperGCN's effectiveness through detailed experimentation on real-world hypergraphs. 
We have made HyperGCN's source code available to foster reproducible research.", "full_text": "HyperGCN: A New Method of Training Graph\n\nConvolutional Networks on Hypergraphs\n\nNaganand Yadati\n\nIndian Institute of Science, Bangalore\n\nMadhav Nimishakavi\n\nIndian Institute of Science, Bangalore\n\ny.naganand@gmail.com\n\ncse.madhav@gmail.com\n\nPrateek Yadav\n\nIndian Institute of Science, Bangalore\n\nVikram Nitin \u2217\n\nBirla Institute of Technology and Science, Pilani\n\nugprateek@gmail.com\n\nvikramnitin9@gmail.com\n\nAnand Louis\n\nIndian Institute of Science, Bangalore\n\nPartha Talukdar\n\nIndian Institute of Science, Bangalore\n\nanandl@iisc.ac.in\n\npartha@talukdar.net\n\nAbstract\n\nIn many real-world networks such as co-authorship, co-citation, etc., relationships\nare complex and go beyond pairwise associations. Hypergraphs provide a flexible\nand natural modeling tool to model such complex relationships. The obvious\nexistence of such complex relationships in many real-world networks naturally\nmotivates the problem of learning with hypergraphs. A popular learning paradigm\nis hypergraph-based semi-supervised learning (SSL) where the goal is to assign\nlabels to initially unlabelled vertices in a hypergraph. Motivated by the fact that\na graph convolutional network (GCN) has been effective for graph-based SSL,\nwe propose HyperGCN, a novel way of training a GCN for SSL on hypergraphs\nbased on tools from spectral theory of hypergraphs. We demonstrate HyperGCN's\neffectiveness through detailed experimentation on real-world hypergraphs for SSL\nand combinatorial optimisation, and analyse when it is more effective\nthan state-of-the-art baselines. We have made the source code available.\n\n1\n\nIntroduction\n\nIn many real-world network datasets such as co-authorship, co-citation, email communication, etc.,\nrelationships are complex and go beyond pairwise associations. 
Hypergraphs provide a flexible and\nnatural modeling tool to model such complex relationships. For example, in a co-authorship network\nan author (hyperedge) can be a co-author of more than two documents (vertices).\nThe obvious existence of such complex relationships in many real-world networks naturally motivates\nthe problem of learning with hypergraphs [52, 22, 49, 17]. A popular learning paradigm is graph-based / hypergraph-based semi-supervised learning (SSL) where the goal is to assign labels to initially\nunlabelled vertices in a graph / hypergraph [10, 54, 42]. While many techniques have used explicit\nLaplacian regularisation in the objective [51, 53, 11, 48], the state-of-the-art neural methods encode\nthe graph / hypergraph structure G = (V, E) implicitly via a neural network f(G, X) [25, 3, 17] (X\ncontains the initial features on the vertices, for example, text attributes for documents).\n\n\u2217Work done while at Indian Institute of Science, Bangalore\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n\fTable 1: Average training time of an epoch (lower is better).\n\nModel | Training time | Density | Training time (DBLP) | Training time (Pubmed)\nHGNN | 170s | 337 | 0.115s | 0.019s\nFastHyperGCN | 143s | 352 | 0.035s | 0.016s\n\nWhile explicit Laplacian regularisation assumes similarity among vertices in each edge / hyperedge,\nimplicit regularisation of graph convolutional networks (GCNs) [25] avoids this restriction and\nenables application to a broader range of problems in combinatorial optimisation [19, 26, 38, 31],\ncomputer vision [12, 37], natural language processing [44, 34], etc. In this work, we propose\nHyperGCN, a novel training scheme for a GCN on hypergraphs, and show its effectiveness not only in\nSSL where hyperedges encode similarity but also in combinatorial optimisation where hyperedges do\nnot encode similarity. 
Combinatorial optimisation on hypergraphs has recently been highlighted as\ncrucial for real-world network analysis [2, 36].\nMethodologically, HyperGCN approximates each hyperedge of the hypergraph by a set of pairwise\nedges connecting the vertices of the hyperedge and treats the learning problem as a graph learning\nproblem on the approximation. While the state-of-the-art hypergraph neural networks (HGNN) [17]\napproximates each hyperedge by a clique and hence requires sC2 (quadratic number of) edges for\neach hyperedge of size s, our method, i.e. HyperGCN, requires a linear number of edges (i.e. O(s))\nfor each hyperedge. The advantage of this linear approximation is evident in Table 1 where a faster\nvariant of our method has lower training time on synthetic data (with higher density as well) for\ndensest k-subhypergraph and SSL on real-world hypergraphs (DBLP and Pubmed). In summary, we\nmake the following contributions:\n\n\u2022 We propose HyperGCN, a new method of training a GCN on hypergraph using tools from\nspectral theory of hypergraphs and introduce FastHyperGCN, its faster variant (Section 4).\n\u2022 We apply our methods to the problems of SSL and combinatorial optimisation on real-\nworld hypergraphs. 
Through detailed experimentation, we demonstrate their effectiveness\ncompared to the state-of-the-art HGNN [17] and other baselines (Sections 5 and 7).\n\u2022 We thoroughly discuss when we prefer our methods to HGNN (Sections 6 and 8).\n\nWhile the motivation of our methods is based on similarity of vertices in a hyperedge, we show they\ncan be effectively used for combinatorial optimisation where hyperedges do not encode similarity.\n\n2 Related work\nIn this section, we discuss related work, and then the background in the next section.\nDeep learning on graphs: Geometric deep learning [5] is an umbrella phrase for emerging techniques attempting to generalise (structured) deep neural network models to non-Euclidean domains\nsuch as graphs and manifolds. Graph convolutional network (GCN) [25] defines the convolution\nusing a simple linear function of the graph Laplacian and is shown to be effective for semi-supervised\nclassification on attributed graphs. The reader is referred to a comprehensive literature review [5] and\nextensive surveys [20, 4] on this topic of deep learning on graphs.\nLearning on hypergraphs: The clique expansion of a hypergraph was introduced in a seminal work\n[52] and has become a popular approach for learning on hypergraph-structured data [39, 16, 50, 43].\nHypergraph neural networks [17] and their variants [23, 24] use the clique expansion to extend GCNs\nfor hypergraphs. Powerset convolutional networks [47] utilise tools from signal processing to define\nconvolution on set functions. Another line of work uses mathematically appealing tensor methods\n[40, 6], but they are limited to uniform hypergraphs.\nSpectral Theory of Hypergraphs: The clique expansion of a hypergraph essentially models it as\na graph by converting each hyperedge to a clique subgraph [1]. 
It has been well established that\nthis approximation causes distortion, fails to utilise higher-order relationships in the data, and leads\nto unreliable learning performance for clustering, SSL, active learning, etc. [28, 13]. A simple yet\neffective way to overcome the limitations is to introduce hyperedge-dependent vertex weights [14].\nResearchers have also fully utilised the hypergraph structure through non-linear Laplacian operators\n[22, 32, 8]. It has been shown that these operators enable a Cheeger-type inequality for hypergraphs,\nrelating the second smallest eigenvalue of the operator to hypergraph expansion [32, 8]. One such\nLaplacian is derived from the notion of total variation on hypergraphs (the Lov\u00e1sz extension of the\nhypergraph cut), which considers the maximally disparate vertices in each hyperedge [22]. Recent\ndevelopments have extended these non-linear operators to several different settings:\n\n\u2022 directed hypergraphs (the idea is to consider the supremum in the tail, the infimum in the head) [49, 9].\n\u2022 submodular hypergraphs (different submodular weights for different hyperedge cuts) [30]\nand submodular function minimisation (generalises the hypergraph SSL objective) [27, 29].\n\u2022 a Laplacian that considers all vertices in each hyperedge (includes the vertices other than the\nmaximally disparate ones in each hyperedge) [7].\n\nGraph-based SSL: Researchers have shown that using unlabelled data in training can improve\nlearning accuracy significantly. This topic is so popular that it has inspired influential books [10, 54, 42].\nGraph neural networks for combinatorial optimisation: Graph-based deep models have recently\nbeen shown to be effective as learning-based approaches for NP-hard problems such as maximal\nindependent set, minimum vertex cover, etc. 
[31], the decision version of the travelling salesman\nproblem [38], graph colouring [26], and clique optimisation [19].\n\n3 Background: Graph convolutional network\nLet G = (V, E), with N = |V|, be a simple undirected graph with adjacency matrix A \u2208 R^{N\u00d7N}, and data\nmatrix X \u2208 R^{N\u00d7p}, which has a p-dimensional real-valued vector representation for each node v \u2208 V.\nThe basic formulation of graph convolution [25] stems from the convolution theorem [33], and it can\nbe shown that the convolution C of a real-valued graph signal S \u2208 R^N and a filter signal F \u2208 R^N is\napproximately C \u2248 (w0 + w1 \u02dcL)S, where w0 and w1 are learned weights and \u02dcL = (2L / \u03bbN) \u2212 I is the\nscaled graph Laplacian. Here \u03bbN is the largest eigenvalue of the symmetrically-normalised graph Laplacian\nL = I \u2212 D^{\u22121/2} A D^{\u22121/2}, where D = diag(d1, \u00b7\u00b7\u00b7, dN) is the diagonal degree matrix with elements\ndi = \u2211_{j=1, j\u2260i}^{N} Aji. The filter F depends on the structure of the graph (the graph Laplacian L). The\ndetailed derivation from the convolution theorem uses existing tools from graph signal processing\n[41, 21, 5] and is provided in the supplementary material. The key point here is that the convolution\nof two graph signals is a linear function of the graph Laplacian L.\n\nTable 2: Summary of symbols used in the paper.\n\nSymbol | Description\nG = (V, E) | an undirected simple graph\nV | set of nodes\nE | set of edges\nN = |V| | number of nodes\nL | graph Laplacian\nA | graph adjacency matrix\n\nSymbol | Description\nH = (V, E) | an undirected hypergraph\nV | set of hypernodes\nE | set of hyperedges\nn = |V| | number of hypernodes\nL | hypergraph Laplacian\nH | hypergraph incidence matrix\n\nThe graph convolution for p different graph signals contained in the data matrix X \u2208 R^{N\u00d7p},\nwith learned weights \u0398 \u2208 R^{p\u00d7r} for r hidden units, is \u00afAX\u0398, where \u00afA = \u02dcD^{\u22121/2} \u02dcA \u02dcD^{\u22121/2}, \u02dcA = A + I, and\n\u02dcDii = \u2211_{j=1}^{N} \u02dcAij. The proof involves a renormalisation trick [25] and is in the supplementary.\n\nGCN [25] The forward model for a simple two-layer GCN takes the following simple form:\n\nZ = fGCN(X, A) = softmax( \u00afA ReLU( \u00afAX\u0398^{(1)} ) \u0398^{(2)} ),    (1)\n\nwhere \u0398^{(1)} \u2208 R^{p\u00d7h} is an input-to-hidden weight matrix for a hidden layer with h hidden units and\n\u0398^{(2)} \u2208 R^{h\u00d7r} is a hidden-to-output weight matrix. The softmax activation function, defined as\nsoftmax(xi) = exp(xi) / \u2211_j exp(xj), is applied row-wise.\n\n\fFigure 1: Graph convolution on a hypernode v using HyperGCN.\n\nGCN training for SSL: For multi-class classification with q classes, we minimise the cross-entropy loss\n\nL = \u2212 \u2211_{i\u2208VL} \u2211_{j=1}^{q} Yij ln Zij,    (2)\n\nover the set of labelled examples VL. Weights \u0398^{(1)} and \u0398^{(2)} are trained using gradient descent.\nA summary of the notation used throughout our work is shown in Table 2.\n\n4 HyperGCN: Hypergraph Convolutional Network\nWe consider semi-supervised hypernode classification on an undirected hypergraph H = (V, E) with\n|V| = n, |E| = m, and a small set VL of labelled hypernodes. Each hypernode v \u2208 V = {1, \u00b7\u00b7\u00b7, n}\nis also associated with a feature vector xv \u2208 R^p of dimension p, given by X \u2208 R^{n\u00d7p}. The task is to\npredict the labels of all the unlabelled hypernodes, that is, all the hypernodes in the set V \\ VL.\nOverview: The crucial working principle here is that the hypernodes in the same hyperedge are\nsimilar and hence are likely to share the same label [49]. 
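As a concrete reference point for the background above, the two-layer GCN forward pass of Equation (1) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' released code; the function names (`normalize_adj`, `gcn_forward`) and the dense-matrix formulation are our own choices.

```python
import numpy as np

def normalize_adj(A):
    """Renormalisation trick of [25]: add self-loops, then apply
    the symmetric normalisation D~^{-1/2} (A + I) D~^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)                      # degrees D~_ii
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def gcn_forward(X, A, Theta1, Theta2):
    """Z = softmax( A_bar ReLU( A_bar X Theta1 ) Theta2 ), Eq. (1)."""
    A_bar = normalize_adj(A)
    H = np.maximum(A_bar @ X @ Theta1, 0.0)      # ReLU hidden layer
    logits = A_bar @ H @ Theta2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)      # row-wise softmax
```

For SSL training, the rows of `Z` at the labelled hypernodes would be fed into the cross-entropy loss of Equation (2).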
Suppose we use {hv : v \u2208 V} to denote\nsome representation of the hypernodes in V. Then, for any e \u2208 E, the quantity max_{i,j\u2208e} ||hi \u2212 hj||_2\nwill be \u201csmall\u201d only if the vectors corresponding to the hypernodes in e are \u201cclose\u201d to each other.\nUsing \u2211_{e\u2208E} max_{i,j\u2208e} ||hi \u2212 hj||_2 as a regulariser is therefore likely to achieve the objective of the\nhypernodes in the same hyperedge having similar representations. However, instead of using it as\nan explicit regulariser, we can achieve the same goal by using a GCN over an appropriately defined\nLaplacian of the hypergraph. In other words, we use the notion of the hypergraph Laplacian as an\nimplicit regulariser which achieves this objective.\nA hypergraph Laplacian with the same underlying motivation as stated above was proposed in prior\nworks [8, 32]. We present this Laplacian first. Then we run a GCN over the simple graph associated\nwith this hypergraph Laplacian. We call the resulting method 1-HyperGCN (as each hyperedge is\napproximated by exactly one pairwise edge). One epoch of 1-HyperGCN is shown in Figure 1.\n\n4.1 Hypergraph Laplacian\nAs explained before, the key element for a GCN is the graph Laplacian of the given graph G. Thus,\nin order to develop a GCN-based SSL method for hypergraphs, we first need to define a Laplacian for\nhypergraphs. One such way [8] (see also [32]) is a non-linear function L : R^n \u2192 R^n (the Laplacian\nmatrix for graphs can be viewed as a linear function L : R^n \u2192 R^n).\nDefinition 1 (Hypergraph Laplacian [8, 32]\u00b2) Given a real-valued signal S \u2208 R^n defined on the\nhypernodes, L(S) is computed as follows.\n\n1. For each hyperedge e \u2208 E, let (ie, je) := argmax_{i,j\u2208e} |Si \u2212 Sj|, breaking ties randomly\u00b2.\n\u00b2The problem of breaking ties in choosing ie (resp. je) is a non-trivial problem, as shown in [8]. 
Breaking\nties randomly was proposed in [32], but [8] showed that this might not work for all applications (see [8] for more\ndetails). [8] gave a way to break ties, together with a proof of correctness of their tie-breaking rule for the problems\nthey studied. We chose to break ties randomly because of its simplicity and efficiency.\n\n\fFigure 2: Hypergraph Laplacian [8] vs. the generalised hypergraph Laplacian with mediators [7].\nOur approach requires at most a linear number of edges (1 and 2|e| \u2212 3, respectively) while HGNN\n[17] requires a quadratic number of edges for each hyperedge.\n\n2. A weighted graph GS on the vertex set V is constructed by adding the edges {{ie, je} : e \u2208 E}\nwith weights w({ie, je}) := w(e) to GS, where w(e) is the weight of the hyperedge e. Let\nAS denote the weighted adjacency matrix of the graph GS.\n\n3. The symmetrically normalised hypergraph Laplacian is L(S) := (I \u2212 D^{\u22121/2} AS D^{\u22121/2})S.\n\n4.2 1-HyperGCN\n\nBy following the Laplacian construction steps outlined in Section 4.1, we end up with the simple\ngraph GS with normalised adjacency matrix \u00afAS. We now perform GCN over this simple graph GS.\nThe graph convolution operation in Equation (1), when applied to a hypernode v \u2208 V in GS, in\nthe neural message-passing framework [18] is\n\nh_v^{(\u03c4+1)} = \u03c3( (\u0398^{(\u03c4)})^T \u2211_{u\u2208N(v)} [\u00afA_S^{(\u03c4)}]_{v,u} \u00b7 h_u^{(\u03c4)} ).\n\nHere, \u03c4 is the epoch number, h_v^{(\u03c4+1)} is the new hidden layer representation of node v, \u03c3 is a non-linear\nactivation function, \u0398 is a matrix of learned weights, N(v) is the set of neighbours of v, [\u00afA_S^{(\u03c4)}]_{v,u} is\nthe weight on the edge {v, u} after normalisation, and h_u^{(\u03c4)} is the previous hidden layer representation\nof the neighbour u. 
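The three steps of Definition 1 can be sketched as follows. This is an illustrative reconstruction with our own choices: ties are broken by first occurrence rather than randomly, hyperedge weights w(e) default to 1, and isolated vertices are given degree 1 so the normalisation is well defined.

```python
import numpy as np

def representative_edges(S, hyperedges, weights=None):
    """Steps 1-2: for each hyperedge e pick the maximally disparate
    pair (i_e, j_e) under the signal S and add one weighted edge."""
    n = len(S)
    A = np.zeros((n, n))
    for t, e in enumerate(hyperedges):
        w = 1.0 if weights is None else weights[t]
        # step 1: argmax over pairs (ties broken by first occurrence)
        i_e, j_e = max(((i, j) for i in e for j in e if i < j),
                       key=lambda p: abs(S[p[0]] - S[p[1]]))
        # step 2: one representative edge with the hyperedge's weight
        A[i_e, j_e] += w
        A[j_e, i_e] += w
    return A

def hypergraph_laplacian(S, hyperedges):
    """Step 3: L(S) = (I - D^{-1/2} A_S D^{-1/2}) S; isolated vertices
    are treated as degree 1 to avoid division by zero (our choice)."""
    A = representative_edges(S, hyperedges)
    d = np.maximum(A.sum(axis=1), 1.0)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return (np.eye(len(S)) - D_inv_sqrt @ A @ D_inv_sqrt) @ np.asarray(S, float)
```

1-HyperGCN then runs the graph convolution of Section 3 over the adjacency matrix built this way, recomputing it in every epoch from the current representations.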
We note that along with the embeddings of the hypernodes, the adjacency matrix\nis also re-estimated in each epoch.\nFigure 1 shows a hypernode v with five hyperedges incident on it. We consider exactly one\nrepresentative simple edge for each hyperedge e \u2208 E, given by {ie, je} where (ie, je) =\nargmax_{i,j\u2208e} ||(\u0398^{(\u03c4)})^T (h_i^{(\u03c4)} \u2212 h_j^{(\u03c4)})||_2 for epoch \u03c4. Because of this consideration, the hypernode v may not be a part of all representative simple edges (only three are shown in the figure). We then use\nthe traditional graph convolution operation on v, considering only the simple edges incident on it. Note\nthat we apply the operation on each hypernode v \u2208 V in each epoch \u03c4 of training until convergence.\nConnection to total variation on hypergraphs: Our 1-HyperGCN model can be seen as performing\nimplicit regularisation based on the total variation on hypergraphs [22]. In that prior work, explicit\nregularisation and only the hypergraph structure is used for hypernode classification in the SSL\nsetting. HyperGCN, on the other hand, can use both the hypergraph structure and also exploit any\navailable features on the hypernodes, e.g., text attributes for documents.\n\n4.3 HyperGCN: Enhancing 1-HyperGCN with mediators\n\nOne peculiar aspect of the hypergraph Laplacian discussed is that each hyperedge e is represented\nby a single pairwise simple edge {ie, je} (with this simple edge potentially changing from epoch to\nepoch). This hypergraph Laplacian ignores the hypernodes in Ke := {k \u2208 e : k \u2260 ie, k \u2260 je} in\nthe given epoch. Recently, it has been shown that a generalised hypergraph Laplacian in which the\nhypernodes in Ke act as \u201cmediators\u201d [7] satisfies all the properties satisfied by the above Laplacian\ngiven by [8]. The two Laplacians are pictorially compared in Figure 2. Note that if the hyperedge\nis of size 2, we connect ie and je with an edge. 
We also run a GCN on the simple graph associated\nwith the hypergraph Laplacian with mediators [7] (right in Figure 2). It has been suggested that the\nweights on the edges for each hyperedge in the hypergraph Laplacian (with mediators) sum to 1 [7].\nWe chose each weight to be 1/(2|e| \u2212 3), as there are 2|e| \u2212 3 edges for a hyperedge e.\n\n\fTable 3: Real-world hypergraph datasets used in our work. The distribution of hyperedge sizes is not\nsymmetric either side of the mean and has a strong positive skewness.\n\n | DBLP (co-authorship) | Pubmed (co-citation) | Cora (co-authorship) | Cora (co-citation) | Citeseer (co-citation)\n# hypernodes, |V| | 43413 | 19717 | 2708 | 2708 | 3312\n# hyperedges, |E| | 22535 | 7963 | 1072 | 1579 | 1079\navg. hyperedge size | 4.7 \u00b1 6.1 | 4.3 \u00b1 5.7 | 4.2 \u00b1 4.1 | 3.0 \u00b1 1.1 | 3.2 \u00b1 2.0\n# features, d | 1425 | 500 | 1433 | 1433 | 3703\n# classes, q | 6 | 3 | 7 | 7 | 6\nlabel rate, |VL|/|V| | 0.040 | 0.008 | 0.052 | 0.052 | 0.042\n\n4.4 FastHyperGCN\n\nWe use just the initial features X (without the weights) to construct the hypergraph Laplacian matrix\n(with mediators), and we call this method FastHyperGCN. Because the matrix is computed only once\nbefore training (and not in each epoch), the training time of FastHyperGCN is much less than that of\nthe other methods. Please see the supplementary for all the algorithms.\n\n5 Experiments for semi-supervised learning\n\nWe conducted experiments not only on real-world datasets but also on categorical data (results in the\nsupplementary), which are a standard practice in hypergraph-based learning [52, 22, 49, 30, 29, 27].\n\n5.1 Baselines\n\nWe compared HyperGCN, 1-HyperGCN and FastHyperGCN against the following baselines:\n\n\u2022 Hypergraph neural networks (HGNN) [17] uses the clique expansion [52, 1] to approximate\nthe hypergraph. Each hyperedge of size s is approximated by an s-clique.\n\u2022 Multi-layer perceptron (MLP) treats each instance (hypernode) as an independent and\nidentically distributed (i.i.d.) instance. In other words, A = I in Equation (1). We note that\nthis baseline does not use the hypergraph structure to make predictions.\n\u2022 Multi-layer perceptron + explicit hypergraph Laplacian regularisation (MLP + HLR)\nregularises the MLP by training it with the loss L = L0 + \u03bbLreg, using the\nhypergraph Laplacian with mediators for the explicit Laplacian regularisation term Lreg. We used\n10% of the test set used for all the above models for this baseline to get an optimal \u03bb.\n\u2022 Confidence Interval-based method (CI) [49] uses a subgradient-based method. We\nnote that this method has consistently been shown to be superior to the primal-dual hybrid\ngradient (PDHG) of [22] and also to [52]. Hence, we did not use these other previous methods\nas baselines, and directly compared HyperGCN against CI.\n\nThe task for each dataset is to predict the topic to which a document belongs (multi-class classification).\nStatistics are summarised in Table 3. For more details about the datasets, please refer to the supplementary.\nWe trained all methods for 200 epochs and used the same hyperparameters as a prior work [25]. We\nreport the mean test error and standard deviation over 100 different train-test splits. We sampled sets\nof the same size of labelled hypernodes from each class to have a balanced train split.\n\n6 Analysis of results\n\nThe results on real-world datasets are shown in Table 4. Firstly, we note that HyperGCN is superior\nto 1-HyperGCN. This is expected, as all the vertices in a hyperedge participate in the hypergraph\nLaplacian in HyperGCN while only two do in 1-HyperGCN. Interestingly, we note that FastHyperGCN\nis superior to 1-HyperGCN. 
This, we believe, is because of the large hyperedges (size more than 4)\n\n\fTable 4: Results of SSL experiments. We report mean test error \u00b1 standard deviation (lower is better)\nover 100 train-test splits. Please refer to Section 5 for details.\n\nData | Method | DBLP (co-authorship) | Pubmed (co-citation) | Cora (co-authorship) | Cora (co-citation) | Citeseer (co-citation)\nH | CI | 54.81 \u00b1 0.9 | 52.96 \u00b1 0.8 | 55.45 \u00b1 0.6 | 64.40 \u00b1 0.8 | 70.37 \u00b1 0.3\nX | MLP | 37.77 \u00b1 2.0 | 30.70 \u00b1 1.6 | 41.25 \u00b1 1.9 | 42.14 \u00b1 1.8 | 41.12 \u00b1 1.7\nH, X | MLP + HLR | 30.42 \u00b1 2.1 | 30.18 \u00b1 1.5 | 34.87 \u00b1 1.8 | 36.98 \u00b1 1.8 | 37.75 \u00b1 1.6\nH, X | HGNN | 25.65 \u00b1 2.1 | 29.41 \u00b1 1.5 | 31.90 \u00b1 1.9 | 32.41 \u00b1 1.8 | 37.40 \u00b1 1.6\nH, X | 1-HyperGCN | 33.87 \u00b1 2.4 | 30.08 \u00b1 1.5 | 36.22 \u00b1 2.2 | 34.45 \u00b1 2.1 | 38.87 \u00b1 1.9\nH, X | FastHyperGCN | 27.34 \u00b1 2.1 | 29.48 \u00b1 1.6 | 32.54 \u00b1 1.8 | 32.43 \u00b1 1.8 | 37.42 \u00b1 1.7\nH, X | HyperGCN | 24.09 \u00b1 2.0 | 25.56 \u00b1 1.6 | 30.08 \u00b1 1.8 | 32.37 \u00b1 1.7 | 37.35 \u00b1 1.6\n\nTable 5: Results (lower is better) on synthetic data and a subset of DBLP, showing that our methods\nare more effective for noisy hyperedges. \u03b7 is the number of hypernodes of one class divided by that of the\nother in noisy hyperedges. The best result is in bold and the second best is underlined. 
Please see Section 6.\n\nMethod | \u03b7 = 0.75 | \u03b7 = 0.70 | \u03b7 = 0.65 | \u03b7 = 0.60 | \u03b7 = 0.55 | \u03b7 = 0.50 | sDBLP\nHGNN | 15.92 \u00b1 2.4 | 24.89 \u00b1 2.2 | 31.32 \u00b1 1.9 | 39.13 \u00b1 1.78 | 42.23 \u00b1 1.9 | 44.25 \u00b1 1.8 | 45.27 \u00b1 2.4\nFastHyperGCN | 28.86 \u00b1 2.6 | 31.56 \u00b1 2.7 | 33.78 \u00b1 2.1 | 33.89 \u00b1 2.0 | 34.56 \u00b1 2.2 | 35.65 \u00b1 2.1 | 41.79 \u00b1 2.8\nHyperGCN | 22.44 \u00b1 2.0 | 29.33 \u00b1 2.2 | 33.41 \u00b1 1.9 | 33.67 \u00b1 1.9 | 35.05 \u00b1 2.0 | 37.89 \u00b1 1.9 | 41.64 \u00b1 2.6\n\npresent in all the datasets. FastHyperGCN uses all the mediators while 1-HyperGCN uses only two\nvertices. We now attempt to explain these results.\nProposition 1: Given a hypergraph H = (V, E) with E \u2286 2^V \u2212 \u222a_{v\u2208V} {v} and signals on the\nvertices S : V \u2192 R^d, let, for each hyperedge e \u2208 E, (ie, je) := argmax_{i,j\u2208e} ||Si \u2212 Sj||_2 and\nKe := {v \u2208 e : v \u2260 ie, v \u2260 je}. Define\n\n\u2022 Ec := \u222a_{e\u2208E} {{u, v} : u \u2208 e, v \u2208 e, u \u2260 v},\n\u2022 wc({u, v}) := \u2211_{e\u2208E} 1_{{u,v}\u2208Ec} \u00b7 1_{u\u2208e} \u00b7 1_{v\u2208e} \u00b7 ( 2 / (|e| \u00b7 (|e| \u2212 1)) ),\n\u2022 Em(S) := \u222a_{e\u2208E} {{ie, je}} \u222a \u222a_{e\u2208E, |e|\u22653} {{u, v} : u \u2208 {ie, je}, v \u2208 Ke},\n\u2022 wm(S, {u, v}) := \u2211_{e\u2208E} 1_{{u,v}\u2208Em(S)} \u00b7 1_{u\u2208e} \u00b7 1_{v\u2208e} \u00b7 ( 1 / (2|e| \u2212 3) ),\n\nso that Gc = (V, Ec, wc) and Gm(S) = (V, Em(S), wm(S)) are the normalised clique expansion,\ni.e., the graph of HGNN, and the mediator expansion, i.e., the graph of HyperGCN/FastHyperGCN, respectively.\nA sufficient condition for Gc = Gm(S), \u2200S, is max_{e\u2208E} |e| = 3.\n\nProof: Observe that we consider hypergraphs in which the size of each hyperedge is at least 2. It\nfollows from the definitions that |Ec| = \u2211_{e\u2208E} |e|C2 and |Em| = \u2211_{e\u2208E} (2|e| \u2212 3). Clearly, a sufficient\ncondition is when each hyperedge is approximated by the same subgraph in both the expansions. In\nother words, the condition is |e| \u00b7 (|e| \u2212 1)/2 = 2|e| \u2212 3 for each e \u2208 E. Solving the resulting quadratic\nequation x^2 \u2212 5x + 6 = 0, i.e., (x \u2212 2)(x \u2212 3) = 0, gives |e| = 2 or |e| = 3 for each e \u2208 E. \u25a1\n\nComparable performance on Cora and Citeseer co-citation\nWe note that HGNN is the most competitive baseline. Also, S = X for FastHyperGCN and S = H\u0398\nfor HyperGCN. The proposition states that the graphs of HGNN, FastHyperGCN, and HyperGCN\nare the same irrespective of the signal values whenever the maximum size of a hyperedge is 3.\nThis explains why the three methods have comparable accuracies on the Cora co-citation and Citeseer co-citation hypergraphs. The mean hyperedge sizes are close to 3 (with comparatively lower deviations),\nas shown in Table 3. Hence the graphs of the three methods are more or less the same.\nSuperior performance on Pubmed, DBLP, and Cora co-authorship\nWe see that HyperGCN performs statistically significantly better (the p-value of a Welch t-test is less than 0.0001)\nthan HGNN on the other three datasets. We believe this is due to large noisy hyperedges in\nreal-world hypergraphs. 
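The counting argument behind this proposition is easy to check numerically: the clique expansion uses |e|C2 edges per hyperedge, the mediator expansion 2|e| − 3, and the two counts coincide exactly for sizes 2 and 3. A quick sketch (helper names are ours):

```python
def clique_edge_count(s):
    # |e|C2 edges per hyperedge: the clique expansion used by HGNN
    return s * (s - 1) // 2

def mediator_edge_count(s):
    # 2|e| - 3 edges per hyperedge: the mediator expansion
    # (for s == 2 this is the single edge {i_e, j_e})
    return 2 * s - 3

# hyperedge sizes at which the two expansions have the same edge count
matching_sizes = [s for s in range(2, 20)
                  if clique_edge_count(s) == mediator_edge_count(s)]
```

For any size above 3 the clique count grows quadratically while the mediator count stays linear, which is the source of FastHyperGCN's speed advantage in Table 1.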
An author can write papers from different topics in a co-authorship network\nor a paper typically cites papers of different topics in co-citation networks.\nAverage sizes in Table 3 show the presence of large hyperedges (note the large standard deviations).\nClique expansion has edges on all pairs and hence potentially a larger number of hypernode pairs of\ndifferent labels than the mediator graph of Figure 2, thus accumulating more noise.\nPreference of HyperGCN and FastHyperGCN over HGNN\nTo further illustrate superiority over HGNN on noisy hyperedges, we conducted experiments on\nsynthetic hypergraphs each consisting of 1000 hypernodes, randomly sampled 500 hyperedges, and 2\nclasses with 500 hypernodes in each class. For each synthetic hypergraph, 100 hyperedges (each of\nsize 5) were \u201cpure\", i.e., all hypernodes were from the same class while the other 400 hyperedges\n(each of size 20) contained hypernodes from both classes. The ratio, \u03b7, of hypernodes of one class to\nthe other was varied from 0.75 (less noisy) to 0.50 (most noisy) in steps of 0.05.\nTable 5 shows the results on synthetic data. We initialise features to random Gaussian of d = 256. We\nreport mean error and deviation over 10 different synthetically generated hypergraphs. We see that\nfor hyperedges with \u03b7 = 0.75, 0.7 (mostly pure), HGNN is the superior model because it connects\nmore similar vertices. However, as \u03b7 (noise) increases, our methods begin to outperform HGNN.\nInterestingly, for \u03b7 = 0.50, FastHyperGCN even seems to outperform HyperGCN.\nSubset of DBLP: We also trained all three models on a subset of DBLP (we call it sDBLP) by\nremoving all hyperedges of size 2 and 3. The resulting hypergraph has around 8000 hyperedges with\nan average size of 8.5 \u00b1 8.8. 
We report mean error over 10 different train-test splits in Table 5.\nConclusion: From the above analysis, we conclude that our proposed methods (HyperGCN and\nFastHyperGCN) should be preferred to HGNN for hypergraphs with large noisy hyperedges. This is\nalso the case in the experiments on combinatorial optimisation (Table 6), which we discuss next.\n\n7 HyperGCN for combinatorial optimisation\nInspired by the recent successes of deep graph models as learning-based approaches for NP-hard\nproblems [31, 38, 26, 19], we have used HyperGCN as a learning-based approach for the densest\nk-subhypergraph problem [15]. NP-hard problems on hypergraphs have recently been highlighted as\ncrucial for real-world network analysis [2, 36]. Our problem is, given a hypergraph (V, E), to find a\nsubset W \u2286 V of k hypernodes so as to maximise the number of hyperedges contained in W, i.e., we\nwish to maximise the density given by |{e \u2208 E : e \u2286 W}|.\nA greedy heuristic for the problem is to select the k hypernodes of maximum degree. We call\nthis \u201cMaxDegree\u201d. Another greedy heuristic is to iteratively remove, from the current (residual)\nhypergraph, a hypernode of minimum degree along with all hyperedges containing it. We repeat the procedure n \u2212 k\ntimes and consider the density of the remaining k hypernodes. We call this \u201cRemoveMinDegree\u201d.\nExperiments: Table 6 shows the results. We trained all the learning-based models with a synthetically\ngenerated dataset. More details on the approach and the synthetic data are in the supplementary. As\nseen in Table 6, our proposed HyperGCN outperforms all the other approaches except on the Pubmed\ndataset, which contains a small number of vertices with large degrees and a large number of vertices\nwith small degrees. The RemoveMinDegree baseline is able to recover all the hyperedges here.\nVisualisation: Figure 3 shows the visualisations given by HGNN and HyperGCN on the Cora\nco-authorship clique-expanded hypergraph. 
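The two greedy heuristics described in Section 7 can be sketched as follows. This is an illustrative sketch under our own tie-breaking choices (ties broken by vertex id), not the authors' implementation:

```python
def density(W, hyperedges):
    """Number of hyperedges fully contained in the vertex set W."""
    W = set(W)
    return sum(set(e) <= W for e in hyperedges)

def max_degree(vertices, hyperedges, k):
    """MaxDegree: keep the k hypernodes of maximum degree."""
    deg = {v: sum(v in e for e in hyperedges) for v in vertices}
    keep = sorted(vertices, key=lambda v: -deg[v])[:k]
    return density(keep, hyperedges)

def remove_min_degree(vertices, hyperedges, k):
    """RemoveMinDegree: repeatedly drop a minimum-degree hypernode and
    every hyperedge containing it, until k hypernodes remain."""
    V = set(vertices)
    E = [set(e) for e in hyperedges]
    while len(V) > k:
        deg = {v: sum(v in e for e in E) for v in V}
        v_min = min(V, key=lambda v: (deg[v], v))  # ties broken by id
        V.remove(v_min)
        E = [e for e in E if v_min not in e]
    return density(V, hyperedges)
```

A learning-based approach such as HyperGCN instead scores hypernodes with a trained model and keeps the top-k, which Table 6 compares against these two baselines.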
We used Gephi\u2019s Force Atlas to space out the vertices.\nIn general, a cluster of nearby vertices has multiple hyperedges connecting them. Clusters of only\ngreen vertices are ideal; this means the algorithm has likely included many hyperedges induced by\nthe clusters. The figure for HyperGCN has denser green clusters than that for HGNN.\n\n\fTable 6: Results on the densest k-subhypergraph problem. We report the density (higher is better) of the\nset of vertices obtained by each of the approaches for k = 3|V|/4. See Section 7 for details.\n\nApproach | Synthetic test set | DBLP (co-authorship) | Pubmed (co-citation) | Cora (co-authorship) | Cora (co-citation) | Citeseer (co-citation)\nMaxDegree | 174 \u00b1 50 | 4840 | 1306 | 194 | 544 | 507\nRemoveMinDegree | 147 \u00b1 48 | 7714 | 7963 | 450 | 1369 | 843\nMLP | 174 \u00b1 56 | 5580 | 1206 | 238 | 550 | 534\nMLP + HLR | 231 \u00b1 46 | 5821 | 3462 | 297 | 952 | 764\nHGNN | 337 \u00b1 49 | 6274 | 7865 | 437 | 1408 | 969\n1-HyperGCN | 207 \u00b1 52 | 5624 | 1761 | 251 | 563 | 509\nFastHyperGCN | 352 \u00b1 45 | 7342 | 7893 | 452 | 1419 | 969\nHyperGCN | 359 \u00b1 49 | 7720 | 7928 | 504 | 1431 | 971\n# hyperedges, |E| | 500 | 22535 | 7963 | 1072 | 1579 | 1079\n\n(a) HGNN\n\n(b) HyperGCN\n\nFigure 3: Green / pink hypernodes denote those the algorithm labels as positive / negative, respectively.\n\n8 Comparison of training times of FastHyperGCN and HGNN\nWe compared the average training times in Table 1. Both were run on a GeForce GTX 1080 Ti\nGPU machine. FastHyperGCN is faster because it uses a linear number of edges for each hyperedge\nwhile HGNN uses a quadratic number. It is also superior in terms of performance on hypergraphs with large\nnoisy hyperedges (Table 5) and highly competitive on real-world data (Tables 4 and 6). 
Please see the supplementary for the algorithms and time complexities of all the proposed methods and HGNN.

9 Conclusion
We have proposed HyperGCN, a new method of training a GCN on hypergraphs using tools from spectral theory of hypergraphs. We have shown HyperGCN's effectiveness in SSL and combinatorial optimisation. Approaches that assign importance to nodes [46, 35, 45] have improved results on SSL. HyperGCN may be augmented with such approaches for further performance gains. One limitation of our approach is that the quality of the graph approximation obtained is highly dependent on the weight initialisation. We plan to address this issue in future work.

10 Acknowledgement
Anand Louis was supported in part by SERB Award ECR/2017/003296 and a Pratiksha Trust Young Investigator Award. We acknowledge the support of Google India and NeurIPS in the form of an International Travel Grant, which enabled Naganand Yadati to attend the conference.

References
[1] Sameer Agarwal, Kristin Branson, and Serge Belongie. Higher order learning with graphs. In International Conference on Machine Learning (ICML), pages 17–24, 2006. 2 and 6.

[2] Ilya Amburg, Jon Kleinberg, and Austin R. Benson. Planted hitting set recovery in hypergraphs. CoRR, arXiv:1905.05839, 2019. 2 and 8.

[3] James Atwood and Don Towsley. Diffusion-convolutional neural networks. In Neural Information Processing Systems (NIPS), pages 1993–2001. Curran Associates, Inc., 2016. 1.

[4] Peter W.
Battaglia, Jessica Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinícius Flores Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, Çaglar Gülçehre, Francis Song, Andrew Ballard, Justin Gilmer, George Dahl, Ashish Vaswani, Kelsey Allen, Charles Nash, Victoria Langston, Chris Dyer, Nicolas Heess, Daan Wierstra, Pushmeet Kohli, Matthew Botvinick, Oriol Vinyals, Yujia Li, and Razvan Pascanu. Relational inductive biases, deep learning, and graph networks. arXiv:1806.01261, 2018. 2.

[5] Michael Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst. Geometric deep learning: Beyond Euclidean data. IEEE Signal Process. Mag., 2017. 2 and 3.

[6] Samuel R. Bulò and Marcello Pelillo. A game-theoretic approach to hypergraph clustering. In Advances in Neural Information Processing Systems (NIPS) 22, pages 1571–1579. Curran Associates, Inc., 2009. 2.

[7] T.-H. Hubert Chan and Zhibin Liang. Generalizing the hypergraph laplacian via a diffusion process with mediators. In Computing and Combinatorics - 24th International Conference (COCOON), pages 441–453, 2018. 3, 5, and 6.

[8] T.-H. Hubert Chan, Anand Louis, Zhihao Gavin Tang, and Chenzi Zhang. Spectral properties of hypergraph laplacian and approximation algorithms. J. ACM, 65(3):15:1–15:48, 2018. 2, 3, 4, and 5.

[9] T.-H. Hubert Chan, Zhihao Gavin Tang, Xiaowei Wu, and Chenzi Zhang. Diffusion operator and spectral analysis for directed hypergraph laplacian. Theor. Comput. Sci., 2019. 3.

[10] Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien. Semi-Supervised Learning. The MIT Press, 2010. 1 and 3.

[11] Olivier Chapelle, Jason Weston, and Bernhard Schölkopf. Cluster kernels for semi-supervised learning. In Neural Information Processing Systems (NIPS), pages 601–608. MIT, 2003. 1.

[12] Zhao-Min Chen, Xiu-Shen Wei, Peng Wang, and Yanwen Guo.
Multi-label image recognition with graph convolutional networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019. 2.

[13] I (Eli) Chien, Huozhi Zhou, and Pan Li. hs2: Active learning over hypergraphs with pointwise and pairwise queries. In International Conference on Artificial Intelligence and Statistics (AISTATS), pages 2466–2475, 2019. 2.

[14] Uthsav Chitra and Benjamin J Raphael. Random walks on hypergraphs with edge-dependent vertex weights. In Proceedings of the 36th International Conference on Machine Learning (ICML), 2019. 2.

[15] Eden Chlamtác, Michael Dinitz, Christian Konrad, Guy Kortsarz, and George Rabanca. The densest k-subhypergraph problem. SIAM J. Discrete Math., pages 1458–1477, 2018. 8.

[16] Fuli Feng, Xiangnan He, Yiqun Liu, Liqiang Nie, and Tat-Seng Chua. Learning on partial-order hypergraphs. In Proceedings of the 2018 World Wide Web Conference (WWW), pages 1523–1532, 2018. 2.

[17] Yifan Feng, Haoxuan You, Zizhao Zhang, Rongrong Ji, and Yue Gao. Hypergraph neural networks. In Proceedings of the Thirty-Third Conference on Association for the Advancement of Artificial Intelligence (AAAI), 2019. 1, 2, 5, and 6.

[18] Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning (ICML), pages 1263–1272, 2017. 5.

[19] Yu Gong, Yu Zhu, Lu Duan, Qingwen Liu, Ziyu Guan, Fei Sun, Wenwu Ou, and Kenny Q. Zhu. Exact-k recommendation via maximal clique optimization. In KDD, 2019. 2, 3, and 8.

[20] William L. Hamilton, Rex Ying, and Jure Leskovec. Representation learning on graphs: Methods and applications. IEEE Data Eng. Bull., 40(3):52–74, 2017. 2.

[21] David K. Hammond, Pierre Vandergheynst, and Rémi Gribonval. Wavelets on graphs via spectral graph theory.
Applied and Computational Harmonic Analysis, 2011. 3.

[22] Matthias Hein, Simon Setzer, Leonardo Jost, and Syama Sundar Rangapuram. The total variation on hypergraphs - learning on hypergraphs revisited. In Advances in Neural Information Processing Systems (NIPS) 26, pages 2427–2435. Curran Associates, Inc., 2013. 1, 2, 3, 5, and 6.

[23] Jianwen Jiang, Yuxuan Wei, Yifan Feng, Jingxuan Cao, and Yue Gao. Dynamic hypergraph neural networks. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), pages 2635–2641, 2019. 2.

[24] Taisong Jin, Liujuan Cao, Baochang Zhang, Xiaoshuai Sun, Cheng Deng, and Rongrong Ji. Hypergraph induced convolutional manifold networks. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), pages 2670–2676, 2019. 2.

[25] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR), 2017. 1, 2, 3, and 6.

[26] Henrique Lemos, Marcelo Prates, Pedro Avelar, and Luis Lamb. Graph colouring meets deep learning: Effective graph neural network models for combinatorial problems. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), 2019. 2, 3, and 8.

[27] Pan Li, Niao He, and Olgica Milenkovic. Quadratic decomposable submodular function minimization. In Advances in Neural Information Processing Systems (NeurIPS) 31, pages 1054–1064. Curran Associates, Inc., 2018. 3 and 6.

[28] Pan Li and Olgica Milenkovic. Inhomogeneous hypergraph clustering with applications. In Advances in Neural Information Processing Systems (NIPS) 30, pages 2308–2318. Curran Associates, Inc., 2017. 2.

[29] Pan Li and Olgica Milenkovic. Revisiting decomposable submodular function minimization with incidence relations.
In Advances in Neural Information Processing Systems (NeurIPS) 31, pages 2237–2247. Curran Associates, Inc., 2018. 3 and 6.

[30] Pan Li and Olgica Milenkovic. Submodular hypergraphs: p-laplacians, Cheeger inequalities and spectral clustering. In Proceedings of the 35th International Conference on Machine Learning (ICML), pages 3014–3023, 2018. 3 and 6.

[31] Zhuwen Li, Qifeng Chen, and Vladlen Koltun. Combinatorial optimization with graph convolutional networks and guided tree search. In Advances in Neural Information Processing Systems (NIPS) 31, pages 537–546. Curran Associates, Inc., 2018. 2, 3, and 8.

[32] Anand Louis. Hypergraph markov operators, eigenvalues and approximation algorithms. In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing (STOC), pages 713–722, 2015. 2, 3, and 4.

[33] Stéphane Mallat. A Wavelet Tour of Signal Processing. Academic Press, 1999. 3.

[34] Diego Marcheggiani and Ivan Titov. Encoding sentences with graph convolutional networks for semantic role labeling. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1506–1515, 2017. 2.

[35] Federico Monti, Oleksandr Shchur, Aleksandar Bojchevski, Or Litany, Stephan Günnemann, and Michael Bronstein. Dual-primal graph convolutional networks. CoRR, abs/1806.00770, 2018. 9.

[36] Hung Nguyen, Phuc Thai, My Thai, Tam Vu, and Thang Dinh. Approximate k-cover in hypergraphs: Efficient algorithms, and applications. CoRR, arXiv:1901.07928, 2019. 2 and 8.

[37] Will Norcliffe-Brown, Efstathios Vafeias, and Sarah Parisot. Learning conditioned graph structures for interpretable visual question answering. In Advances in Neural Information Processing Systems (NeurIPS) 31, pages 8344–8353. Curran Associates, Inc., 2018. 2.

[38] Marcelo O. R. Prates, Pedro H. C.
Avelar, Henrique Lemos, Luis Lamb, and Moshe Vardi. Learning to solve NP-complete problems - a graph neural network for the decision TSP. In Proceedings of the Thirty-Third Conference on Association for the Advancement of Artificial Intelligence (AAAI), 2019. 2, 3, and 8.

[39] Sai Nageswar Satchidanand, Harini Ananthapadmanaban, and Balaraman Ravindran. Extended discriminative random walk: A hypergraph approach to multi-view multi-relational transductive learning. In IJCAI, pages 3791–3797, 2015. 2.

[40] Amnon Shashua, Ron Zass, and Tamir Hazan. Multi-way clustering using super-symmetric non-negative tensor factorization. In Proceedings of the 9th European Conference on Computer Vision (ECCV), pages 595–608, 2006. 2.

[41] David I. Shuman, Sunil K. Narang, Pascal Frossard, Antonio Ortega, and Pierre Vandergheynst. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Process. Mag., 30(3), 2013. 3.

[42] Amarnag Subramanya and Partha Pratim Talukdar. Graph-Based Semi-Supervised Learning. Morgan & Claypool Publishers, 2014. 1 and 3.

[43] Ke Tu, Peng Cui, Xiao Wang, Fei Wang, and Wenwu Zhu. Structural deep embedding for hyper-networks. In Proceedings of the Thirty-Second Conference on Association for the Advancement of Artificial Intelligence (AAAI), 2018. 2.

[44] Shikhar Vashishth, Manik Bhandari, Prateek Yadav, Piyush Rai, Chiranjib Bhattacharyya, and Partha Talukdar. Incorporating syntactic and semantic information in word embeddings using graph convolutional networks. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), 2019. 2.

[45] Shikhar Vashishth, Prateek Yadav, Manik Bhandari, and Partha Talukdar. Confidence-based graph convolutional networks for semi-supervised learning. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.
9.

[46] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In ICLR, 2018. 9.

[47] Chris Wendler, Dan Alistarh, and Markus Püschel. Powerset convolutional neural networks. In Advances in Neural Information Processing Systems (NeurIPS) 32. Curran Associates, Inc., 2019. 2.

[48] Jason Weston, Frédéric Ratle, and Ronan Collobert. Deep learning via semi-supervised embedding. In Proceedings of the 25th International Conference on Machine Learning (ICML), pages 1168–1175, 2008. 1.

[49] Chenzi Zhang, Shuguang Hu, Zhihao Gavin Tang, and T-H. Hubert Chan. Re-revisiting learning on hypergraphs: Confidence interval and subgradient method. In Proceedings of 34th International Conference on Machine Learning (ICML), pages 4026–4034, 2017. 1, 3, 4, and 6.

[50] Muhan Zhang, Zhicheng Cui, Shali Jiang, and Yixin Chen. Beyond link prediction: Predicting hyperlinks in adjacency space. In Proceedings of the Thirty-Second Conference on Association for the Advancement of Artificial Intelligence (AAAI), 2018. 2.

[51] Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, and Bernhard Schölkopf. Learning with local and global consistency. In NIPS, 2003. 1.

[52] Denny Zhou, Jiayuan Huang, and Bernhard Schölkopf. Learning with hypergraphs: Clustering, classification, and embedding. In Advances in Neural Information Processing Systems (NIPS) 19, pages 1601–1608. MIT Press, 2007. 1, 2, and 6.

[53] Xiaojin Zhu, Zoubin Ghahramani, and John Lafferty. Semi-supervised learning using gaussian fields and harmonic functions. In ICML, 2003. 1.

[54] Xiaojin Zhu, Andrew B. Goldberg, Ronald Brachman, and Thomas Dietterich. Introduction to Semi-Supervised Learning. Morgan and Claypool Publishers, 2009.
1 and 3.