{"title": "Powerset Convolutional Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 929, "page_last": 940, "abstract": "We present a novel class of convolutional neural networks (CNNs) for set functions, i.e., data indexed with the powerset of a finite set. The convolutions are derived as linear, shift-equivariant functions\tfor various notions of shifts on set functions. The framework is fundamentally different from graph convolutions based on the Laplacian, as it provides not one but several basic shifts, one for each element in the ground set. Prototypical experiments with several set function classification tasks on synthetic datasets and on datasets derived from real-world hypergraphs demonstrate the potential of our new powerset CNNs.", "full_text": "Powerset Convolutional Neural Networks\n\nChris Wendler\n\nDepartment of Computer Science\n\nETH Zurich, Switzerland\n\nchris.wendler@inf.ethz.ch\n\nDan Alistarh\nIST Austria\n\ndan.alistarh@ist.ac.at\n\nMarkus P\u00fcschel\n\nDepartment of Computer Science\n\nETH Zurich, Switzerland\npueschel@inf.ethz.ch\n\nAbstract\n\nWe present a novel class of convolutional neural networks (CNNs) for set functions,\ni.e., data indexed with the powerset of a \ufb01nite set. The convolutions are derived\nas linear, shift-equivariant functions for various notions of shifts on set functions.\nThe framework is fundamentally different from graph convolutions based on the\nLaplacian, as it provides not one but several basic shifts, one for each element in\nthe ground set. 
Prototypical experiments with several set function classification tasks on synthetic datasets and on datasets derived from real-world hypergraphs demonstrate the potential of our new powerset CNNs.\n\n1 Introduction\n\nDeep learning-based methods provide state-of-the-art approaches for various image and natural language processing tasks, such as image classification [22, 28], object detection [41], semantic image segmentation [42], image synthesis [20], language translation / understanding [23, 62] and speech synthesis [58]. However, an artifact of many of these models is that regularity priors are hidden in their fundamental neural building blocks, which makes it impossible to apply them directly to irregular data domains. For instance, image convolutional neural networks (CNNs) are based on parametrized 2D convolutional filters with local support, while recurrent neural networks share model parameters across different time steps. Both architectures share parameters in a way that exploits the symmetries of the underlying data domains.\nIn order to port deep learners to novel domains, corresponding parameter-sharing schemes reflecting the symmetries of the target data have to be developed [40]. One example is neural architectures for graph data, i.e., data indexed by the vertices of a graph. Graph CNNs define graph convolutional layers by utilizing results from algebraic graph theory for the graph Laplacian [9, 51], and message passing neural networks [18, 47] generalize recurrent neural architectures from chain graphs to general graphs. With these building blocks in place, neural architectures for supervised [16, 18, 50], semi-supervised [25] and generative learning [52, 59] on graphs have been deployed. 
These research endeavors fall under the umbrella term of geometric deep learning (GDL) [10].\nIn this work, we want to open the door for deep learning on set functions, i.e., data indexed by the powerset of a finite set. There are (at least) three ways to do so. First, set functions can be viewed as data indexed by a hypercube graph, which makes graph neural nets applicable. Second, results from the Fourier analysis of set functions based on the Walsh-Hadamard transform (WHT) [15, 33, 54] can be utilized to formulate a convolution for set functions in a similar way to [51]. Third, [36] introduces several novel notions of convolution for set functions (powerset convolution) as linear, equivariant functions for different notions of shift on set functions. This derivation parallels the standard 2D convolution (equivariant to translations) and graph convolutions (equivariant to the Laplacian or adjacency shift) [34]. A general theory for deriving new forms of convolutions, associated Fourier transforms and other signal processing tools is outlined in [38].\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\nContributions Motivated by the work on generalized convolutions and by the potential utility of deep learning on novel domains, we propose a method-driven approach for deep learning on irregular data domains and, in particular, set functions:\n\n\u2022 We formulate novel powerset CNN architectures by integrating recent convolutions [36] and proposing novel pooling layers for set functions.\n\n\u2022 As a prototypical application, we consider the set function classification task. Since there is little prior work in this area, we evaluate our powerset CNNs on three synthetic classification tasks (submodularity and spectral properties) and two classification tasks on data derived from real-world hypergraphs [5]. 
For the latter, we design classifiers to identify the origin of the extracted subhypergraph. To deal with hypergraph data, we introduce several set-function-based hypergraph representations.\n\n\u2022 We validate our architectures experimentally, and show that they generally outperform the natural fully-connected and graph-convolutional baselines on a range of scenarios and hyperparameter values.\n\n2 Convolutions on Set Functions\n\nWe introduce background and definitions for set functions and associated convolutions. For context and analogy, we first briefly review prior convolutions for 2D and graph data. From the signal processing perspective, 2D convolutions are linear, shift-equivariant functions on images s : \mathbb{Z}^2 \to \mathbb{R}; (i, j) \mapsto s_{i,j}, where the shifts are the translations T_{(k,l)}s = (s_{i-k,j-l})_{i,j \in \mathbb{Z}^2}. The 2D convolution thus becomes\n\n(h * s)_{i,j} = \sum_{k,l \in \mathbb{Z}^2} h_{k,l} s_{i-k,j-l}.   (1)\n\nEquivariance means that all convolutions commute with all shifts: h * (T_{(k,l)}s) = T_{(k,l)}(h * s). Convolutions on vertex-indexed graph signals s : V \to \mathbb{R}; v \mapsto s_v are linear and equivariant with respect to the Laplacian shifts T_k s = L^k s, where L is the graph Laplacian [51].\n\nSet functions With this intuition in place, we now consider set functions. We fix a finite set N = \{x_1, . . . , x_n\}. An associated set function is a signal on the powerset of N:\n\ns : 2^N \to \mathbb{R}; A \mapsto s_A.   (2)\n\nPowerset convolution A convolution for set functions is obtained by specifying the shifts to which it is equivariant. The work in [36] specifies T_Q s = (s_{A \setminus Q})_{A \subseteq N} as one possible choice of shifts for Q \subseteq N. Note that in this case the shift operators are parametrized by the monoid (2^N, \cup), since for all s\n\nT_Q(T_R s) = (s_{(A \setminus Q) \setminus R})_{A \subseteq N} = (s_{A \setminus (R \cup Q)})_{A \subseteq N} = T_{Q \cup R} s,\n\nwhich implies T_Q T_R = T_{Q \cup R}. 
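The shift just defined and its interplay with the powerset convolution introduced next can be checked numerically. The following is a minimal sketch (our own illustrative code, not the paper's TensorFlow implementation): a set function on N = {x_1, ..., x_n} is stored as a NumPy vector indexed by subset bitmasks.

```python
import numpy as np

def shift(s, Q, n):
    """Powerset shift (T_Q s)_A = s_{A \\ Q}; subsets are encoded as bitmasks in [0, 2^n)."""
    return np.array([s[A & ~Q] for A in range(2 ** n)])

def pconv(h, s, n):
    """Powerset convolution (h * s)_A = sum over Q of h_Q * s_{A \\ Q}."""
    out = np.zeros(2 ** n)
    for A in range(2 ** n):
        for Q in range(2 ** n):
            out[A] += h[Q] * s[A & ~Q]
    return out

rng = np.random.default_rng(0)
n = 4
s = rng.standard_normal(2 ** n)
h = rng.standard_normal(2 ** n)
Q, R = 0b0011, 0b0110
# Monoid property of the shifts: T_Q T_R = T_{Q u R}.
assert np.allclose(shift(shift(s, R, n), Q, n), shift(s, Q | R, n))
# Shift-equivariance of the convolution: h * (T_Q s) = T_Q (h * s).
assert np.allclose(pconv(h, shift(s, Q, n), n), shift(pconv(h, s, n), Q, n))
```

The quadratic-time double loop is for clarity only; the paper computes convolutions in the frequency domain instead.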
The corresponding linear, shift-equivariant powerset convolution is given by [36] as\n\n(h * s)_A = \sum_{Q \subseteq N} h_Q s_{A \setminus Q}.   (3)\n\nNote that the filter h is itself a set function. Table 1 contains an overview of generalized convolutions and the associated shift operations to which they are equivariant.\n\nFourier transform Each filter h defines a linear operator \Phi_h := (h * \cdot) obtained by fixing h in (3). It is diagonalized by the powerset Fourier transform\n\nF = \begin{pmatrix} 1 & 0 \\ 1 & -1 \end{pmatrix}^{\otimes n} = \begin{pmatrix} 1 & 0 \\ 1 & -1 \end{pmatrix} \otimes \cdots \otimes \begin{pmatrix} 1 & 0 \\ 1 & -1 \end{pmatrix},   (4)\n\nwhere \otimes denotes the Kronecker product. Note that F^{-1} = F in this case and that the spectrum is also indexed by subsets B \subseteq N. In particular, we have\n\nF \Phi_h F^{-1} = diag((\tilde{h}_B)_{B \subseteq N}),   (5)\n\nin which \tilde{h} denotes the frequency response of the filter h [36]. We denote the linear mapping from h to its frequency response \tilde{h} by \bar{F}, i.e., \tilde{h} = \bar{F} h.\n\ndomain | signal | shifted signal | convolution | reference | CNN\nimage | (s_{i,j})_{i,j} | (s_{i-k,j-l})_{i,j \in \mathbb{Z}^2} | \sum_{k,l} h_{k,l} s_{i-k,j-l} | standard | standard\ngraph Laplacian | (s_v)_{v \in V} | (L^k s)_{v \in V} | (\sum_k h_k L^k s)_v | [51] | [9]\ngraph adjacency | (s_v)_{v \in V} | (A^k s)_{v \in V} | (\sum_k h_k A^k s)_v | [44] | [55]\ngroup | (s_g)_{g \in G} | (s_{q^{-1}g})_{g \in G} | \sum_q h_q s_{q^{-1}g} | [53] | [13]\ngroup spherical | (s_R)_{R \in SO(3)} | (s_{Q^{-1}R})_{R \in SO(3)} | \int h_Q s_{Q^{-1}R} d\mu(Q) | [12] | [12]\npowerset | (s_A)_{A \subseteq N} | (s_{A \setminus Q})_{A \subseteq N} | \sum_Q h_Q s_{A \setminus Q} | [36] | this paper\n\nTable 1: Generalized convolutions and their shifts.\n\nOther shifts and convolutions There are several other possible definitions of set shifts, each coming with its respective convolutions and Fourier transforms [36]. 
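As a sanity check of (4) and (5), the following sketch (our own code; subsets indexed by bitmasks, names are ours) builds F as a Kronecker power, confirms F^{-1} = F, and confirms that F diagonalizes the operator Φ_h of a random filter:

```python
import numpy as np
from functools import reduce

def pconv_matrix(h, n):
    """Matrix of Phi_h = (h * .): entry [A, B] accumulates h_Q over all Q with A \\ Q = B."""
    N = 2 ** n
    Phi = np.zeros((N, N))
    for A in range(N):
        for Q in range(N):
            Phi[A, A & ~Q] += h[Q]
    return Phi

def powerset_fourier(n):
    """F = [[1, 0], [1, -1]] Kronecker-powered n times, cf. (4)."""
    F1 = np.array([[1.0, 0.0], [1.0, -1.0]])
    return reduce(np.kron, [F1] * n)

n = 3
F = powerset_fourier(n)
assert np.allclose(F @ F, np.eye(2 ** n))   # F is its own inverse
h = np.random.default_rng(1).standard_normal(2 ** n)
D = F @ pconv_matrix(h, n) @ F              # F Phi_h F^{-1}, cf. (5)
assert np.allclose(D, np.diag(np.diag(D)))  # diagonal, entries = frequency response
```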
Two additional examples are T^{\diamond}_Q s = (s_{A \cup Q})_{A \subseteq N} and the symmetric difference shift T^{\bullet}_Q s = (s_{(A \setminus Q) \cup (Q \setminus A)})_{A \subseteq N} [54]. The associated convolutions are, respectively,\n\n(h \diamond s)_A = \sum_{Q \subseteq N} h_Q s_{A \cup Q} and (h \bullet s)_A = \sum_{Q \subseteq N} h_Q s_{(A \setminus Q) \cup (Q \setminus A)}.   (6)\n\nLocalized filters Filters h with h_Q = 0 for |Q| > k are k-localized in the sense that the evaluation of (h * s)_A only depends on evaluations of s on sets differing by at most k elements from A. In particular, 1-localized filters (h * s)_A = h_{\emptyset} s_A + \sum_{x \in N} h_{\{x\}} s_{A \setminus \{x\}} are the counterpart of one-hop filters that are typically used in graph CNNs [25]. In contrast to the omnidirectional one-hop graph filters, these one-hop filters have one direction per element in N.\n\n2.1 Applications of Set Functions\n\nSet functions are of practical importance across a range of research fields. Several optimization tasks, such as cost-effective sensor placement [27], optimal ad placement [19] and tasks such as semantic image segmentation [35], can be reduced to subset selection tasks, in which a set function determines the value of every subset and has to be maximized to find the best one. In combinatorial auctions, set functions can be used to describe bidding behavior: each bidder is represented as a valuation function that maps each subset of goods to its subjective value to the customer [14]. Cooperative games are set functions [8]: a coalition is a subset of players, and a coalition game assigns a value to every subset of players. In the simplest case, the value one is assigned to winning coalitions and the value zero to losing coalitions. Further, graphs and hypergraphs also admit set function representations:\nDefinition 1. (Hypergraph) A hypergraph is a triple H = (V, E, w), where V = \{v_1, . . .
, v_n\} is a set of vertices, E \subseteq (P(V) \setminus \{\emptyset\}) is a set of hyperedges and w : E \to \mathbb{R} is a weight function.\nThe weight function of a hypergraph defines a set function on V by setting s_A = w_A if A \in E and s_A = 0 otherwise. Additionally, hypergraphs induce two set functions, namely the hypergraph cut and association score functions:\n\ncut_A = \sum_{B \in E, B \cap A \neq \emptyset, B \cap (V \setminus A) \neq \emptyset} w_B and assoc_A = \sum_{B \in E, B \subseteq A} w_B.   (7)\n\n2.2 Convolutional Pattern Matching\n\nThe powerset convolution in (3) raises the question of which patterns are \u201cdetected\u201d by a filter (h_Q)_{Q \subseteq N}. In other words, to which signal does the filter h respond strongest when evaluated at a given subset A? We call this signal p^A (the pattern matched at position A). Formally,\n\np^A = \arg\max_{s : \|s\| = 1} (h * s)_A.   (8)\n\nFor p^N, the answer is p^N = (1/\|h\|)(h_{N \setminus B})_{B \subseteq N}. This is because the dot product \langle h, s^* \rangle, with s^*_A = s_{N \setminus A}, is maximal if h and s^* are aligned. Slightly rewriting (3) yields the answer for the general case A \subseteq N:\n\n(h * s)_A = \sum_{Q \subseteq N} h_Q s_{A \setminus Q} = \sum_{Q_1 \subseteq A} \Big( \sum_{Q_2 \subseteq N \setminus A} h_{Q_1 \cup Q_2} \Big) s_{A \setminus Q_1} = \sum_{Q_1 \subseteq A} h'_{Q_1} s_{A \setminus Q_1}, with h'_{Q_1} := \sum_{Q_2 \subseteq N \setminus A} h_{Q_1 \cup Q_2}.   (9)\n\nNamely, (9) shows that the powerset convolution evaluated at position A can be seen as the convolution of a new filter h' with s restricted to the powerset 2^A, evaluated at position A, the case for which we know the answer: p^A_B = (1/\|h'\|) h'_{A \setminus B} if B \subseteq A and p^A_B = 0 otherwise.\n\nExample 1. (One-hop patterns) For a one-hop filter h, i.e., (h * s)_A = h_{\emptyset} s_A + \sum_{x \in N} h_{\{x\}} s_{A \setminus \{x\}}, the pattern matched at position A takes the form\n\np^A_B = (1/\|h'\|)(h_{\emptyset} + \sum_{x \in N \setminus A} h_{\{x\}}) if B = A; (1/\|h'\|) h_{\{x\}} if B = A \setminus \{x\} with x \in A; 0 else.   (10)\n\nHere, h' corresponds to the filter restricted to the powerset 2^A as in (9).\n\nNotice that this behavior is different from 1D and 2D convolutions: there the underlying shifts (translations) are invertible and thus the detected patterns are again shifted versions of each other. For example, the 1D convolutional filter (h_q)_{q \in \mathbb{Z}} matches p^0 = (h_{-q})_{q \in \mathbb{Z}} at position 0 and p^t = T_{-t}p^0 = (h_{-q+t})_{q \in \mathbb{Z}} at position t, and the group convolutional filter (h_q)_{q \in G} matches p^e = (h_{q^{-1}})_{q \in G} at the unit element e and p^g = T_{g^{-1}}p^e = (h_{gq^{-1}})_{q \in G} at position g. Since powerset shifts are not invertible, the patterns detected by a filter are not just (set-)shifted versions of each other, as shown above. A similar behavior can be expected with graph convolutions, since the Laplacian shift is never invertible and the adjacency shift is not always invertible.\n\n3 Powerset Convolutional Neural Networks\n\nConvolutional layers We define a convolutional layer by extending the convolution to multiple channels, summing up the feature maps obtained by channel-wise convolution as in [10]:\nDefinition 2. (Powerset convolutional layer) A powerset convolutional layer is defined as follows:\n\n1. The input is given by n_c set functions s = (s^{(1)}, . . . , s^{(n_c)}) \in \mathbb{R}^{2^N \times n_c};\n2. The output is given by n_f set functions t = L_{\Gamma}(s) = (t^{(1)}, . . . , t^{(n_f)}) \in \mathbb{R}^{2^N \times n_f};\n3. 
The layer applies a bank of set function filters \Gamma = (h^{(i,j)})_{i,j}, with i \in \{1, . . . , n_c\} and j \in \{1, . . . , n_f\}, and a point-wise non-linearity \sigma, resulting in\n\nt^{(j)}_A = \sigma\Big( \sum_{i=1}^{n_c} (h^{(i,j)} * s^{(i)})_A \Big).   (11)\n\nPooling layers As in conventional CNNs, we define powerset pooling layers to gain additional robustness with respect to input perturbations, and to control the number of features extracted by the convolutional part of the powerset CNN. From a signal processing perspective, the crucial aspect of the pooling operation is that the pooled signal lives on a valid signal domain, i.e., a powerset. One way to achieve this is by combining elements of the ground set.\n\nFigure 1: Forward pass of a simple powerset CNN with two convolutional and two pooling layers. Set functions are depicted as signals on the powerset lattice.\n\nDefinition 3. (Powerset pooling) Let N'(X) be the ground set of size n - |X| + 1 obtained by combining all the elements in X \subseteq N into a single element. E.g., for X = \{x_1, x_2\} we get N'(X) = \{\{x_1, x_2\}, x_3, . . . , x_n\}. Therefore every subset X \subseteq N defines a pooling operation\n\nP^X : \mathbb{R}^{2^N} \to \mathbb{R}^{2^{N'(X)}} : (s_A)_{A \subseteq N} \mapsto (s_B)_{B : B \cap X = X or B \cap X = \emptyset}.   (12)\n\nIn our experiments we always use P := P^{\{x_1, x_2\}}. It is also possible to pool a set function by combining elements of the powerset as in [48], or by the simple rule s_B = \max(s_B, s_{B \cup \{x\}}) for B \subseteq N \setminus \{x\}. Then, a pooling layer is obtained by applying our pooling strategy to every channel.\nDefinition 4. (Powerset pooling layer) A powerset pooling layer takes n_c set functions as input s = (s^{(1)}, . . . , s^{(n_c)}) \in \mathbb{R}^{2^N \times n_c} and outputs n_c pooled set functions t = L_P(s) = (t^{(1)}, . . .
, t^{(n_c)}) \in \mathbb{R}^{2^{N'} \times n_c}, with |N'| = |N| - 1, by applying the pooling operation to every channel:\n\nt^{(i)} = P(s^{(i)}).   (13)\n\nPowerset CNN A powerset CNN is a composition of several powerset convolutional and pooling layers. Depending on the task, the outputs of the convolutional component can be fed into a multi-layer perceptron, e.g., for classification.\nFig. 1 illustrates a forward pass of a powerset CNN with two convolutional layers, each of which is followed by a pooling layer. The first convolutional layer is parameterized by three one-hop filters and the second one is parameterized by fifteen (three times five) one-hop filters. The filter coefficients were initialized with random weights for this illustration.\nImplementation^1 We implemented the powerset convolutional and pooling layers in TensorFlow [1]. Our implementation supports various definitions of powerset shifts, and utilizes the respective Fourier transforms to compute the convolutions in the frequency domain.\n\n4 Experimental Evaluation\n\nOur powerset CNN is built on the premise that the successful components of conventional CNNs are domain independent and only rely on the underlying concepts of shift and shift-equivariant convolutions. In particular, if we use only one-hop filters, our powerset CNN satisfies locality and compositionality. Thus, similar to image CNNs, it should be able to learn localized hierarchical features. To understand whether this is useful when applied to set function classification problems, we evaluate our powerset CNN architectures on three synthetic tasks and on two tasks based on real-world hypergraph data.\nProblem formulation Intuitively, our set function classification task will require the models to learn to classify a collection of set functions sampled from some natural distributions. 
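To make the pooling operation of Definition 3 concrete, here is a minimal bitmask-based sketch (function name and encoding are our own illustration, not the paper's implementation):

```python
import numpy as np

def pool(s, X, n):
    """P^X from Definition 3: keep s_B for B with B & X == X or B & X == 0.

    Subsets are bitmasks; X is the bitmask of the elements merged into one.
    For |X| = 2 the result lives on a ground set of size n - 1, i.e. has 2^(n-1) entries.
    """
    kept = [B for B in range(2 ** n) if (B & X) in (0, X)]
    return np.asarray(s)[kept]

n = 3
s = np.arange(2 ** n, dtype=float)   # s_A = bitmask value of A, for illustration
t = pool(s, 0b011, n)                # merge x1 and x2, i.e. P^{x1,x2}
assert t.shape[0] == 2 ** (n - 1)
```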
One such example would be to classify (hyper-)graphs coming from some underlying data distributions. Formally, the set function classification problem is characterized by a training set \{(s^{(i)}, t^{(i)})\}_{i=1}^{m} \subseteq \mathbb{R}^{2^N} \times C composed of pairs (set function, label), as well as a test set. The learning task is to utilize the training set to learn a mapping from the space of set functions \mathbb{R}^{2^N} to the label space C = \{1, . . . , k\}.\n\n1 Sample implementations are provided at https://github.com/chrislybaer/Powerset-CNN.\n\n4.1 Synthetic Datasets\n\nUnless stated otherwise, we consider the ground set N = \{x_1, . . . , x_n\} with n = 10, and sample 10,000 set functions per class. We use 80% of the samples for training, and the remaining 20% for testing. We only use one random split per dataset. Given this, we generated the following three synthetic datasets, meant to illustrate specific applications of our framework.\nSpectral patterns In order to obtain non-trivial classes of set functions, we define a sampling procedure based on the Fourier expansion associated with the shift T_Q s = (s_{A \setminus Q})_{A \subseteq N}. In particular, we sample Fourier-sparse set functions, s = F^{-1}\hat{s} with \hat{s} sparse. We implement this by associating each target \u201cclass\u201d with a collection of frequencies, and sampling normally distributed Fourier coefficients for these frequencies. In our example, we defined four classes, where the Fourier support of the first and second class is obtained by randomly selecting roughly half of the frequencies. For the third class we use the entire spectrum, while for the fourth we use the frequencies that are either in both of class one's and class two's Fourier support, or in neither of them.\nk-junta classification A k-junta [33] is a Boolean function defined on n variables x_1, . . . , x_n that only depends on k of the variables: x_{i_1}, . . .
, x_{i_k}. In the same spirit, we call a set function a k-junta if its evaluations only depend on the presence or absence of k of the n elements of the ground set:\nDefinition 5. (k-junta) A set function s on the ground set N is called a k-junta if there exists a subset N' \subseteq N, with |N'| = k, such that s(A) = s(A \cap N'), for all A \subseteq N.\nWe generate a k-junta classification dataset by sampling random k-juntas for k \in \{3, . . . , 7\}. We do so by utilizing the fact that shifting a set function by \{x\} eliminates its dependency on x, i.e., for A with x \in A we have (T_{\{x\}}s)_A = s_{A \setminus \{x\}} = (T_{\{x\}}s)_{A \setminus \{x\}} because (A \setminus \{x\}) \setminus \{x\} = A \setminus \{x\}. Therefore, sampling a random k-junta amounts to first sampling a random value for every subset A \subseteq N and then performing n - k set shifts by randomly selected singleton sets.\nSubmodularity classification A set function s is submodular if it satisfies the diminishing returns property\n\n\forall A, B \subseteq N with A \subseteq B and \forall x \in N \setminus B : s_{A \cup \{x\}} - s_A \geq s_{B \cup \{x\}} - s_B.   (14)\n\nIn words, adding an element to a small subset increases the value of the set function at least as much as adding it to a larger subset. We construct a dataset comprised of submodular and \"almost submodular\" set functions. As examples of submodular functions we utilize coverage functions [26] (a subclass of submodular functions that allows for easy random generation). As examples of what we informally call \"almost submodular\" set functions here, we sample coverage functions and perturb them slightly to destroy the coverage property.\n\n4.2 Real Datasets\n\nFinally, we construct two classification tasks based on real hypergraph data. Reference [5] provides 19 real-world hypergraph datasets. Each dataset is a hypergraph evolving over time. 
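Returning to the diminishing returns property (14) above: for small ground sets it can be verified by brute force, as in the following sketch (helper names and the concrete coverage function are our own illustrations):

```python
from itertools import combinations

def subsets(n):
    """All subsets of {0, ..., n-1} as frozensets."""
    return [frozenset(c) for k in range(n + 1) for c in combinations(range(n), k)]

def is_submodular(s, n):
    """Brute-force check of (14): s_{A u {x}} - s_A >= s_{B u {x}} - s_B
    for all A subseteq B and x not in B; s maps frozensets to reals."""
    for B in subsets(n):
        for A in subsets(n):
            if not A <= B:
                continue
            for x in range(n):
                if x in B:
                    continue
                if s[A | {x}] - s[A] < s[B | {x}] - s[B] - 1e-12:
                    return False
    return True

n = 3
cover = {0: {1, 2}, 1: {2, 3}, 2: {3}}   # each ground-set element covers some items
coverage = {A: len(set().union(*[cover[x] for x in A])) for A in subsets(n)}
assert is_submodular(coverage, n)        # coverage functions are submodular
cardsq = {A: len(A) ** 2 for A in subsets(n)}
assert not is_submodular(cardsq, n)      # |A|^2 violates diminishing returns
```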
An example is the DBLP coauthorship hypergraph, in which vertices are authors and hyperedges are publications.\nIn the following, we consider classification problems on subhypergraphs induced by vertex subsets of size ten. Each hypergraph is represented by its weight set function: s_A = 1 if A \in E and s_A = 0 otherwise.\nDefinition 6. (Induced Subhypergraph [6]) Let H = (V, E) be a hypergraph. The subset of vertices V' \subseteq V induces a subhypergraph H' = (V', E') with E' = \{A \cap V' : A \in E and A \cap V' \neq \emptyset\}.\nDomain classification As we have multiple hypergraphs, an interesting question is whether it is possible to identify from which hypergraph a given subhypergraph of size ten was sampled, i.e., whether it is possible to distinguish the hypergraphs by considering only local interactions. Therefore, among the publicly available hypergraphs in [5] we only consider those containing at least 500 hyperedges of cardinality ten (namely, DAWN: 1159, threads-stack-overflow: 3070, coauth-DBLP: 6599, coauth-MAG-History: 1057, coauth-MAG-Geology: 7704, congress-bills: 2952). The coauth- hypergraphs are coauthorship hypergraphs; in DAWN the vertices are drugs and the hyperedges patients; in threads-stack-overflow the vertices are users and the hyperedges questions on threads on stackoverflow.com; and in congress-bills the vertices are congresspersons and the hyperedges cosponsored bills. From those hypergraphs we sample all the subhypergraphs induced by the hyperedges of size ten and assign the respective hypergraph of origin as class label. In addition to this dataset (DOM6), we create an easier version (DOM4) in which we only keep one of the coauthorship hypergraphs, namely coauth-DBLP.\nSimplicial closure Reference [5] distinguishes between open and closed hyperedges (the latter are called simplices). 
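The hyperedge-indicator representation and Definition 6 from the preceding paragraphs can be sketched directly (illustrative helpers; names are our own):

```python
def induced_subhypergraph(E, Vp):
    """E' = {A & V' : A in E, A & V' != empty set} (Definition 6)."""
    Vp = frozenset(Vp)
    return {frozenset(A) & Vp for A in E if frozenset(A) & Vp}

def weight_set_function(E, Vp):
    """Indicator set function s_A = 1 if A is a hyperedge of the induced
    subhypergraph, else 0, defined on the ground set V'."""
    Ep = induced_subhypergraph(E, Vp)
    return lambda A: 1.0 if frozenset(A) in Ep else 0.0

E = [{1, 2, 3}, {2, 4}, {5}]
Ep = induced_subhypergraph(E, {1, 2, 4})
assert Ep == {frozenset({1, 2}), frozenset({2, 4})}
s = weight_set_function(E, {1, 2, 4})
assert s({2, 4}) == 1.0 and s({1, 4}) == 0.0
```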
A hyperedge is called open if its vertices form a clique in the 2-section of the hypergraph (the graph obtained by making the vertices of every hyperedge a clique) and it is not contained in any hyperedge of the hypergraph. On the other hand, a hyperedge is closed if it is contained in, or is, one of the hyperedges of the hypergraph. We consider the following classification problem: for a given subhypergraph of ten vertices, determine whether its vertices form a closed hyperedge in the original hypergraph or not.\nIn order to obtain examples of closed hyperedges, we sample the subhypergraphs induced by the vertices of hyperedges of size ten; for open hyperedges, we sample subhypergraphs induced by the vertices of hyperedges of size nine extended by an additional vertex. In this way we construct two learning tasks. First, CON10, in which we extend the size-nine hyperedge by choosing the additional vertex such that the resulting hyperedge is open (2952 closed and 4000 open examples). Second, COAUTH10, in which we randomly extend the size-nine hyperedges (as many as there are closed ones) and use coauth-DBLP for training and coauth-MAG-History & coauth-MAG-Geology for testing.\n\n4.3 Experimental Setup\n\nBaselines As baselines we consider a multi-layer perceptron (MLP) [43] with two hidden layers of size 4096 and an appropriately chosen last layer, as well as graph CNNs (GCNs) on the undirected n-dimensional hypercube. Every vertex of the hypercube corresponds to a subset, and vertices are connected by an edge if their subsets differ by exactly one element. We evaluate graph convolutional layers based on the Laplacian shift [25] and based on the adjacency shift [44]. In both cases one layer does at most one hop.\nOur models For our powerset CNNs (PCNs) we consider convolutional layers based on one-hop filters of two different convolutions: (h * s)_A = h_{\emptyset} s_A + \sum_{x \in N} h_{\{x\}} s_{A \setminus \{x\}} and (h \diamond s)_A = h_{\emptyset} s_A + \sum_{x \in N} h_{\{x\}} s_{A \cup \{x\}}. For all types of convolutional layers we consider the following models: three convolutional layers followed by an MLP with one hidden layer of size 512 as illustrated before, a pooling layer after each convolutional layer followed by the MLP, and a pooling layer after each convolutional layer followed by an accumulation step (average of the features over all subsets) as in [18] followed by the MLP. For all models we use 32 output channels per convolutional layer and ReLU [32] non-linearities.\nTraining We train all models for 100 epochs (passes through the training data) using the Adam optimizer [24] with initial learning rate 0.001 and an exponential learning rate decay factor of 0.95. The learning rate decays after every epoch. We use batches of size 128 and the cross-entropy loss. All our experiments were run on a server with an Intel(R) Xeon(R) CPU @ 2.00GHz with four NVIDIA Tesla T4 GPUs. Mean and standard deviation are obtained by running each experiment 20 times.\n\n4.4 Results\n\nOur results are summarized in Table 2. We report the test classification accuracy in percentages (for models that converged).\nDiscussion Table 2 shows that in the synthetic tasks the powerset convolutional models (*-PCNs) tend to outperform the baselines with the exception of A-GCNs, which are based on the adjacency graph shift on the undirected hypercube. In fact, the set of A-convolutional filters parametrized by our A-GCNs is the subset of the powerset convolutional filters associated with the symmetric difference shift (6) obtained by constraining all filter coefficients for one-element sets to be equal: h_{\{x_i\}} = c with c \in \mathbb{R}, for all i \in \{1, . . . , n\}. 
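This constraint rests on the fact that the hypercube adjacency shift equals the sum of the n one-element symmetric difference shifts, which can be verified numerically (our own sketch, bitmask subset indexing):

```python
import numpy as np

n = 3
N = 2 ** n
# Adjacency matrix of the n-dimensional hypercube: subsets are adjacent
# iff they differ in exactly one element (bitmasks differing in one bit).
Adj = np.zeros((N, N))
for A in range(N):
    for j in range(n):
        Adj[A, A ^ (1 << j)] = 1.0

def sym_shift(j):
    """Symmetric difference shift by the singleton {x_j}: s_A -> s_{A xor {x_j}}."""
    T = np.zeros((N, N))
    for A in range(N):
        T[A, A ^ (1 << j)] = 1.0
    return T

# The adjacency shift is the sum of the n singleton symmetric-difference shifts.
assert np.allclose(Adj, sum(sym_shift(j) for j in range(n)))
```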
Therefore, it is no surprise that the A-GCNs perform well. In contrast, the restrictions placed on the filters of the L-GCN are stronger, since [25] replaces the one-hop Laplacian convolution (\theta_0 I + \theta_1 (L - I))x (in Chebyshev basis) with \theta(2I - L)x by setting \theta = \theta_0 = -\theta_1.\nAn analogous trend is not as clearly visible in the tasks derived from real hypergraph data. In these tasks, the graph CNNs seem to be either more robust to noisy data or to benefit from their permutation equivariance properties. The robustness as well as the permutation equivariance can be attributed to the graph one-hop filters being omnidirectional. On the other hand, the powerset one-hop filters are n-directional. Thus, they are sensitive to hypergraph isomorphy, i.e., hypergraphs with the same connectivity structure but different vertex ordering are processed differently.\n\nmodel | Patterns | k-Junta | Submod. | COAUTH10 | CON10 | DOM4 | DOM6\nBaselines:\nMLP | 46.8 \pm 3.9 | 43.2 \pm 2.5 | - | 80.7 \pm 0.2 | 66.1 \pm 1.8 | 93.6 \pm 0.2 | 71.1 \pm 0.3\nL-GCN | 52.5 \pm 0.9 | 69.3 \pm 2.8 | - | 84.7 \pm 0.9 | 67.2 \pm 1.8 | 96.0 \pm 0.2 | 73.7 \pm 0.4\nL-GCN pool | 45.0 \pm 1.0 | 60.9 \pm 1.5 | - | 83.2 \pm 0.7 | 65.7 \pm 1.0 | 93.2 \pm 1.1 | 71.7 \pm 0.5\nL-GCN pool avg. | 42.1 \pm 0.3 | 64.3 \pm 2.2 | 82.2 \pm 0.4 | 56.8 \pm 1.1 | 64.1 \pm 1.7 | 88.4 \pm 0.3 | 62.8 \pm 0.4\nA-GCN | 65.5 \pm 0.9 | 95.8 \pm 2.7 | - | 80.5 \pm 0.7 | 64.9 \pm 1.8 | 93.9 \pm 0.3 | 69.1 \pm 0.5\nA-GCN pool | 56.9 \pm 2.2 | 91.9 \pm 2.1 | 89.8 \pm 1.8 | 84.1 \pm 0.6 | 66.0 \pm 1.6 | 93.8 \pm 0.3 | 70.7 \pm 0.4\nA-GCN pool avg. | 54.8 \pm 0.9 | 95.8 \pm 1.1 | 84.8 \pm 1.9 | 64.8 \pm 1.1 | 65.4 \pm 0.7 | 92.7 \pm 0.6 | 67.9 \pm 0.3\nProposed models:\n*-PCN | 88.5 \pm 4.3 | 97.2 \pm 2.3 | 88.6 \pm 0.4 | 80.6 \pm 0.7 | 62.8 \pm 2.9 | 94.1 \pm 0.3 | 70.5 \pm 0.3\n*-PCN pool | 80.9 \pm 0.9 | 96.0 \pm 1.6 | 85.1 \pm 1.8 | 82.6 \pm 0.4 | 62.9 \pm 2.0 | 94.0 \pm 0.3 | 70.2 \pm 0.5\n*-PCN pool avg. | 75.9 \pm 1.9 | 96.5 \pm 0.6 | 87.0 \pm 1.6 | 80.6 \pm 0.5 | 63.4 \pm 3.5 | 94.4 \pm 0.3 | 73.0 \pm 0.3\n\diamond-PCN | - | 97.5 \pm 1.4 | - | 83.6 \pm 0.4 | 68.7 \pm 1.3 | 93.7 \pm 0.2 | 69.9 \pm 0.3\n\diamond-PCN pool | - | 96.4 \pm 1.7 | - | 84.8 \pm 0.3 | 68.2 \pm 0.8 | 93.6 \pm 0.3 | 70.3 \pm 0.4\n\diamond-PCN pool avg. | 54.8 \pm 1.9 | 96.6 \pm 0.7 | 80.9 \pm 2.9 | 83.3 \pm 0.5 | 67.0 \pm 2.0 | 94.8 \pm 0.3 | 73.5 \pm 0.5\n\nTable 2: Results of the experimental evaluation in terms of test classification accuracy (percentage). The first three columns contain the results from the synthetic experiments and the last four columns the results from the hypergraph experiments. The best-performing model from the corresponding category is in bold.\n\nPooling Interestingly, while reducing the hidden state by a factor of two after every convolutional layer, pooling in most cases only slightly decreases the accuracy of the PCNs in the synthetic tasks and has no impact in the other tasks. Also, the influence of pooling on the A-GCN is more similar to the behavior of the PCNs than to that of the L-GCN.\nEquivariance Finally, we compare models having a shift-invariant convolutional part (suffix \"pool avg.\") with models having a shift-equivariant convolutional part (suffix \"pool\"). The difference between these models is that the invariant ones have an accumulation step before the MLP, resulting in (a) the inputs to the MLP being invariant w.r.t. the shift corresponding to the specific convolutions used and (b) the MLP having much fewer parameters in its hidden layer (32 \cdot 512 instead of 2^{10} \cdot 32 \cdot 512). 
For the PCNs, the effect of the accumulation step appears to be task dependent. For instance, in k-Junta, Submod., DOM4 and DOM6 it is largely beneficial, while in the other tasks it is slightly disadvantageous. Similarly, for the GCNs the accumulation step is beneficial in k-Junta and disadvantageous in COAUTH10. A possible cause is that the resulting models are not expressive enough due to the lack of parameters.
Complexity analysis Consider a powerset convolutional layer (11) with n_c input channels and n_f output channels. Using k-hop filters, the layer is parametrized by n_p = n_f + n_c n_f Σ_{i=0}^{k} C(n, i) parameters (n_f bias terms plus n_c n_f Σ_{i=0}^{k} C(n, i) filtering coefficients, where C(n, i) denotes the binomial coefficient). Convolution is done efficiently in the Fourier domain, i.e., h ∗ s = F^{-1}(diag(F̄ h) F s), which requires (3/2) n 2^n + 2^n operations and 2^n floats of memory [36]. Thus, the forward as well as the backward pass require Θ(n_c n_f n 2^n) operations and Θ(n_c 2^n + n_f 2^n + n_p) floats of memory². The hypercube graph convolutional layers are a special case of powerset convolutional layers and hence lie in the same complexity class; a k-hop graph convolutional layer requires n_f + n_c n_f (k + 1) parameters.

5 Related Work

Our work lies at the intersection of geometric deep learning, generalized signal processing and set function learning. Since each of these areas is broad, due to space limitations we only review the work most closely related to ours.
Deep learning Geometric deep learners [10] can be broadly categorized into convolution-based approaches [9, 12, 13, 16, 25, 55] and message-passing-based approaches [18, 47, 50].
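As a quick numerical check of the parameter counts above, a minimal sketch (the layer sizes below are hypothetical examples, not the ones used in our experiments):

```python
from math import comb

def pcn_layer_params(n, k, nc, nf):
    """n_p = n_f + n_c n_f sum_{i=0}^{k} C(n, i): bias terms plus
    filtering coefficients of a k-hop powerset convolutional layer."""
    return nf + nc * nf * sum(comb(n, i) for i in range(k + 1))

def hypercube_gcn_layer_params(k, nc, nf):
    """A k-hop hypercube graph convolutional layer: n_f + n_c n_f (k + 1)."""
    return nf + nc * nf * (k + 1)

# Ground set of size n = 10, 1-hop filters, 1 input and 32 output channels.
print(pcn_layer_params(10, 1, 1, 32))        # 32 + 1 * 32 * (1 + 10) = 384
print(hypercube_gcn_layer_params(1, 1, 32))  # 32 + 1 * 32 * 2 = 96
```

Even for one-hop filters, the powerset layer has n + 1 filter taps per channel pair, compared to k + 1 = 2 for the graph layer.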
The latter assign a hidden state to each element of the index domain (e.g., to each vertex in a graph) and use a message passing protocol to learn representations in a finite number of communication steps.

²The derivation of these results is provided in the supplementary material.

Reference [18] points out that graph CNNs are a subclass of message passing / graph neural networks (MPNNs). References [9, 16, 25] utilize the spectral analysis of the graph Laplacian [51] to define graph convolutions, while [55] makes use of the adjacency-shift-based convolution [44]. Similarly, [12, 13] utilize group convolutions [53] with desirable equivariances.
In a similar vein, in this work we utilize the recently proposed powerset convolutions [36] as the foundation of a generalized CNN. Relative to that reference, which provides the theoretical foundation for powerset convolutions, our contributions are an analysis of the resulting filters from a pattern matching perspective, the definition of their exact instantiations and applications in the context of neural networks, and a demonstration that these operations are practically relevant for various tasks.
Signal processing Set function signal processing [36] is an instantiation of algebraic signal processing (ASP) [38] on the powerset domain. ASP provides a theoretical framework for deriving a complete set of basic signal processing concepts, including convolution, for a novel index domain, using as starting point a chosen shift to which convolutions should be equivariant.
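To make the shift-based derivation concrete, consider the symmetric-difference shift (T_x s)(A) = s(A Δ {x}), one classical shift choice for set functions; the associated shift-equivariant convolution is (h ∗ s)(A) = Σ_B h(B) s(A Δ B), and the corresponding Fourier transform is the Walsh-Hadamard transform (WHT), which is real and symmetric, so F̄ = F. The shifts studied in [36] include further variants with different transforms. The following NumPy sketch (function names are ours) checks the direct convolution against the Fourier-domain computation, encoding subsets as bitmasks so that A Δ B becomes A XOR B:

```python
import numpy as np

def wht(s):
    """Fast Walsh-Hadamard transform (unnormalized), O(n 2^n) operations."""
    s = np.array(s, dtype=float)
    h = 1
    while h < len(s):
        for i in range(0, len(s), 2 * h):
            for j in range(i, i + h):
                a, b = s[j], s[j + h]
                s[j], s[j + h] = a + b, a - b
        h *= 2
    return s

def convolve_direct(h, s):
    """(h * s)(A) = sum_B h(B) s(A xor B); subsets encoded as bitmasks."""
    out = np.zeros(len(s))
    for A in range(len(s)):
        for B in range(len(s)):
            out[A] += h[B] * s[A ^ B]
    return out

def convolve_fourier(h, s):
    """Same convolution in the spectrum: h * s = F^{-1}(diag(F h) F s)."""
    return wht(wht(h) * wht(s)) / len(s)  # inverse WHT = WHT scaled by 1/2^n

rng = np.random.default_rng(1)
h, s = rng.standard_normal(8), rng.standard_normal(8)  # n = 3, 2^3 = 8 subsets
assert np.allclose(convolve_direct(h, s), convolve_fourier(h, s))
```

The direct sum costs Θ(2^n · 2^n) operations per channel pair, whereas the spectral route costs Θ(n 2^n), matching the complexity analysis above.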
To date, the approach has been used for index domains including graphs [34, 44, 45], powersets (set functions) [36], meet/join lattices [37, 61], and a collection of more regular domains, e.g., [39, 46, 49].
Additionally, there are spectral approaches, such as [51] for graphs and [15, 33] for set functions (or, equivalently, pseudo-boolean functions), that utilize analogues of the Fourier transform to port spectral analysis and other signal processing methods to novel domains.
Set function learning In contrast to the set function classification problems considered in this work, most existing work on set function learning is concerned with completing a single partially observed set function [2–4, 7, 11, 30, 54, 56, 63]. In this context, traditional methods [2–4, 11, 30, 56] differ mainly in how the class of considered set functions is restricted in order to remain manageable; e.g., [54] considers Walsh-Hadamard-sparse (= Fourier-sparse) set functions. Recent approaches [7, 17, 31, 57, 60, 63] leverage deep learning. Reference [7] proposes a neural architecture for learning submodular functions, and [31, 63] propose architectures for learning multi-set functions (i.e., permutation-invariant sequence functions). References [17, 57] introduce differentiable layers that allow for backpropagation through the minimizer or maximizer, respectively, of a submodular optimization problem and, thus, for learning submodular set functions. Similarly, [60] proposes a differentiable layer for learning boolean functions.

6 Conclusion

We introduced a convolutional neural network architecture for powerset data. We did so by utilizing novel powerset convolutions and introducing powerset pooling layers. The powerset convolutions used stem from algebraic signal processing theory [38], a theoretical framework for porting signal processing to novel domains.
Therefore, we hope that our method-driven approach can be used to specialize deep learning to other domains as well. We conclude with challenges and future directions.
Lack of data We argue that certain success components of deep learning are domain independent, and our experimental results empirically support this claim to a certain degree. However, one cannot neglect the fact that data abundance is one of these success components and, for the supervised learning problems on set functions considered in this paper, one that is currently lacking.
Computational complexity As evident from our complexity analysis and [29], the proposed methodology is feasible only up to about n = 30 on modern multicore systems, because set functions are exponentially large objects. To scale our approach to larger ground sets, e.g., to support semi-supervised learning on graphs or hypergraphs where enough data is available, one would have to either devise methods that preserve the sparsity of the respective set function representations during filtering, pooling and the application of non-linear functions, or leverage techniques for NN dimension reduction such as [21].

Acknowledgements

We thank Max Horn for insightful discussions and his extensive feedback, and Razvan Pascanu for feedback on an earlier draft. This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 805223).

References

[1] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al. Tensorflow: A system for large-scale machine learning. In Symp. Operating Systems Design and Implementation (OSDI), pages 265–283, 2016.

[2] A. Badanidiyuru, S. Dobzinski, H. Fu, R. Kleinberg, N. Nisan, and T. Roughgarden. Sketching valuation functions. In Proc.
Discrete Algorithms, pages 1025–1035. SIAM, 2012.

[3] M. F. Balcan and N. J. A. Harvey. Learning submodular functions. In Proc. Theory of Computing, pages 793–802. ACM, 2011.

[4] M. F. Balcan, F. Constantin, S. Iwata, and L. Wang. Learning valuation functions. 2012.

[5] A. R. Benson, R. Abebe, M. T. Schaub, A. Jadbabaie, and J. Kleinberg. Simplicial closure and higher-order link prediction. Proc. National Academy of Sciences, 115(48):E11221–E11230, 2018.

[6] C. Berge. Graphs and hypergraphs. North-Holland Pub. Co., 1973.

[7] J. Bilmes and W. Bai. Deep submodular functions. arXiv preprint arXiv:1701.08939, 2017.

[8] R. Branzei, D. Dimitrov, and S. Tijs. Models in cooperative game theory, volume 556. Springer Science & Business Media, 2008.

[9] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst. Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, 2017.

[10] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst. Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, 2017.

[11] S.-S. Choi, K. Jung, and J. H. Kim. Almost tight upper bound for finding Fourier coefficients of bounded pseudo-Boolean functions. Journal of Computer and System Sciences, 77(6):1039–1053, 2011.

[12] T. Cohen, M. Geiger, J. Köhler, and M. Welling. Convolutional networks for spherical signals. arXiv preprint arXiv:1709.04893, 2017.

[13] T. S. Cohen and M. Welling. Group equivariant convolutional networks. In Proc. International Conference on Machine Learning (ICML), pages 2990–2999, 2016.

[14] S. De Vries and R. V. Vohra. Combinatorial auctions: A survey. INFORMS Journal on Computing, 15(3):284–309, 2003.

[15] R. De Wolf. A brief introduction to Fourier analysis on the Boolean cube.
Theory of Computing, pages 1–20, 2008.

[16] M. Defferrard, X. Bresson, and P. Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems (NIPS), pages 3844–3852, 2016.

[17] J. Djolonga and A. Krause. Differentiable learning of submodular models. In Advances in Neural Information Processing Systems (NIPS), pages 1013–1023, 2017.

[18] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl. Neural message passing for quantum chemistry. In Proc. International Conference on Machine Learning (ICML), pages 1263–1272, 2017.

[19] D. Golovin, A. Krause, and M. Streeter. Online submodular maximization under a matroid constraint with application to learning assignments. arXiv preprint arXiv:1407.1082, 2014.

[20] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems (NIPS), pages 2672–2680, 2014.

[21] T. Hackel, M. Usvyatsov, S. Galliani, J. D. Wegner, and K. Schindler. Inference, learning and attention mechanisms that exploit and preserve sparsity in CNNs. In German Conference on Pattern Recognition, pages 597–611. Springer, 2018.

[22] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

[23] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.

[24] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[25] T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR), 2017.

[26] A. Krause and D. Golovin. Submodular function maximization. In Tractability: Practical Approaches to Hard Problems, pages 71–104. Cambridge University Press, 2014.

[27] A. Krause and C. Guestrin.
Near-optimal observation selection using submodular functions. In AAAI, volume 7, pages 1650–1654, 2007.

[28] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 1097–1105, 2012.

[29] Y. Lu. Practical tera-scale Walsh-Hadamard transform. In Future Technologies Conference (FTC), pages 1230–1236. IEEE, 2016.

[30] E. Mossel, R. O'Donnell, and R. P. Servedio. Learning juntas. In Proc. Theory of Computing, pages 206–212. ACM, 2003.

[31] R. L. Murphy, B. Srinivasan, V. Rao, and B. Ribeiro. Janossy pooling: Learning deep permutation-invariant functions for variable-size inputs. arXiv preprint arXiv:1811.01900, 2018.

[32] V. Nair and G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. In Proc. International Conference on Machine Learning (ICML), pages 807–814, 2010.

[33] R. O'Donnell. Analysis of Boolean functions. Cambridge University Press, 2014.

[34] A. Ortega, P. Frossard, J. Kovačević, J. M. F. Moura, and P. Vandergheynst. Graph signal processing: Overview, challenges, and applications. Proc. IEEE, 106(5):808–828, 2018.

[35] A. Osokin and D. P. Vetrov. Submodular relaxation for inference in Markov random fields. IEEE Trans. Pattern Analysis and Machine Intelligence, 37(7):1347–1359, 2014.

[36] M. Püschel. A discrete signal processing framework for set functions. In Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4359–4363, 2018.

[37] M. Püschel. A discrete signal processing framework for meet/join lattices with applications to hypergraphs and trees. In Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5371–5375, 2019.

[38] M. Püschel and J. M. F. Moura.
Algebraic signal processing theory: Foundation and 1-D time. IEEE Trans. Signal Processing, 56(8):3572–3585, 2008.

[39] M. Püschel and M. Rötteler. Algebraic signal processing theory: 2-D hexagonal spatial lattice. IEEE Trans. Image Processing, 16(6):1506–1521, 2007.

[40] S. Ravanbakhsh, J. Schneider, and B. Poczos. Equivariance through parameter-sharing. In Proc. International Conference on Machine Learning (ICML), pages 2892–2901, 2017.

[41] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems (NIPS), pages 91–99, 2015.

[42] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In Proc. International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241, 2015.

[43] F. Rosenblatt. Principles of neurodynamics: Perceptrons and the theory of brain mechanisms. Technical report, Cornell Aeronautical Lab Inc., Buffalo, NY, 1961.

[44] A. Sandryhaila and J. M. F. Moura. Discrete signal processing on graphs. IEEE Trans. Signal Processing, 61(7):1644–1656, 2013.

[45] A. Sandryhaila and J. M. F. Moura. Discrete signal processing on graphs: Frequency analysis. IEEE Trans. Signal Processing, 62(12):3042–3054, 2014.

[46] A. Sandryhaila, J. Kovacevic, and M. Püschel. Algebraic signal processing theory: 1-D nearest-neighbor models. IEEE Trans. Signal Processing, 60(5):2247–2259, 2012.

[47] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini. The graph neural network model. IEEE Trans. Neural Networks, 20(1):61–80, 2009.

[48] R. Scheibler, S. Haghighatshoar, and M. Vetterli. A fast Hadamard transform for signals with sublinear sparsity in the transform domain. IEEE Trans. Information Theory, 61(4):2115–2132, 2015.

[49] B.
Seifert and K. Hüper. The discrete cosine transform on triangles. In Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5023–5026, 2019.

[50] D. Selsam, M. Lamm, B. Bünz, P. Liang, L. de Moura, and D. L. Dill. Learning a SAT solver from single-bit supervision. arXiv preprint arXiv:1802.03685, 2018.

[51] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine, 30(3):83–98, 2013.

[52] M. Simonovsky and N. Komodakis. GraphVAE: Towards generation of small graphs using variational autoencoders. In International Conference on Artificial Neural Networks, pages 412–422, 2018.

[53] R. S. Stankovic, C. Moraga, and J. Astola. Fourier analysis on finite groups with applications in signal processing and system design. John Wiley & Sons, 2005.

[54] P. Stobbe and A. Krause. Learning Fourier sparse set functions. In Artificial Intelligence and Statistics, pages 1125–1133, 2012.

[55] F. P. Such, S. Sah, M. A. Dominguez, S. Pillai, C. Zhang, A. Michael, N. D. Cahill, and R. Ptucha. Robust spatial filtering with graph convolutional neural networks. IEEE Journal of Selected Topics in Signal Processing, 11(6):884–896, 2017.

[56] A. M. Sutton, L. D. Whitley, and A. E. Howe. Computing the moments of k-bounded pseudo-Boolean functions over Hamming spheres of arbitrary radius in polynomial time. Theoretical Computer Science, 425:58–74, 2012.

[57] S. Tschiatschek, A. Sahin, and A. Krause. Differentiable submodular maximization. In Proc. International Joint Conference on Artificial Intelligence, pages 2731–2738. AAAI Press, 2018.

[58] A. Van Den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. W. Senior, and K.
Kavukcuoglu. WaveNet: A generative model for raw audio. SSW, 125, 2016.

[59] H. Wang, J. Wang, J. Wang, M. Zhao, W. Zhang, F. Zhang, X. Xie, and M. Guo. GraphGAN: Graph representation learning with generative adversarial nets. In Conference on Artificial Intelligence, 2018.

[60] P.-W. Wang, P. Donti, B. Wilder, and Z. Kolter. SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver. In International Conference on Machine Learning (ICML), pages 6545–6554, 2019.

[61] C. Wendler and M. Püschel. Sampling signals on meet/join lattices. In Proc. Global Conference on Signal and Information Processing (GlobalSIP), 2019.

[62] T. Young, D. Hazarika, S. Poria, and E. Cambria. Recent trends in deep learning based natural language processing. IEEE Computational Intelligence Magazine, 13(3):55–75, 2018.

[63] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, and A. J. Smola. Deep sets. In Advances in Neural Information Processing Systems (NIPS), pages 3391–3401, 2017.