{"title": "Structure-Aware Convolutional Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 11, "page_last": 20, "abstract": "Convolutional neural networks (CNNs) are inherently subject to invariable filters that can only aggregate local inputs with the same topological structures. As a result, CNNs can only manage data with Euclidean or grid-like structures (e.g., images), not data with non-Euclidean or graph structures (e.g., traffic networks). To broaden the reach of CNNs, we develop structure-aware convolution to eliminate this invariance, yielding a unified mechanism for dealing with both Euclidean and non-Euclidean structured data. Technically, filters in the structure-aware convolution are generalized to univariate functions, which are capable of aggregating local inputs with diverse topological structures. Since infinitely many parameters would be required to determine a univariate function, we parameterize these filters with a finite number of learnable parameters in the context of function approximation theory. By replacing the classical convolution in CNNs with the structure-aware convolution, Structure-Aware Convolutional Neural Networks (SACNNs) are readily established. 
Extensive experiments on eleven datasets provide strong evidence that SACNNs outperform current models on various machine learning tasks, including image classification and clustering, text categorization, skeleton-based action recognition, molecular activity detection, and taxi flow prediction.", "full_text": "Structure-Aware Convolutional Neural Networks\n\nJianlong Chang1,2\n\nJie Gu1,2\n\nLingfeng Wang1\n\nGaofeng Meng1\n\nShiming Xiang1,2\n\nChunhong Pan1\n\n1NLPR, Institute of Automation, Chinese Academy of Sciences\n\n2School of Artificial Intelligence, University of Chinese Academy of Sciences\n\n{jianlong.chang, jie.gu, lfwang, gfmeng, smxiang, chpan}@nlpr.ia.ac.cn\n\nAbstract\n\nConvolutional neural networks (CNNs) are inherently subject to invariable filters that can only aggregate local inputs with the same topological structures. As a result, CNNs can only manage data with Euclidean or grid-like structures (e.g., images), not data with non-Euclidean or graph structures (e.g., traffic networks). To broaden the reach of CNNs, we develop structure-aware convolution to eliminate this invariance, yielding a unified mechanism for dealing with both Euclidean and non-Euclidean structured data. Technically, filters in the structure-aware convolution are generalized to univariate functions, which are capable of aggregating local inputs with diverse topological structures. Since infinitely many parameters would be required to determine a univariate function, we parameterize these filters with a finite number of learnable parameters in the context of function approximation theory. By replacing the classical convolution in CNNs with the structure-aware convolution, Structure-Aware Convolutional Neural Networks (SACNNs) are readily established. 
Extensive experiments on eleven datasets provide strong evidence that SACNNs outperform current models on various machine learning tasks, including image classification and clustering, text categorization, skeleton-based action recognition, molecular activity detection, and taxi flow prediction.\n\n1 Introduction\n\nConvolutional neural networks (CNNs) provide an effective and efficient framework for dealing with Euclidean structured data, such as speech and images. As a core module in CNNs, the convolution unit shares parameters across the whole spatial domain, which greatly reduces the number of parameters without sacrificing the expressive capability of networks [3]. Benefiting from such artful modeling, significant successes have been achieved in a multitude of fields, including image classification [15, 24] and clustering [5, 6], and object detection [9, 32], among others.\n\nAlthough the achievements in the literature are brilliant, CNNs are still unable to handle non-Euclidean structured data, such as traffic flow data on traffic networks, relational data on social networks, and activity data on molecular structure networks. The major limitation originates from the fact that the classical filters are invariant at each location. As a result, the filters can only be applied to aggregate local inputs with the same topological structures, not those with diverse topological structures.\n\nIn order to eliminate this limitation, we develop structure-aware convolution, in which a single shareable filter suffices to aggregate local inputs with diverse topological structures. For this purpose, we generalize the classical filters to univariate functions that can be effectively and efficiently parameterized under the guidance of function approximation theory. 
Then, we introduce local structure representations to quantitatively encode topological structures. By modeling these representations into the generalized filters, the corresponding local inputs can consequently be aggregated with the generalized filters. In practice, Structure-Aware Convolutional Neural Networks (SACNNs) can be readily established by replacing the classical convolution in CNNs with our structure-aware convolution. Since all the operations in our structure-aware convolution are differentiable, SACNNs can be trained end-to-end by standard back-propagation.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\nTo sum up, the key contributions of this paper are:\n\n\u2022 The structure-aware convolution is developed to establish SACNNs that uniformly deal with both Euclidean and non-Euclidean structured data, which broadens the reach of convolution.\n\n\u2022 We introduce learnable local structure representations, which endow SACNNs with the capability of capturing the latent structures of data in a purely data-driven way.\n\n\u2022 By taking advantage of function approximation theory, SACNNs can be effectively and efficiently trained with standard back-propagation, which guarantees their practicability.\n\n\u2022 Extensive experiments demonstrate that SACNNs are superior to current models on various machine learning tasks, including classification, clustering, and regression.\n\n2 Related work\n\n2.1 Convolutional neural networks (CNNs)\n\nTo elevate the performance of CNNs, much research has been devoted to designing convolution units, which can be roughly divided into two classes, i.e., handcrafted and learnable ones.\n\nHandcrafted convolution units generally derive from professional knowledge. Early convolution units [24, 26] have large sizes, e.g., 7 \u00d7 7 pixels in images. 
To increase the nonlinearity, stacking multiple small filters (e.g., 3 \u00d7 3 pixels) instead of using a single large filter has become a common design in CNNs [38]. To obtain larger receptive fields, the dilated convolution [41], whose receptive field size grows exponentially while the number of parameters grows linearly, was proposed. In addition, the separable convolution [7] promotes performance by integrating various filters with diverse sizes.\n\nAmong the latter, much effort has been made to learn convolution units from data. By introducing additional parameters named offsets, the active convolution [19] was explored to learn the shape of the convolution. To achieve dynamic offsets that vary with the inputs, the deformable convolution [9] was proposed. In contrast to such modifications, some approaches directly capture the structures of data to improve the performance of CNNs, such as the spatial transformer networks [18]. While these models have been successful on Euclidean domains, they can hardly be applied to non-Euclidean domains. In contrast, our SACNNs can be utilized on both domains uniformly.\n\n2.2 Graph convolutional neural networks (GCNNs)\n\nRecently, there has been growing interest in applying CNNs to non-Euclidean domains [3, 29, 31, 35]. Generally, existing methods can be summarized into two types, i.e., spectral and spatial methods.\n\nSpectral methods explore an analogical convolution operator over non-Euclidean domains on the basis of spectral graph theory [4, 16, 27]. Relying on the eigenvectors of the graph Laplacian, data with non-Euclidean structures can be filtered in the corresponding spectral domain. 
To enhance efficiency and obtain spectrum-free methods that avoid eigen-decomposition, polynomial-based networks have been developed to execute convolution on non-Euclidean domains efficiently [10, 22].\n\nIn contrast to the spectral methods, spatial methods analogize the convolutional strategy based on local spatial filtering [1, 2, 30, 31, 37, 40]. The major difference between these methods lies in the intrinsic coordinate systems used for encoding local patches. Typically, the diffusion CNNs [1] encode local patches based on a random walk process on graphs, the anisotropic CNNs [2] employ an anisotropic patch-extraction method, and the geodesic CNNs [30] represent local patches in polar coordinates. Synthesizing these ideas, the mixture-model CNNs [31] develop learnable local pseudo-coordinates to parameterize local patches in a general way. Additionally, a series of spatial methods without the classical convolutional strategy have also been explored, including the message passing neural networks [12, 28, 34] and the graph attention networks [39].\n\nIn spite of considerable achievements, both spectral and spatial methods partially rely on fixed structures (i.e., fixed relationship matrices) in graphs. Benefiting from the proposed structure-aware convolution, by comparison, the structures can be learned from data automatically in our SACNNs.\n\n3 Structure-aware convolution\n\nConvolution, intrinsically, is an aggregation operation between local inputs and filters. In practice, local inputs involve not only their input values but also their topological structures. Accordingly, filters should be able to aggregate local inputs with diverse topological structures. 
To this end, we\ndevelop the structure-aware convolution by generalizing the \ufb01lters in the classical convolution and\nmodeling the local structure information into the generalized \ufb01lters.\nThe \ufb01lters in the classical convolution can be smoothly generalized to univariate functions. Without\nloss of generality and for simplicity, we elaborate such generalization with 1-Dimensional data. Given\nan input x \u2208 Rn and a \ufb01lter w \u2208 R2m\u22121, the output at the i-th vertex (location) is\ni \u2208 {1, 2,\u00b7\u00b7\u00b7 , n},\n\n(1)\nwhere xi = [xi\u2212m+1,\u00b7\u00b7\u00b7 , xi+m\u22121]T is the local input at the i-th vertex, i\u2212m < j < i+m indicates\nthat the j-th vertex is a neighbor of the i-th vertex, wj\u2212i+m and xj signify the (j \u2212 i + m)-th and\nj-th elements in w and x, respectively. For any univariate function f (\u00b7), Eq. (1) can be equivalently\nrewritten as follows when f (j \u2212 i + m) = wj\u2212i+m is always satis\ufb01ed, i.e.,\n\nwj\u2212i+m \u00b7 xj,\n\n\u00afyi = wTxi =\n\ni\u2212m