{"title": "Unsupervised Deep Haar Scattering on Graphs", "book": "Advances in Neural Information Processing Systems", "page_first": 1709, "page_last": 1717, "abstract": "The classification of high-dimensional data defined on graphs is particularly difficult when the graph geometry is unknown. We introduce a Haar scattering transform on graphs, which computes invariant signal descriptors. It is implemented with a deep cascade of additions, subtractions and absolute values, which iteratively compute orthogonal Haar wavelet transforms. Multiscale neighborhoods of unknown graphs are estimated by minimizing an average total variation, with a pair matching algorithm of polynomial complexity. Supervised classification with dimension reduction is tested on data bases of scrambled images, and for signals sampled on unknown irregular grids on a sphere.", "full_text": "Unsupervised Deep Haar Scattering on Graphs\n\nXu Chen1,2, Xiuyuan Cheng2, and St\u00b4ephane Mallat2\n\n1Department of Electrical Engineering, Princeton University, NJ, USA\n2D\u00b4epartement d\u2019Informatique, \u00b4Ecole Normale Sup\u00b4erieure, Paris, France\n\nAbstract\n\nThe classi\ufb01cation of high-dimensional data de\ufb01ned on graphs is particularly dif\ufb01-\ncult when the graph geometry is unknown. We introduce a Haar scattering trans-\nform on graphs, which computes invariant signal descriptors. It is implemented\nwith a deep cascade of additions, subtractions and absolute values, which itera-\ntively compute orthogonal Haar wavelet transforms. Multiscale neighborhoods of\nunknown graphs are estimated by minimizing an average total variation, with a\npair matching algorithm of polynomial complexity. 
Supervised classification with dimension reduction is tested on databases of scrambled images, and for signals sampled on unknown irregular grids on a sphere.\n\n1 Introduction\n\nThe geometric structure of a data domain can be described with a graph [11], where neighbor data points are represented by vertices related by an edge. For sensor networks, this connectivity depends upon the sensors' physical locations, but in social networks it may correspond to strong interactions or similarities between two nodes. In many applications, the connectivity graph is unknown and must therefore be estimated from data. We introduce an unsupervised learning algorithm to classify signals defined on an unknown graph.\nAn important source of variability on graphs results from displacements of signal values. These may be due to movements of physical sources in a sensor network, or to propagation phenomena in social networks. Classification problems are often invariant to such displacements. Image pattern recognition and the characterization of communities in social networks are examples of invariant problems. They require computing locally or globally invariant descriptors, which are sufficiently rich to discriminate complex signal classes.\nSection 2 introduces a Haar scattering transform which builds an invariant representation of graph data, by cascading additions, subtractions and absolute values in a deep network. It can be factorized as a product of Haar wavelet transforms on the graph. Haar wavelet transforms are flexible representations which characterize multiscale signal patterns on graphs [6, 10, 11]. Haar scattering transforms are extensions on graphs of wavelet scattering transforms, previously introduced for uniformly sampled signals [1].\nFor unstructured signals defined on an unknown graph, recovering the full graph geometry is an NP-complete problem. 
We avoid this complexity by only learning connected multiresolution graph approximations. This is sufficient to compute Haar scattering representations. Multiscale neighborhoods are calculated by minimizing an average total signal variation over training examples. This involves a pair matching algorithm of polynomial complexity. We show that this unsupervised learning algorithm computes sparse scattering representations.\n\nThis work was supported by the ERC grant InvariantClass 320959.\n\nFigure 1: A Haar scattering network computes each coefficient of a layer Sj+1x by adding or subtracting a pair of coefficients in the previous layer Sjx.\n\nFor classification, the dimension of unsupervised Haar scattering representations is reduced with supervised partial least squares regressions [12]. This amounts to computing a last layer of reduced dimensionality, before applying a Gaussian kernel SVM classifier. The performance of Haar scattering classification is tested on scrambled images, whose graph geometry is unknown. Results are provided for the MNIST and CIFAR-10 image databases. Classification experiments are also performed on scrambled signals whose samples lie on an irregular grid on a sphere. All computations can be reproduced with software available at www.di.ens.fr/data/scattering/haar.\n\n2 Orthogonal Haar Scattering on a Graph\n\n2.1 Deep Networks of Permutation Invariant Operators\n\nWe consider signals x defined on an unweighted graph G = (V, E), with V = {1, ..., d}. Edges relate neighbor vertices. We suppose that d is a power of 2 to simplify explanations. 
A Haar scattering is calculated by iteratively applying the following permutation invariant operator\n\n(α, β) → (α + β, |α − β|) .   (1)\n\nIts values are not modified by a permutation of α and β, and both values are recovered by\n\nmax(α, β) = (1/2)(α + β + |α − β|) and min(α, β) = (1/2)(α + β − |α − β|) .   (2)\n\nAn orthogonal Haar scattering transform computes progressively more invariant signal descriptors by applying this invariant operator at multiple scales. This is implemented along a deep network illustrated in Figure 1. The network layer j is a two-dimensional array Sjx(n, q) of d = 2^{-j}d × 2^j coefficients, where n is a node index and q is a feature type.\nThe input network layer is S0x(n, 0) = x(n). We compute Sj+1x by regrouping the 2^{-j}d nodes of Sjx in 2^{-j-1}d pairs (an, bn), and applying the permutation invariant operator (1) to each pair (Sjx(an, q), Sjx(bn, q)):\n\nSj+1x(n, 2q) = Sjx(an, q) + Sjx(bn, q)   (3)\n\nand\n\nSj+1x(n, 2q + 1) = |Sjx(an, q) − Sjx(bn, q)| .   (4)\n\nThis transform is iterated up to a maximum depth J ≤ log2(d). It computes SJx with Jd/2 additions, subtractions and absolute values. Since Sjx ≥ 0 for j > 0, one can put an absolute value on the sum in (3) without changing Sj+1x. It results that Sj+1x is calculated from the previous layer Sjx by applying a linear operator followed by a non-linearity, as in most deep neural network architectures. In our case this non-linearity is an absolute value, as opposed to the rectifiers used in most deep networks [4].\nFor each n, the 2^j scattering coefficients {Sjx(n, q)}_{0≤q<2^j} are calculated from the values of x in a vertex set Vj,n of size 2^j. 
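The cascade (1)-(4) is a short computation. Below is a minimal sketch in Python, not the authors' released software: the helper names are ours, and the pairings (an, bn) are fixed to consecutive node indices, which corresponds to a regular 1-D grid.

```python
import numpy as np

def haar_scattering_layer(S):
    """One layer of (3)-(4): pair consecutive nodes (a_n, b_n) = (2n, 2n+1)
    and apply (alpha, beta) -> (alpha + beta, |alpha - beta|)."""
    a, b = S[0::2, :], S[1::2, :]              # paired node values, shape (d/2, 2^j)
    out = np.empty((S.shape[0] // 2, 2 * S.shape[1]))
    out[:, 0::2] = a + b                       # S_{j+1}x(n, 2q): sums
    out[:, 1::2] = np.abs(a - b)               # S_{j+1}x(n, 2q+1): absolute differences
    return out

def haar_scattering(x, J):
    """Compute S_J x for a signal x of length d, a power of 2, with J <= log2(d)."""
    S = np.asarray(x, dtype=float).reshape(-1, 1)  # S_0 x(n, 0) = x(n)
    for _ in range(J):
        S = haar_scattering_layer(S)
    return S                                   # shape (2^{-J} d, 2^J)
```

For x = (1, 2, 3, 4) and J = 2 this returns the single-node layer S2x = (10, 4, 2, 0), whose Euclidean norm is 2‖x‖, consistent with the norm preservation stated by Theorem 2.1.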
One can verify by induction on (3) and (4) that V0,n = {n} for 0 ≤ n < d, and for any j ≥ 0\n\nVj+1,n = Vj,an ∪ Vj,bn .   (5)\n\nFigure 2: A connected multiresolution is a partition of vertices with embedded connected sets Vj,n of size 2^j. (a): Example of partition for the graph of a square image grid, for 1 ≤ j ≤ 3. (b): Example on an irregular graph.\n\nThe embedded subsets {Vj,n}j,n form a multiresolution approximation of the vertex set V. At each scale 2^j, different pairings (an, bn) define different multiresolution approximations. A small graph displacement propagates signal values from a node to its neighbors. To build nearly invariant representations over such displacements, a Haar scattering transform must regroup connected vertices. It is thus computed over multiresolution vertex sets Vj,n which are connected in the graph G. It results from (5) that a necessary and sufficient condition is that each pair (an, bn) regroups two connected sets Vj,an and Vj,bn.\nFigure 2 shows two examples of connected multiresolution approximations. Figure 2(a) illustrates the graph of an image grid, where pixels are connected to 8 neighbors. In this example, each Vj+1,n regroups two subsets Vj,an and Vj,bn which are connected horizontally if j is even and connected vertically if j is odd. Figure 2(b) illustrates a second example of a connected multiresolution approximation on an irregular graph. There are many different connected multiresolution approximations resulting from different pairings at each scale 2^j. Different multiresolution approximations correspond to different Haar scattering transforms. In the following, we compute several Haar scattering transforms of a signal x, by defining different multiresolution approximations.\nThe following theorem proves that a Haar scattering preserves the norm and that it is contractive up to a normalization factor 2^{j/2}. 
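The induction (5) is easy to mechanize. A minimal sketch, assuming the pairings chosen at each scale are given as lists of index pairs (a representation introduced here for illustration; checking that each union is connected in the graph G is omitted):

```python
def multiresolution_sets(d, pairings):
    """Build the vertex sets V_{j,n} of (5): V_{0,n} = {n}, and
    V_{j+1,n} = V_{j,a_n} | V_{j,b_n} for each pair (a_n, b_n)."""
    levels = [[{n} for n in range(d)]]     # scale j = 0: singletons
    for pairs in pairings:                 # one list of pairs (a_n, b_n) per scale
        prev = levels[-1]
        levels.append([prev[a] | prev[b] for (a, b) in pairs])
    return levels
```

With d = 4 and pairings [[(0, 1), (2, 3)], [(0, 1)]], this gives V1,0 = {0, 1}, V1,1 = {2, 3} and V2,0 = {0, 1, 2, 3}: each Vj,n has size 2^j, as required of a multiresolution approximation.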
The contraction is due to the absolute value, which suppresses the sign and hence reduces the amplitude of differences. The proof is in Appendix A.\n\nTheorem 2.1. For any j ≥ 0, and any x, x′ defined on V,\n\n‖Sjx − Sjx′‖ ≤ 2^{j/2} ‖x − x′‖ and ‖Sjx‖ = 2^{j/2} ‖x‖ .\n\n2.2 Iterated Haar Wavelet Transforms\n\nWe show that a Haar scattering transform can be written as a cascade of orthogonal Haar wavelet transforms and absolute value non-linearities. It is a particular example of the scattering transforms introduced in [1]. It computes coefficients measuring signal variations at multiple scales and multiple orders. We prove that the signal can be recovered from Haar scattering coefficients computed over enough multiresolution approximations.\nA scattering operator is contractive because of the absolute value. When coefficients have an arbitrary sign, suppressing the sign reduces by a factor 2 the volume of the signal space. We say that SJx(n, q) is a coefficient of order m if its computation includes m absolute values of differences. The amplitude of scattering coefficients typically decreases exponentially when the scattering order m increases, because of the contraction produced by the absolute value. We verify from (3) and (4) that SJx(n, q) is a coefficient of order m = 0 if q = 0, and of order m > 0 if\n\nq = \sum_{k=1}^{m} 2^{J−j_k} for 0 ≤ j_k < j_{k+1} ≤ J .\n\nIt results that there are \binom{J}{m} 2^{-J} d coefficients SJx(n, q) of order m.\nWe now show that Haar scattering coefficients of order m are obtained by cascading m orthogonal Haar wavelet transforms defined on the graph G. 
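By the formula above, the order m of SJx(n, q) is simply the number of nonzero binary digits of q, so the per-node count of order-m coefficients can be checked directly. A small self-contained check (our own illustration, not from the paper):

```python
from math import comb

def order(q):
    """Order m of the coefficient S_J x(n, q): the number of absolute
    values of differences in its computation, i.e. the number of 1-bits of q."""
    return bin(q).count("1")

def count_of_order(J, m):
    """Number of feature indices q in [0, 2^J) whose coefficient has order m."""
    return sum(1 for q in range(2 ** J) if order(q) == m)
```

For every 0 ≤ m ≤ J, count_of_order(J, m) equals comb(J, m); multiplying by the 2^{-J} d nodes of layer J recovers the total count of order-m coefficients stated above.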
A Haar wavelet at a scale 2^j is defined over each Vj,n = Vj−1,an ∪ Vj−1,bn by\n\nψj,n = 1_{Vj−1,an} − 1_{Vj−1,bn} .\n\nFor any J ≥ 0, one can verify [10, 6] that\n\n{1_{VJ,n}}_{0≤n<2^{-J}d} ∪ {ψj,n}_{0≤n<2^{-j}d, 0≤j