{"title": "Solid Harmonic Wavelet Scattering: Predicting Quantum Molecular Energy from Invariant Descriptors of 3D  Electronic Densities", "book": "Advances in Neural Information Processing Systems", "page_first": 6540, "page_last": 6549, "abstract": "We introduce a solid harmonic wavelet scattering representation, invariant  to rigid motion and stable to deformations, for regression and classification  of 2D and 3D signals. Solid harmonic wavelets are computed by multiplying solid  harmonic functions with Gaussian windows dilated at different scales. Invariant  scattering coefficients are obtained by cascading such wavelet transforms with  the complex modulus nonlinearity. We study an application of solid harmonic  scattering invariants to the estimation of quantum molecular energies, which  are also invariant to rigid motion and stable with respect to deformations. A multilinear regression  over scattering invariants provides close to state of the art results over  small and large databases of organic molecules.", "full_text": "Solid Harmonic Wavelet Scattering: Predicting\n\nQuantum Molecular Energy from Invariant\n\nDescriptors of 3D Electronic Densities\n\nMichael Eickenberg\n\nDepartment of computer science\n\nEcole normale sup\u00e9rieure\n\nGeorgios Exarchakis\n\nDepartment of computer science\n\nEcole normale sup\u00e9rieure\n\nPSL Research University, 75005 Paris, France\n\nPSL Research University, 75005 Paris, France\n\nmichael.eickenberg@nsup.org\n\ngeorgios.exarchakis@ens.fr\n\nMatthew Hirn\n\nDepartment of Computational Mathematics,\n\nScience and Engineering;\nDepartment of Mathematics\nMichigan State University\n\nEast Lansing, MI 48824, USA\n\nmhirn@msu.edu\n\nSt\u00e9phane Mallat\nColl\u00e8ge de France\n\nEcole Normale Sup\u00e9rieure\nPSL Research University\n\n75005 Paris, France\n\nAbstract\n\nWe introduce a solid harmonic wavelet scattering representation, invariant to\nrigid motion and stable to deformations, for regression and 
classi\ufb01cation of 2D\nand 3D signals. Solid harmonic wavelets are computed by multiplying solid\nharmonic functions with Gaussian windows dilated at different scales. Invariant\nscattering coef\ufb01cients are obtained by cascading such wavelet transforms with\nthe complex modulus nonlinearity. We study an application of solid harmonic\nscattering invariants to the estimation of quantum molecular energies, which are\nalso invariant to rigid motion and stable with respect to deformations. A multilinear\nregression over scattering invariants provides close to state of the art results over\nsmall and large databases of organic molecules.\n\n1\n\nIntroduction\n\nDeep convolutional neural networks provide state of the art results over most classi\ufb01cation and\nregression problems when there is enough training data. The convolutional architecture builds a\nrepresentation which translates when the input is translated. It can compute invariants to translations\nwith a global spatial pooling operator such as averaging or max pooling. A major issue is to understand\nif one can reduce the amount of training data, by re\ufb01ning the architecture or specifying network\nweights, from prior information on the classi\ufb01cation or regression problem. Beyond translation\ninvariance, such prior information can be provided by invariance over other known groups of\ntransformations.\nThis paper studies the construction of generic translation and rotation invariant representations for\nany 2D and 3D signals, and their application. Rotation invariant representations have been developed\nfor 2D images, for instance in [20], where a descriptor based on oriented wavelets was used to create\na jointly translation and rotation-invariant representation of texture images which retained all identity\ninformation necessary for classi\ufb01cation. 
These representations have not been extended to 3D because\nan oriented wavelet representation in 3D requires covering the unit sphere instead of the unit circle,\nleading to much heavier computational requirements.\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\nSection 2 introduces a 2D or 3D rotation invariant representation calculated with a cascade of\nconvolutions with spherical harmonic wavelets, and modulus non-linearities. Invariance to rotations\nresults from specific properties of spherical harmonics, which leads to efficient computations. A\nwavelet scattering can be implemented as a deep convolutional network where all filters are predefined\nby the wavelet choice [13]. In that case, prior information on invariants fully specifies the network\nweights. Besides translation and rotation invariance, such scattering representations linearize small\ndeformations. Invariants to small deformations are thus obtained with linear operators applied to\nscattering coefficients, and scattering coefficients can provide accurate regressions of functions which\nare stable to deformations.\nTranslation and rotation invariance is often encountered in physical functionals. For example, energies\nof isolated physical systems are usually translation and rotation invariant, and are stable to small\ndeformations. This paper concentrates on applications to computations of quantum energies of organic\nmolecules. Computing the energy of a molecule given the charges and the relative positions of the\nnuclei is a fundamental topic in computational chemistry. It has considerable industrial applications,\nfor example to test and design materials or pharmaceuticals [4]. 
Density functional theory is currently\nthe most ef\ufb01cient numerical technique to compute approximate values of quantum energies, but it\nrequires considerable amounts of calculations which limit the size of molecules and the number\nof tests. Machine learning methods have gained traction to estimate quantum molecular energies\nfrom existing quantum chemistry databases, because they require much less computation time after\ntraining.\nState of the art learning approaches have been adapted to the speci\ufb01cities of the underlying physics.\nBest results on large databases are obtained with deep neural networks whose architectures are\ntailored to this quantum chemistry problem. Numerical experiments in Section 4 show that applying\na standard multilinear regression on our generic 3D invariant solid harmonic scattering representation\nyields nearly state of the art results compared to all methods, including deep neural networks, and on\nboth small and large databases.\n\n2 Solid harmonic wavelet scattering\n\nWavelet scattering transforms have been introduced to de\ufb01ne representations which are invariant\nto translations and Lipschitz continuous to deformations [12]. In two dimensions they have been\nextended to de\ufb01ne rotationally invariant representations [20] but in 3D this approach requires covering\nthe unit sphere with multiple oriented wavelets (as opposed to the unit circle in 2D), which requires\ntoo much computation. This section introduces a solid harmonic wavelet scattering transform whose\nrotation invariance results from symmetries of solid harmonics. In contrast to oriented wavelets,\nevery solid harmonic wavelet can yield its own rotation invariant descriptor because it operates in a\nrotational frequency space.\n\n2.1 Solid harmonics in 2D and 3D\n\nSolid harmonics are solutions of the Laplace equation \u2206f = 0, usually expressed in spherical coordi-\nnates, where the Laplacian is the sum of unmixed second derivatives. 
In 2D, interpreting $\\mathbb{R}^2$ as the\ncomplex plane, we find that $z \\mapsto z^\\ell$ is a solution for all $\\ell \\in \\mathbb{N}$ due to its holomorphicity$^1$. Expressing\nthis solution in polar coordinates gives $(r, \\varphi) \\mapsto r^\\ell e^{i\\ell\\varphi}$, revealing an $\\ell$th-order polynomial in radius\nand a so-called circular harmonic with $\\ell$ angular oscillations per circle.\nSolving the Laplace equation in 3D spherical coordinates $(r, \\vartheta, \\varphi)$ gives rise to spherical harmonics,\nthe eigenvectors of the Laplacian on the sphere. Imposing separability of azimuthal and elevation\ncontributions yields the functions\n\n$$Y_\\ell^m(\\vartheta, \\varphi) = C(\\ell, m)\\, P_\\ell^m(\\cos\\vartheta)\\, e^{im\\varphi},$$\n\nwhere $P_\\ell^m$ is an associated Legendre polynomial and $C(\\ell, m) = \\sqrt{\\frac{(2\\ell+1)(\\ell-m)!}{4\\pi(\\ell+m)!}}$, for $\\ell \\geq 0$ and $-\\ell \\leq m \\leq \\ell$. They form an\northogonal basis of $L^2$ functions on the sphere. Analogously to the 2D case, 3D solid harmonics are\nthen defined as\n\n$$(r, \\vartheta, \\varphi) \\mapsto \\sqrt{\\frac{4\\pi}{2\\ell+1}}\\, r^\\ell\\, Y_\\ell^m(\\vartheta, \\varphi).$$\n\n$^1$ Real and imaginary parts of holomorphic functions are harmonic: their Laplacian is 0.\n\n2.2 Solid harmonic wavelets\n\nWe now define solid harmonic wavelets in 2D and 3D. A wavelet $\\psi(u)$ is a spatial filter with zero\nsum, which is localized around the origin in the sense that it has a fast decay along $\\|u\\|$. Let\n$\\psi_j(u) = 2^{-dj}\\psi(2^{-j}u)$ be a normalized dilation of $\\psi$ by $2^j$ in dimension $d$. A multiscale wavelet\ntransform of a signal $\\rho(u)$ computes convolutions with these dilated wavelets at all scales $2^j$ to obtain\nthe set of wavelet coefficients $\\{\\rho \\star \\psi_j(u)\\}_{j \\in \\mathbb{Z}}$. They are translation covariant. Let us denote by $\\hat\\rho(\\omega)$\nthe Fourier transform of $\\rho(u)$. 
The Fourier transforms of these convolutions are $\\hat\\rho(\\omega)\\,\\hat\\psi(2^j\\omega)$, which\nyields fast computational algorithms using FFTs.\nA wavelet is defined from a solid harmonic by multiplying it by a Gaussian, which localizes its\nsupport. In the 2D case we obtain the following family of wavelets:\n\n$$\\psi_\\ell(r, \\varphi) = \\frac{1}{\\sqrt{(2\\pi)^2}}\\, e^{-\\frac{1}{2} r^2}\\, r^\\ell\\, e^{i\\ell\\varphi}.$$\n\nFor $\\ell > 0$, these functions have zero integrals and are localized around the origin. In 2D frequency\npolar coordinates $\\omega = \\lambda(\\cos\\alpha, \\sin\\alpha)^T$, one can verify that the Fourier transform of this solid\nharmonic wavelet is very similar to itself in signal space: $\\hat\\psi_\\ell(\\lambda, \\alpha) = (-i)^\\ell\\, e^{-\\frac{1}{2}\\lambda^2}\\, \\lambda^\\ell\\, e^{i\\ell\\alpha}$. The solid\nharmonic wavelet transform inherits the rotation properties of the solid harmonics.\nIn 2D, the rotation of a solid harmonic incurs a complex phase shift. Let $R_\\gamma \\in SO(2)$ be a rotation\nof angle $\\gamma$. 
We first observe that\n\n$$R_\\gamma \\psi_{j,\\ell}(r, \\varphi) = \\psi_{j,\\ell}(r, \\varphi - \\gamma) = e^{-i\\ell\\gamma}\\, \\psi_{j,\\ell}(r, \\varphi).$$\n\nOne can derive that rotating a signal $\\rho$ produces the same rotation on its wavelet convolution,\nmultiplied by a phase factor encoding the rotational angle: $R_\\gamma\\rho \\star \\psi_{j,\\ell}(u) = e^{i\\ell\\gamma} R_\\gamma(\\rho \\star \\psi_{j,\\ell})(u)$.\nIf we eliminate the phase with a modulus $U[j, \\ell]\\rho(u) = |\\rho \\star \\psi_{j,\\ell}(u)|$ then it becomes covariant to\nrotations:\n\n$$U[j, \\ell]\\, R_\\gamma\\rho(u) = R_\\gamma U[j, \\ell]\\rho(u).$$\n\nThe left of Figure 1 shows the real part of 2D solid harmonic wavelets at different scales and angular\nfrequencies.\nIn 3D, solid harmonic wavelets are defined by\n\n$$\\psi_{\\ell,m}(r, \\vartheta, \\varphi) = \\frac{1}{\\sqrt{(2\\pi)^3}}\\, e^{-\\frac{1}{2} r^2}\\, r^\\ell\\, Y_\\ell^m(\\vartheta, \\varphi).$$\n\nWe write $\\psi_{\\ell,m,j}$ its dilation by $2^j$.\nLet us write $\\omega$ with 3D polar coordinates: $\\omega = \\lambda(\\cos\\alpha\\cos\\beta, \\cos\\alpha\\sin\\beta, \\sin\\alpha)^T$. The Fourier transform of the wavelet has the same analytical\nexpression up to a complex factor: $\\hat\\psi_{\\ell,m}(\\lambda, \\alpha, \\beta) = (-i)^\\ell\\, e^{-\\frac{1}{2}\\lambda^2}\\, \\lambda^\\ell\\, Y_\\ell^m(\\alpha, \\beta)$. The 3D covariance\nto rotations is more involved. The asymmetry of the azimuthal and elevation components of the\nspherical harmonics requires them to be treated differently. In order to obtain a rotation covariance\nproperty, it is necessary to sum the energy over all indices $m$ for a fixed $\\ell$. We thus define the wavelet\nmodulus operator of a 3D signal $\\rho(u)$ by\n\n$$U[j, \\ell]\\rho(u) = \\left( \\sum_{m=-\\ell}^{\\ell} |\\rho \\star \\psi_{\\ell,m,j}(u)|^2 \\right)^{1/2}.$$\n\nUsing the properties of spherical harmonics, one can prove that this summation over $m$ defines a\nwavelet transform modulus which is covariant to 3D rotations. For a general rotation $R \\in SO(3)$,\n\n$$U[j, \\ell]\\, R\\rho = R\\, U[j, \\ell]\\rho.$$\n\n2.3 Solid harmonic scattering invariants\n\nWe showed that the wavelet modulus $U[j, \\ell]\\rho$ is covariant to translations and rotations in 2D and 3D.\nSumming these coefficients over the spatial variable $u$ thus defines a translation and rotation invariant\nrepresentation. This property remains valid under pointwise transformations, e.g. if we raise the\nmodulus coefficients to any power $q$. Since $U[j, \\ell]\\rho(u)$ is obtained by a wavelet scaled by $2^j$, it is\na smooth function and its integral can be computed by subsampling $u$ at intervals $2^{j-\\alpha}$ where $\\alpha$ is\nan oversampling factor typically equal to 1, to avoid aliasing. First order solid harmonic scattering\ncoefficients in 2D and 3D are defined for any $(j_1, \\ell)$ and any exponent $q$ by:\n\n$$S[j_1, \\ell, q]\\rho = \\sum_u \\left| U[j_1, \\ell]\\rho(2^{j_1-\\alpha}u) \\right|^q.$$\n\nTranslating or rotating $\\rho$ does not modify $S[j_1, \\ell, q]\\rho$. Let $J > 0$ denote the number of scales $j_1$,\nand $L > 0$ the number of angular oscillations $\\ell$. We choose $q \\in Q = \\{1/2, 1, 2, 3, 4\\}$ which yields\n$|Q|JL$ invariant coefficients.\nThe summation eliminates the variability of $U[j_1, \\ell]\\rho(u)$ along $u$. To avoid losing too much\ninformation, a scattering transform retransforms this function along $u$ in order to capture the lost\nvariabilities. 
This is done by calculating a convolution with a second family of wavelets at different\nscales $2^{j_2}$ and again computing a modulus in order to obtain coefficients which remain covariant to\ntranslations and rotations. This means that $U[j_1, \\ell]\\rho(u)$ is retransformed by the wavelet transform\nmodulus operator $U[j_2, \\ell]$. Clearly $U[j_2, \\ell]\\, U[j_1, \\ell]\\rho(u)$ is still covariant to translations and rotations\nof $\\rho$, since $U[j_1, \\ell]$ and $U[j_2, \\ell]$ are covariant to translations and rotations.\nThe variable $u$ is again subsampled at intervals $2^{j_2-\\alpha}$ with an oversampling factor $\\alpha$ adjusted\nto eliminate the aliasing. Second order scattering invariants are computed by summing over the\nsubsampled spatial variable $u$:\n\n$$S[j_1, j_2, \\ell, q]\\rho = \\sum_u \\left| U[j_2, \\ell]\\, U[j_1, \\ell]\\rho(2^{j_2-\\alpha}u) \\right|^q.$$\n\nThese coefficients are computed only for $j_2 > j_1$ because one can verify [12] that the amplitude of\nthese invariant coefficients is negligible for $j_2 \\leq j_1$. The total number of computed second order\ninvariants is thus $|Q|LJ(J-1)/2$.\nIn the following, we shall write $S\\rho = \\{S[p]\\rho\\}_p$ the scattering representation of $\\rho$, defined by\nthe indices $p = (j_1, \\ell, q)$ and $p = (j_1, j_2, \\ell, q)$. These coefficients are computed with iterated\nconvolutions with wavelets, modulus non-linearities, and averaging. It is proved in [13] that such\nwavelet convolutions and non-linearities can be implemented with a deep convolutional network,\nwhose filters depend upon the wavelets and whose depth $J$ is the maximum scale index of all wavelets\n$j_1 < j_2 \\leq J$.\nBesides translation and rotation invariance, one can prove that a scattering transform is Lipschitz\ncontinuous to deformations [12]. 
This means that if $\\rho(u)$ is deformed by a small (in maximum\ngradient norm) diffeomorphism applied to $u$, then the scattering vector stays within an error radius\nproportional to the size of the diffeomorphism. This property is particularly important to linearly\nregress functions which are also stable to deformations.\n\n3 Solid harmonic scattering for quantum energy regression\n\nWe study the application of solid harmonic scattering invariants to the regression of quantum molecular\nenergies. The next section introduces the translation and rotation invariance properties of these\nenergies.\n\n3.1 Molecular regression invariances\n\nA molecule containing $K$ atoms is entirely defined by its nuclear charges $z_k$ and its nuclear position\nvectors $r_k$ indexed by $k$. Denoting by $x$ the state vector of a molecule, we have\n\n$$x = \\{(r_k, z_k) \\in \\mathbb{R}^3 \\times \\mathbb{R} : k = 1, \\ldots, K\\}.$$\n\nThe ground-state energy of a molecule has the following invariance properties outlined in [1]:\n\nInvariance to permutations: Energies do not depend on the indexation order $k$ of the nuclei;\n\nIsometry invariance: Energies are invariant to rigid translations, rotations, and reflections of the\nmolecule and hence of the $r_k$;\n\nDeformation stability: The energy is Lipschitz continuous with respect to scaling of distances\nbetween atoms;\n\nMultiscale interactions: The energy has a multiscale structure, with highly energetic bonds between\nneighboring atoms, and weaker interactions at larger distances, such as Van-der-Waals\ninteractions.\n\nTo regress quantum energies, a machine learning representation must satisfy the same invariance and\nstability properties while providing a set of descriptors which is rich enough to accurately estimate\nthe atomization energy of a diverse collection of molecules.\nA rotation invariant scattering transform has been proposed to regress quantum energies of planar\nmolecules [9]. 
However, this approach involves too much computation in 3D because it requires\nusing a large number of oriented wavelets to cover the 3D sphere. The following sections explain\nhow to regress the energies of 3D molecules from a spherical harmonic scattering.\n\n3.2 Scattering transform of an electronic density\n\nDensity Functional Theory computes molecular energies by introducing an electronic density $\\rho(u)$\nwhich specifies the probability density of presence of an electron at a point $u$. Similarly, we associate\nto the state vector $x$ of the molecule a naive electronic density $\\rho_x$ which is a sum of Gaussian\ndensities centered on each nucleus. This density incorporates no information on chemical bonds that\nmay arise in the molecule. For $K$ atoms placed at $\\{r_k\\}_{k=1}^K$ having charges $\\{z_k\\}_{k=1}^K$, the resulting\ndensity is\n\n$$\\rho_x(r) = \\sum_{k=1}^{K} c(z_k)\\, g(r - r_k),$$\n\nwhere $g$ is a Gaussian, roughly representing an electron density localized around the nucleus, and\n$c(z_k)$ is a vector-valued \u201celectronic channel\u201d. It encodes different aspects of the atomic structure.\nWe shall use three channels: the total nuclear charge $z_k$ of the atom, the valence electronic charge\n$v_k$ which specifies the number of electrons which can be involved in chemical bonds, and the core\nelectronic charge $z_k - v_k$. It results that $c(z_k) = (z_k, v_k, z_k - v_k)^T$. The molecule embedding\nverifies\n\n$$\\int \\rho_x(u)\\, du = \\sum_k (z_k, v_k, z_k - v_k)^T.$$\n\nThis integral gives the total number of nucleus charges and valence and core electrons. This naive\ndensity is invariant to permutations of atom indices $k$.\nThe density $\\rho_x$ is invariant to permutations of atom indices but it is not invariant to isometries and it\ncannot separate multiscale interactions. 
These missing invariances and the separation of scales into\ndifferent channels are obtained by computing its scattering representation $S\\rho_x$ with solid harmonic\nwavelets.\nFigure 1 shows an example of a 2D solid harmonic wavelet modulus $U[j, \\ell]\\rho_x$ for one molecule\nat different scales and angular frequencies.\n\nFigure 1: Left: Real parts of 2D solid harmonic wavelets $\\psi_{\\ell,j}(u)$. The $\\ell$ parameter increases from\n0 to 4 vertically whereas the scale $2^j$ increases from left to right. Cartesian slices of 3D spherical\nharmonic wavelets yield similar patterns. Right: Solid harmonic wavelet moduli $S[j, \\ell, 1](\\rho_x)(u) =\n|\\rho_x \\ast \\psi_{j,\\ell}|(u)$ of a molecule $\\rho_x$. The interference patterns at the different scales are reminiscent of\nmolecular orbitals obtained in e.g. density functional theory.\n\nFigure 2: Mean absolute error (MAE) on the validation set as a function of the number of training\npoints used. We observe a fast drop to low estimation errors with as few as 2000 training examples.\nWhile it is still always better to sample more of chemical space, it shows that the representation\ncarries useful information easily amenable to further analysis, while keeping sufficient complexity to\nbenefit when more datapoints are available.\n\n3.3 Multilinear regression\n\nMolecular energies are regressed with multilinear combinations of scattering coefficients $S\\rho_x[p]$. A\nmultilinear regression of order $r$ is defined by:\n\n$$\\tilde{E}_r(\\rho_x) = b + \\sum_i \\left( \\nu_i \\prod_{j=1}^{r} \\left( \\langle S\\rho_x, w_i^{(j)} \\rangle + c_i^{(j)} \\right) \\right).$$\n\nFor $r = 1$ this is a standard linear regression. For $r = 2$ this form introduces a non-linearity similar\nto those found in factored gated autoencoders [14]. Trilinear regressions for $r = 3$ are also used.\nHere we extend the interactions to an arbitrary number of multiplicative factors. 
We optimize the\nparameters of the multilinear model by minimizing a quadratic loss function\n\n$$L(y, \\rho_x) = (y - \\tilde{E}_r(\\rho_x))^2$$\n\nusing the Adam algorithm for stochastic gradient descent [11]. The model described above is\nnon-linear in the parameter space and therefore it is reasonable to assume that stochastic gradient\ndescent will converge to a local optimum. We find that we can mitigate the effects of local optimum\nconvergence by averaging the predictions of multiple models trained with different initializations$^2$.\n\n$^2$ For implementation details see http://www.di.ens.fr/data/software/\n\n4 Numerical Experiments on Chemical Databases\n\nQuantum energy regressions are computed on two standard datasets: QM7 (GDB7-12) [18] has\n7165 molecules of up to 23 atoms among H, C, O, N and S, and QM9 (GDB9-14) [17] has 133885\nmolecules of up to 29 atoms among H, C, O, N and F. We first review results of existing machine\nlearning algorithms before giving results obtained with the solid harmonic scattering transform.\n\n4.1 State of the art algorithms\n\nTables 1 and 2 give the mean absolute error for each algorithm described below. The first machine\nlearning approaches for quantum energy regressions were based on kernel ridge regression algorithms,\noptimized with different types of kernels. Kernels were first computed with Coulomb matrices, which\nencode pairwise nucleus-nucleus repulsion forces for each molecule [18, 15, 8, 16]. Coulomb matrices\nare not invariant to permutations of indices of atoms in the molecules, which leads to regression\ninstabilities. Improvements have been obtained with bag-of-bonds descriptors [7], which group\nmatrix entries according to bond type, or with fixed-length smooth bond-distance histograms [2].\nThe BAML method (Bonds, Angles, etc, and machine learning) [10] refines the kernel by collecting\natomic information, bond information, bond angle information and bond torsion information. 
The\nHDAD (Histograms of Distances, Angles, and Dihedral angles) kernels [5] improve results by\ncomputing histograms of these quantities. Smooth overlap of atomic positions (SOAP) kernels [3]\ncan also obtain precise regression results with local descriptors computed with spherical harmonics.\nThey are invariant to translations and rotations. However, these kernels only involve local interactions,\nand regression results thus degrade in the presence of large-scale interactions.\nDeep neural networks have also been optimized to estimate quantum molecular energies. They hold\nthe state of the art on large databases as shown in Tables 1 and 2. Deep tensor networks [19] combine\npairwise distance matrix representations in a deep learning architecture. MPNN (Message Passing\nNeural Networks) [6] learns a neural network representation on the molecules represented as bond graphs.\nIt obtains the best results on the larger QM9 database.\n\n4.2 Solid harmonic scattering results\n\nWe performed rigid affine coordinate transforms to align each molecule with its principal axis, making\nit possible to fit every molecule in a box of one long sidelength and two shorter ones. The Gaussian\nwidth of the electronic embedding is adjusted so that Gaussians located around the two atoms with\nminimal distance do not overlap too much. In all computations, the sampling grid is adjusted to keep\naliasing errors negligible. Scattering vectors are standardized to have zero mean and unit variance\nbefore computing the multilinear regression.\n\nQM7. Scattering vectors are computed with $L = 5$. We estimated quantum energies with a linear\nridge regression from scattering coefficients. The dataset comes with a split into 5 folds, where the\nenergy properties are approximately stratified. The average of the mean absolute error (MAE) over 5\nfolds is 2.4 kcal/mol. 
It shows that scattering coefficients are sufficiently discriminative to obtain\ncompetitive results with a linear regression.\nBilinear regressions involve more parameters and provide near state of the art performance. We\naverage 5 differently initialized models over the 5 folds to obtain a mean absolute error of 1.2 kcal/mol.\nFigure 2 evaluates the performance of the bilinear regression on invariant scattering descriptors. From\nas few as 2000 training samples onward, the test set error drops below 3 kcal/mol, indicating that the\ninvariant representation gives immediate access to relevant molecular properties. The fact that we\nobserve improvement with larger data samples means that the representation also exhibits sufficient\nflexibility to accommodate relevant information from larger sweeps over chemical space.\n\nQM9. Scattering vectors are computed with $L = 2$. Quantum energies were estimated from\nscattering vectors with linear, bilinear and trilinear regressions. For cross-validation, the dataset is\nsplit into 5 folds, where the energy properties are approximately stratified. The mean\nabsolute error (MAE) of a trilinear regression, averaged over the 5 folds, is 0.55.\n\n4.3 Discussion\n\nThe solid harmonic scattering transform followed by a multilinear regression is a domain agnostic\nregression scheme which only relies on prior knowledge of translation and rotation invariance as\nwell as deformation stability. Nevertheless, it leads to close to state of the art results on each database.\n\nQM7   RSCM  BoB  SOAP  DTN  CBoB  L-Scat  B-Scat\nMAE   3.1   1.5  0.9   1    1.2   2.4     1.2\n\nTable 1: Mean Absolute Error in kcal/mol of quantum energy regression in QM7 for different\nalgorithms. 
(RSCM: Random Sorted Coulomb Matrix [8], BoB: Bag of Bonds [7], SOAP: smooth\noverlap of atomic positions [3], DTN: deep tensor networks [19], CBoB: Continuous bag of bonds [2],\nL-Scat: Linear regression on Scattering invariants, B-Scat: Bilinear regression on Scattering invariants)\n\nQM9   HDAD  BAML  CM    BOB   DTN   MPNN  T-Scat\nMAE   0.59  1.20  2.97  1.42  0.84  0.44  0.55\n\nTable 2: QM9 regression results. (HDAD: Histograms of Distances, Angles and Dihedral Angles\n[5], BAML: Bonds, Angles and Machine Learning [10], RSCM: Random Sorted Coulomb Matrices,\nBOB: Bags of Bonds, DTN: Deep Tensor Networks, MPNN: Message Passing Neural Networks [6],\nT-Scat: Trilinear regression on scattering invariants)\n\nThe size of a scattering descriptor set grows logarithmically with the maximum number of atoms in\nthe molecule (with increasing molecule size one continues to add scales to the wavelet transform,\nwhich adds logarithmically many coefficients) as opposed to most other methods such as [3] whose\ndescriptor size grows linearly in the number of atoms in the molecule. Indeed, these techniques are\nbased on measurements of local individual interactions within neighborhoods of atoms.\nThe representation splits the information across scales and provides scale interaction coefficients\nwhich can be related to physical phenomena as opposed to millions of deep neural net weights\nwhich are difficult to interpret. Introducing multilinear regression between solid harmonic wavelet\ninvariants further improves the performance on the energy regression task, achieving near state of the\nart performance. This may also be related to multilinear expansions of physical potentials.\nIt is important to issue a word of caution on the chemical interpretation of these algorithmic regressions.\nIndeed, all databases are computed with DFT numerical codes, which only approximate\nthe energy. 
For the QM9 database, validation errors are on average 5 kcal/mol [17] on calculated\nenergies compared to true chemical energies of ground state molecules. Re\ufb01ned results of fractions\nof kcal/mol thus no longer add true chemical information but rather re\ufb02ect the ability to estimate the\nvalues produced by DFT numerical codes.\n\n5 Conclusion\n\nWe introduced a 2D and 3D solid harmonic wavelet scattering transform which is invariant to\ntranslations and rotations and stable to deformations. It is computed with two successive convolutions\nwith solid harmonic wavelets and complex modulus. Together with multilinear regressions, this\nrepresentation provides near state of the art results for estimation of quantum molecular energies.\nThe same representation is used for small and large data bases. The mathematical simplicity of\nthese descriptors opens the possibility to relate these regression to multiscale properties of quantum\nchemical interactions.\n\nAcknowledgements\n\nM.E., G.E. and S.M. are supported by ERC grant InvariantClass 320959; M.H. is supported by the\nAlfred P. Sloan Fellowship, the DARPA YFA, and NSF grant 1620216.\n\nReferences\n[1] Albert P. Bart\u00f3k, Risi Kondor, and G\u00e1bor Cs\u00e1nyi. On representing chemical environments.\n\nPhysical Review B, 87(18), may 2013.\n\n[2] Christopher R. Collins, Geoffrey J. Gordon, O. Anatole von Lilienfeld, and David J. Yaron.\n\nConstant size molecular descriptors for use with machine learning. arXiv, 2017.\n\n[3] Sandip De, Albert P. Bart\u00f3k, G\u00e1bor Cs\u00e1nyi, and Michele Ceriotti. Comparing molecules and\nsolids across structural and alchemical space. Phys. Chem. Chem. Phys., 18(20):13754\u201313769,\n2016.\n\n8\n\n\f[4] Peter Deglmann, Ansgar Sch\u00e4fer, and Christian Lennartz. Application of quantum calculations in\nthe chemical industry - an overview. International Journal of Quantum Chemistry, 115(3):107\u2013\n136, 2014.\n\n[5] Felix A. 
Faber, Luke Hutchison, Bing Huang, Justin Gilmer, Samuel S. Schoenholz, George E.\nDahl, Oriol Vinyals, Steven Kearnes, Patrick F. Riley, and O. Anatole von Lilienfeld. Prediction\nerrors of molecular machine learning models lower than hybrid dft error. Journal of Chemical\nTheory and Computation, 0(0):null, 0. PMID: 28926232.\n\n[6] Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl.\n\nNeural message passing for quantum chemistry. CoRR, abs/1704.01212, 2017.\n\n[7] Katja Hansen, Franziska Biegler, Raghunathan Ramakrishnan, Wiktor Pronobis, O. Anatole\nvon Lilienfeld, Klaus-Robert M\u00fcller, and Alexandre Tkatchenko. Machine learning predictions\nof molecular properties: Accurate many-body potentials and nonlocality in chemical space. The\nJournal of Physical Chemistry Letters, 6(12):2326\u20132331, 2015. PMID: 26113956.\n\n[8] Katja Hansen, Gr\u00e9goire Montavon, Franziska Biegler, Siamac Fazli, Matthias Rupp, Matthias\nSchef\ufb02er, O. Anatole von Lilienfeld, Alexandre Tkatchenko, and Klaus-Robert M\u00fcller. Assess-\nment and validation of machine learning methods for predicting molecular atomization energies.\nJournal of Chemical Theory and Computation, 9(8):3404\u20133419, 2013.\n\n[9] Matthew Hirn, St\u00e9phane Mallat, and Nicolas Poilvert. Wavelet scattering regression of\nquantum chemical energies. Multiscale Modeling and Simulation, 15(2):827\u2013863, 2017.\narXiv:1605.04654.\n\n[10] Bing Huang and O. Anatole von Lilienfeld. Communication: Understanding molecular rep-\nresentations in machine learning: The role of uniqueness and target similarity. The Journal of\nChemical Physics, 145(16):161102, 2016.\n\n[11] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint\n\narXiv:1412.6980, 2014.\n\n[12] St\u00e9phane Mallat. Group invariant scattering. Communications on Pure and Applied\n\nMathematics, 65(10):1331\u20131398, October 2012.\n\n[13] St\u00e9phane Mallat. 
Understanding deep convolutional networks. Phil. Trans. R. Soc. A,\n\n374(2065):20150203, 2016.\n\n[14] Roland Memisevic. Gradient-based learning of higher-order image features. In Computer\n\nVision (ICCV), 2011 IEEE International Conference on, pages 1591\u20131598. IEEE, 2011.\n\n[15] Gr\u00e9goire Montavon, Katja Hansen, Siamac Fazli, Matthias Rupp, Franziska Biegler, Andreas\nZiehe, Alexandre Tkatchenko, O. Anatole von Lilienfeld, and Klaus-Robert M\u00fcller. Learn-\ning invariant representations of molecules for atomization energy prediction. In P. Bartlett,\nF.C.N. Pereira, C.J.C. Burges, L. Bottou, and K.Q. Weinberger, editors, Advances in Neural\nInformation Processing Systems 25, pages 449\u2013457. 2012.\n\n[16] Gr\u00e9goire Montavon, Matthias Rupp, Vivekanand Gobre, Alvaro Vazquez-Mayagoitia, Katja\nHansen, Alexandre Tkatchenko, Klaus-Robert M\u00fcller, and O Anatole von Lilienfeld. Machine\nlearning of molecular electronic properties in chemical compound space. New Journal of\nPhysics, 15(9):095003, 2013.\n\n[17] Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, and O. Anatole von Lilienfeld.\nQuantum chemistry structures and properties of 134 kilo molecules. Scienti\ufb01c Data, 1:140022\nEP \u2013, 08 2014.\n\n[18] M. Rupp, A. Tkatchenko, K.-R. M\u00fcller, and O. A. von Lilienfeld. Fast and accurate modeling of\nmolecular atomization energies with machine learning. Physical Review Letters, 108:058301,\n2012.\n\n[19] Kristof T. Sch\u00fctt, Farhad Arbabzadah, Stefan Chmiela, Klaus R. M\u00fcller, and Alexan-\ndre Tkatchenko. Quantum-chemical insights from deep tensor neural networks. Nature\nCommunications, 8:13890 EP \u2013, Jan 2017. Article.\n\n9\n\n\f[20] Laurent Sifre and St\u00e9phane Mallat. Rotation, scaling and deformation invariant scattering for\ntexture discrimination. 
In Proceedings of the IEEE conference on computer vision and pattern\nrecognition, pages 1233\u20131240, 2013.\n\n10\n\n\f", "award": [], "sourceid": 3285, "authors": [{"given_name": "Michael", "family_name": "Eickenberg", "institution": "UC Berkeley"}, {"given_name": "Georgios", "family_name": "Exarchakis", "institution": "\u00c9cole Normale Sup\u00e9rieure"}, {"given_name": "Matthew", "family_name": "Hirn", "institution": "Michigan State University"}, {"given_name": "Stephane", "family_name": "Mallat", "institution": "Ecole normale superieure"}]}