{"title": "SchNet: A continuous-filter convolutional neural network for modeling quantum interactions", "book": "Advances in Neural Information Processing Systems", "page_first": 991, "page_last": 1001, "abstract": "Deep learning has the potential to revolutionize quantum chemistry as it is ideally suited to learn representations for structured data and speed up the exploration of chemical space. While convolutional neural networks have proven to be the first choice for images, audio and video data, the atoms in molecules are not restricted to a grid. Instead, their precise locations contain essential physical information that would get lost if discretized. Thus, we propose to use continuous-filter convolutional layers to be able to model local correlations without requiring the data to lie on a grid. We apply those layers in SchNet: a novel deep learning architecture modeling quantum interactions in molecules. We obtain a joint model for the total energy and interatomic forces that follows fundamental quantum-chemical principles. Our architecture achieves state-of-the-art performance for benchmarks of equilibrium molecules and molecular dynamics trajectories. Finally, we introduce a more challenging benchmark with chemical and structural variations that suggests the path for further work.", "full_text": "SchNet: A continuous-filter convolutional neural network for modeling quantum interactions

K. T. Schütt1∗, P.-J. Kindermans1, H. E. Sauceda2, S. Chmiela1, A. Tkatchenko3, K.-R. Müller1,4,5†

1 Machine Learning Group, Technische Universität Berlin, Germany
2 Theory Department, Fritz-Haber-Institut der Max-Planck-Gesellschaft, Berlin, Germany
3 Physics and Materials Science Research Unit, University of Luxembourg, Luxembourg
4 Max-Planck-Institut für Informatik, Saarbrücken, Germany
5 Dept. 
of Brain and Cognitive Engineering, Korea University, Seoul, South Korea

∗ kristof.schuett@tu-berlin.de  † klaus-robert.mueller@tu-berlin.de

Abstract

Deep learning has the potential to revolutionize quantum chemistry as it is ideally suited to learn representations for structured data and speed up the exploration of chemical space. While convolutional neural networks have proven to be the first choice for images, audio and video data, the atoms in molecules are not restricted to a grid. Instead, their precise locations contain essential physical information that would get lost if discretized. Thus, we propose to use continuous-filter convolutional layers to be able to model local correlations without requiring the data to lie on a grid. We apply those layers in SchNet: a novel deep learning architecture modeling quantum interactions in molecules. We obtain a joint model for the total energy and interatomic forces that follows fundamental quantum-chemical principles. Our architecture achieves state-of-the-art performance for benchmarks of equilibrium molecules and molecular dynamics trajectories. Finally, we introduce a more challenging benchmark with chemical and structural variations that suggests the path for further work.

1 Introduction

The discovery of novel molecules and materials with desired properties is crucial for applications such as batteries, catalysis and drug design. However, the vastness of chemical compound space and the computational cost of accurate quantum-chemical calculations prevent an exhaustive exploration. In recent years, there have been increased efforts to use machine learning for the accelerated discovery of molecules and materials with desired properties [1–9]. However, these methods are only applied to stable systems in so-called equilibrium, i.e., local minima of the potential energy surface E(r_1, ..., r_n), where r_i is the position of atom i. 
Data sets such as the established QM9 benchmark [10] contain only equilibrium molecules. Predicting stable atom arrangements is in itself an important challenge in quantum chemistry and materials science.

In general, it is not clear how to obtain equilibrium conformations without optimizing the atom positions. Therefore, we need to compute both the total energy E(r_1, ..., r_n) and the forces acting on the atoms

    F_i(r_1, ..., r_n) = −∂E/∂r_i (r_1, ..., r_n).    (1)

One possibility is to use a less computationally costly, but also less accurate, quantum-chemical approximation. Instead, we choose to extend the domain of our machine learning model to both compositional (chemical) and configurational (structural) degrees of freedom.

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

In this work, we aim to learn a representation for molecules using equilibrium and non-equilibrium conformations. Such a general representation for atomistic systems should follow fundamental quantum-mechanical principles. Most importantly, the predicted force field has to be curl-free. Otherwise, it would be possible to follow a circular trajectory of atom positions such that the energy keeps increasing, i.e., breaking the law of energy conservation. Furthermore, the potential energy surface as well as its partial derivatives have to be smooth, e.g., in order to be able to perform geometry optimization. Beyond that, it is beneficial that the model incorporates the invariance of the molecular energy with respect to rotation, translation and atom indexing. 
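The relationship in Eq. 1 can be checked numerically. The sketch below (plain NumPy; the toy pairwise energy and all names are hypothetical stand-ins for a real potential energy surface) obtains forces as negative finite-difference gradients and verifies that, for a translation-invariant energy, the net force on the system vanishes:

```python
import numpy as np

def toy_energy(R):
    """Hypothetical pairwise energy E(r_1, ..., r_n): harmonic springs on all
    atom pairs, standing in for a quantum-chemical potential energy surface."""
    n = len(R)
    e = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(R[i] - R[j])
            e += 0.5 * (d - 1.0) ** 2  # spring with rest length 1.0
    return e

def forces(energy_fn, R, eps=1e-5):
    """F_i = -dE/dr_i via central finite differences (Eq. 1)."""
    F = np.zeros_like(R)
    for i in range(R.shape[0]):
        for k in range(R.shape[1]):
            Rp, Rm = R.copy(), R.copy()
            Rp[i, k] += eps
            Rm[i, k] -= eps
            F[i, k] = -(energy_fn(Rp) - energy_fn(Rm)) / (2 * eps)
    return F

R = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.2, 0.0]])
F = forces(toy_energy, R)
# Since the energy depends only on interatomic distances, it is translation
# invariant and the forces sum to zero (no net force on the system).
print(np.allclose(F.sum(axis=0), 0.0, atol=1e-6))  # → True
```

A learned energy model takes the place of `toy_energy` in practice, with the gradient obtained by backpropagation instead of finite differences.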
Being able to model both chemical and conformational variations constitutes an important step towards ML-driven quantum-chemical exploration.

This work provides the following key contributions:

• We propose continuous-filter convolutional (cfconv) layers as a means to move beyond grid-bound data such as images or audio towards modeling objects with arbitrary positions such as astronomical observations or atoms in molecules and materials.

• We propose SchNet: a neural network specifically designed to respect essential quantum-chemical constraints. In particular, we use the proposed cfconv layers in R^3 to model interactions of atoms at arbitrary positions in the molecule. SchNet delivers both rotationally invariant energy predictions and rotationally equivariant force predictions. We obtain a smooth potential energy surface and the resulting force field is guaranteed to be energy-conserving.

• We present a new, challenging benchmark – ISO17 – including both chemical and conformational changes.3 We show that training with forces improves generalization in this setting as well.

2 Related work

Previous work has used neural networks and Gaussian processes applied to hand-crafted features to fit potential energy surfaces [11–16]. Graph convolutional networks for circular fingerprints [17] and molecular graph convolutions [18] learn representations for molecules of arbitrary size. They encode the molecular structure using neighborhood relationships as well as bond features, e.g., one-hot encodings of single, double and triple bonds. In the following, we briefly review the related work that will be used in our empirical evaluation: gradient-domain machine learning (GDML), deep tensor neural networks (DTNN) and enn-s2s.

Gradient-domain machine learning (GDML)  Chmiela et al. 
[19] proposed GDML as a method to construct force fields that explicitly obey the law of energy conservation. GDML captures the relationship between energy and interatomic forces (see Eq. 1) by training the gradient of the energy estimator. The functional relationship between atomic coordinates and interatomic forces is thus learned directly and energy predictions are obtained by re-integration. However, GDML does not scale well, since its kernel matrix grows quadratically with both the number of atoms and the number of examples. Beyond that, unlike SchNet, DTNN and enn-s2s, it is not designed to represent different compositions of atom types.

Deep tensor neural networks (DTNN)  Schütt et al. [20] proposed the DTNN for molecules, inspired by the many-body Hamiltonian applied to the interactions of atoms. DTNNs have been shown to reach chemical accuracy on a small set of molecular dynamics trajectories as well as QM9. Even though the DTNN shares the invariances with our proposed architecture, its interaction layers lack the continuous-filter convolution interpretation. It falls behind in accuracy compared to SchNet and enn-s2s.

enn-s2s  Gilmer et al. [21] proposed enn-s2s, a variant of message-passing neural networks that uses bond type features in addition to interatomic distances. It achieves state-of-the-art performance on all properties of the QM9 benchmark [21]. Unfortunately, it cannot be used for molecular dynamics predictions (MD17). This is caused by discontinuities in its potential energy surface due to the discreteness of the one-hot encodings in its input. 

3 ISO17 is publicly available at www.quantum-machine.org.

Figure 1: The discrete filter (left) is not able to capture the subtle positional changes of the atoms, resulting in discontinuous energy predictions Ê (bottom left). The continuous filter captures these changes and yields smooth energy predictions (bottom right).
In contrast, SchNet does not use such features and yields a continuous potential energy surface by using continuous-filter convolutional layers.

3 Continuous-filter convolutions

In deep learning, convolutional layers operate on discretized signals such as image pixels [22, 23], video frames [24] or digital audio data [25]. While it is sufficient to define the filter on the same grid in these cases, this is not possible for unevenly spaced inputs such as the atom positions of a molecule (see Fig. 1). Other examples include astronomical observations [26], climate data [27] and the financial market [28]. Commonly, this can be solved by a re-sampling approach defining a representation on a grid [7, 29, 30]. However, choosing an appropriate interpolation scheme is a challenge on its own and, possibly, requires a large number of grid points. Therefore, various extensions of convolutional layers even beyond the Euclidean space exist, e.g., for graphs [31, 32] and 3d shapes [33]. Analogously, we propose to use continuous filters that are able to handle unevenly spaced data, in particular, atoms at arbitrary positions.

Given the feature representations of n objects X^l = (x^l_1, ..., x^l_n) with x^l_i ∈ R^F at locations R = (r_1, ..., r_n) with r_i ∈ R^D, the continuous-filter convolutional layer l requires a filter-generating function

    W^l : R^D → R^F,

that maps from a position to the corresponding filter values. This constitutes a generalization of a filter tensor in discrete convolutional layers. As in dynamic filter networks [34], this filter-generating function is modeled with a neural network. While dynamic filter networks generate weights restricted to a grid structure, our approach generalizes this to arbitrary position and number of objects. 
The output x^{l+1}_i of the convolutional layer at position r_i is then given by

    x^{l+1}_i = (X^l ∗ W^l)_i = Σ_j x^l_j ∘ W^l(r_i − r_j),    (2)

where "∘" represents the element-wise multiplication. We apply these convolutions feature-wise for computational efficiency [35]. The interactions between feature maps are handled by separate object-wise or, specifically, atom-wise layers in SchNet.

4 SchNet

SchNet is designed to learn a representation for the prediction of molecular energies and atomic forces. It reflects fundamental physical laws including invariance to atom indexing and translation, a smooth energy prediction w.r.t. atom positions as well as energy-conservation of the predicted force fields. The energy and force predictions are rotationally invariant and equivariant, respectively.

Figure 2: Illustration of SchNet with an architectural overview (left), the interaction block (middle) and the continuous-filter convolution with filter-generating network (right). The shifted softplus is defined as ssp(x) = ln(0.5e^x + 0.5).

4.1 Architecture

Fig. 2 shows an overview of the SchNet architecture. At each layer, the molecule is represented atom-wise analogous to pixels in an image. Interactions between atoms are modeled by the three interaction blocks. The final prediction is obtained after atom-wise updates of the feature representation and pooling of the resulting atom-wise energies. In the following, we discuss the different components of the network.

Molecular representation  A molecule in a certain conformation can be described uniquely by a set of n atoms with nuclear charges Z = (Z_1, ..., Z_n) and atomic positions R = (r_1, ..., r_n). Through the layers of the neural network, we represent the atoms using a tuple of features X^l = (x^l_1, ..., x^l_n), with x^l_i ∈ R^F, where F is the number of feature maps, n the number of atoms and l the current layer. The representation of atom i is initialized using an embedding dependent on the atom type Z_i:

    x^0_i = a_{Z_i}.    (3)

The atom type embeddings a_Z are initialized randomly and optimized during training.

Atom-wise layers  A recurring building block in our architecture are atom-wise layers. These are dense layers that are applied separately to the representation x^l_i of atom i:

    x^{l+1}_i = W^l x^l_i + b^l.

These layers are responsible for the recombination of feature maps. Since weights are shared across atoms, our architecture remains scalable with respect to the size of the molecule.

Interaction  The interaction blocks, as shown in Fig. 2 (middle), are responsible for updating the atomic representations based on the molecular geometry R = (r_1, ..., r_n). We keep the number of feature maps constant at F = 64 throughout the interaction part of the network. In contrast to MPNN and DTNN, we do not use weight sharing across multiple interaction blocks. The blocks use a residual connection inspired by ResNet [36]:

    x^{l+1}_i = x^l_i + v^l_i.

As shown in the interaction block in Fig. 2, the residual v^l_i is computed through an atom-wise layer, an interatomic continuous-filter convolution (cfconv) followed by two more atom-wise layers with a softplus non-linearity in between. This allows for a flexible residual that incorporates interactions between atoms and feature maps.

(a) 1st interaction block  (b) 2nd interaction block  (c) 3rd interaction block

Figure 3: 10x10 Å cuts through all 64 radial, three-dimensional filters in each interaction block of SchNet trained on molecular dynamics of ethanol. Negative values are blue, positive values are red.

Filter-generating networks  The cfconv layer including its filter-generating network is depicted in the right panel of Fig. 2. 
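The components above can be summarized in a short, untrained sketch. The NumPy code below illustrates a cfconv layer (Eq. 2), atom-wise layers and the residual interaction update; the toy weights, the raw displacement input to the filter network and the reuse of a single atom-wise weight matrix are simplifications for brevity, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def ssp(x):
    """Shifted softplus ssp(x) = ln(0.5*e^x + 0.5), so that ssp(0) = 0."""
    return np.log(0.5 * np.exp(x) + 0.5)

n_atoms, n_feat = 5, 64                   # atoms and feature maps (F = 64)
X = rng.normal(size=(n_atoms, n_feat))    # features x_i^l
R = rng.normal(size=(n_atoms, 3))         # positions r_i

# Filter-generating network W^l: R^3 -> R^F (two toy dense layers).
W1 = 0.1 * rng.normal(size=(3, n_feat))
W2 = 0.1 * rng.normal(size=(n_feat, n_feat))
def filter_net(r_ij):
    return ssp(ssp(r_ij @ W1) @ W2)

def cfconv(X, R):
    """Continuous-filter convolution (Eq. 2): x_i' = sum_j x_j ∘ W(r_i - r_j)."""
    out = np.zeros_like(X)
    for i in range(len(X)):
        for j in range(len(X)):
            out[i] += X[j] * filter_net(R[i] - R[j])
    return out

# Atom-wise layer: the same dense layer applied to every atom separately
# (one weight matrix is reused for all atom-wise layers purely for brevity).
Wa, ba = 0.1 * rng.normal(size=(n_feat, n_feat)), np.zeros(n_feat)
atom_wise = lambda H: H @ Wa + ba

def interaction(X, R):
    """Residual update x_i^{l+1} = x_i^l + v_i^l with
    v: atom-wise -> cfconv -> atom-wise -> ssp -> atom-wise."""
    return X + atom_wise(ssp(atom_wise(cfconv(atom_wise(X), R))))

X_next = interaction(X, R)

# Permuting the atom indexing permutes the output rows in the same way,
# i.e., the representation respects invariance to atom indexing.
perm = rng.permutation(n_atoms)
print(np.allclose(interaction(X[perm], R[perm]), X_next[perm]))  # → True
```

In the full model, three such interaction blocks are stacked, and the filter network takes radial-basis-expanded interatomic distances rather than raw displacement vectors, which also makes the filters rotationally invariant.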
In order to satisfy the requirements for modeling molecular energies, we restrict our filters for the cfconv layers to be rotationally invariant. The rotational invariance is obtained by using interatomic distances

    d_ij = ‖r_i − r_j‖

as input for the filter network. Without further processing, the filters would be highly correlated since a neural network after initialization is close to linear. This leads to a plateau at the beginning of training that is hard to overcome. We avoid this by expanding the distance with radial basis functions

    e_k(r_i − r_j) = exp(−γ ‖d_ij − μ_k‖²)

located at centers 0 Å ≤ μ_k ≤ 30 Å every 0.1 Å with γ = 10 Å. This is chosen such that all distances occurring in the data sets are covered by the filters. Due to this additional non-linearity, the initial filters are less correlated, leading to a faster training procedure. Choosing fewer centers corresponds to reducing the resolution of the filter, while restricting the range of the centers corresponds to the filter size in a usual convolutional layer. An extensive evaluation of the impact of these variables is left for future work. We feed the expanded distances into two dense layers with softplus activations to compute the filter weight W(r_i − r_j) as shown in Fig. 2 (right).

Fig. 3 shows 2d-cuts through generated filters for all three interaction blocks of SchNet trained on an ethanol molecular dynamics trajectory. We observe how each filter emphasizes certain ranges of interatomic distances. This enables its interaction block to update the representations according to the radial environment of each atom. 
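The radial basis expansion described above amounts to a few lines of NumPy. The sketch assumes the stated grid of centers (0 Å to 30 Å, every 0.1 Å, i.e., 301 centers) and γ = 10; the function name is hypothetical:

```python
import numpy as np

def rbf_expand(d_ij, gamma=10.0):
    """Expand an interatomic distance d_ij (in Å) on radial basis functions
    e_k(d_ij) = exp(-gamma * (d_ij - mu_k)^2) with 301 centers mu_k placed
    every 0.1 Å between 0 Å and 30 Å, decorrelating the filter inputs."""
    mu = np.linspace(0.0, 30.0, 301)
    return np.exp(-gamma * (d_ij - mu) ** 2)

e = rbf_expand(1.09)        # a typical C-H bond length in Å
print(e.shape)              # → (301,)
print(int(np.argmax(e)))    # → 11, i.e., the center mu_k = 1.1 Å closest to d_ij
```

The expansion turns a scalar distance into a sparse, localized 301-dimensional vector, which is what the two dense filter-network layers consume.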
The sequential updates from three interaction blocks allow SchNet\nto construct highly complex many-body representations in the spirit of DTNNs [20] while keeping\nrotational invariance due to the radial \ufb01lters.\n\n4.2 Training with energies and forces\n\nAs described above, the interatomic forces are related to the molecular energy, so that we can obtain\nan energy-conserving force model by differentiating the energy model w.r.t. the atom positions\n\n\u02c6Fi(Z1, . . . , Zn, r1, . . . , rn) = \u2212 \u2202 \u02c6E\n\u2202ri\n\n(Z1, . . . , Zn, r1, . . . , rn).\n\n(4)\n\nChmiela et al. [19] pointed out that this leads to an energy-conserving force-\ufb01eld by construction.\nAs SchNet yields rotationally invariant energy predictions, the force predictions are rotationally\nequivariant by construction. The model has to be at least twice differentiable to allow for gradient\ndescent of the force loss. We chose a shifted softplus ssp(x) = ln(0.5ex + 0.5) as non-linearity\nthroughout the network in order to obtain a smooth potential energy surface. The shifting ensures that\nssp(0) = 0 and improves the convergence of the network. This activation function shows similarity\nto ELUs [37], while having in\ufb01nite order of continuity.\n\n5\n\n\fTable 1: Mean absolute errors for energy predictions in kcal/mol on the QM9 data set with given\ntraining set size N. Best model in bold.\n\nN\n50,000\n100,000\n110,462\n\nSchNet DTNN [20]\n0.94\n0.84\n\u2013\n\n0.59\n0.34\n0.31\n\nenn-s2s [21]\n\u2013\n\u2013\n0.45\n\nenn-s2s-ens5 [21]\n\u2013\n\u2013\n0.33\n\nWe include the total energy E as well as forces Fi in the training loss to train a neural network that\nperforms well on both properties:\n\n(cid:32)\n\u2212 \u2202 \u02c6E\n\u2202Ri\n\n(cid:33)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)2\n\n.\n\n(5)\n\n(cid:96)( \u02c6E, (E, F1, . . . 
, Fn)) = \u03c1(cid:107)E \u2212 \u02c6E(cid:107)2 +\n\n1\nn\n\n(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)Fi \u2212\n\nn(cid:88)\n\ni=0\n\nThis kind of loss has been used before for \ufb01tting a restricted potential energy surfaces with MLPs [38].\nIn our experiments, we use \u03c1 = 0.01 for combined energy and force training. The value of \u03c1 was\noptimized empirically to account for different scales of energy and forces.\nDue to the relation of energies and forces re\ufb02ected in the model, we expect to see improved gen-\neralization, however, at a computational cost. As we need to perform a full forward and backward\npass on the energy model to obtain the forces, the resulting force model is twice as deep and, hence,\nrequires about twice the amount of computation time.\nEven though the GDML model captures this relationship between energies and forces, it is explicitly\noptimized to predict the force \ufb01eld while the energy prediction is a by-product. Models such as\ncircular \ufb01ngerprints [17], molecular graph convolutions or message-passing neural networks[21] for\nproperty prediction across chemical compound space are only concerned with equilibrium molecules,\ni.e., the special case where the forces are vanishing. They can not be trained with forces in a similar\nmanner, as they include discontinuities in their predicted potential energy surface caused by discrete\nbinning or the use of one-hot encoded bond type information.\n\n5 Experiments and results\n\nIn this section, we apply the SchNet to three different quantum chemistry datasets: QM9, MD17 and\nISO17. We designed the experiments such that each adds another aspect towards modeling chemical\nspace. While QM9 only contains equilibrium molecules, for MD17 we predict conformational\nchanges of molecular dynamics of single molecules. 
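The combined objective of Eq. 5 reduces to a few lines once energy and force predictions are available. The sketch below assumes the predicted forces already equal the negative gradient of the energy model, as in Eq. 4; the function name and all numbers are hypothetical:

```python
import numpy as np

def combined_loss(E_true, F_true, E_pred, F_pred, rho=0.01):
    """Squared loss of Eq. 5:
    rho * ||E - E_hat||^2 + (1/n) * sum_i ||F_i - (-dE_hat/dr_i)||^2.
    F_pred is assumed to be the negative gradient of the energy model,
    so the predicted force field is energy-conserving by construction."""
    n = F_true.shape[0]
    energy_term = rho * (E_true - E_pred) ** 2
    force_term = np.sum((F_true - F_pred) ** 2) / n
    return energy_term + force_term

# Hypothetical numbers for a 3-atom system (kcal/mol and kcal/mol/Å):
E_true, E_pred = -97.2, -97.0
F_true = np.zeros((3, 3))            # equilibrium structure: forces vanish
F_pred = np.full((3, 3), 0.1)
print(round(combined_loss(E_true, F_true, E_pred, F_pred), 4))  # → 0.0304
```

With ρ = 0.01, the energy term contributes only 0.0004 here while the force term contributes 0.03, reflecting how the weighting balances the different scales of energies and forces.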
Finally, we present ISO17, combining both chemical and structural changes.

For all datasets, we report mean absolute errors in kcal/mol for the energies and in kcal/mol/Å for the forces. The architecture of the network was fixed after an evaluation on the MD17 data sets for benzene and ethanol (see supplement). In each experiment, we split the data into a training set of given size N and use a validation set of 1,000 examples for early stopping. The remaining data is used as test set. All models are trained with SGD using the ADAM optimizer [39] with 32 molecules per mini-batch. We use an initial learning rate of 10⁻³ and an exponential learning rate decay with ratio 0.96 every 100,000 steps. The model used for testing is obtained using an exponential moving average over weights with decay rate 0.99.

5.1 QM9 – chemical degrees of freedom

QM9 is a widely used benchmark for the prediction of various molecular properties in equilibrium [10, 40, 41]. Therefore, the forces are zero by definition and do not need to be predicted. In this setting, we train a single model that generalizes across different compositions and sizes.

QM9 consists of ≈130k organic molecules with up to 9 heavy atoms of the types {C, O, N, F}. As the size of the training set varies across previous work, we trained our models in each of these experimental settings. Table 1 shows the performance of various competing methods for predicting the total energy (property U0 in QM9). We provide comparisons to the DTNN [20], the best performing MPNN configuration denoted enn-s2s and an ensemble of MPNNs (enn-s2s-ens5) [21]. SchNet consistently obtains state-of-the-art performance with an MAE of 0.31 kcal/mol at 110k training examples.

Table 2: Mean absolute errors for energy and force predictions in kcal/mol and kcal/mol/Å, respectively. GDML and SchNet test errors for training with 1,000 and 50,000 examples of molecular dynamics simulations of small, organic molecules are shown. SchNets were trained only on energies as well as on energies and forces combined. Best results in bold.

                             |         N = 1,000          |         N = 50,000
                             | GDML [19]  SchNet   SchNet | DTNN [20]  SchNet   SchNet
                             | (forces)  (energy)  (both) | (energy)  (energy)  (both)
    Benzene         energy   |   0.07      1.19     0.08  |   0.04      0.08     0.07
                    forces   |   0.23     14.12     0.31  |    –        1.23     0.17
    Toluene         energy   |   0.12      2.95     0.12  |   0.18      0.16     0.09
                    forces   |   0.24     22.31     0.57  |    –        1.79     0.09
    Malonaldehyde   energy   |   0.16      2.03     0.13  |   0.19      0.13     0.08
                    forces   |   0.80     20.41     0.66  |    –        1.51     0.08
    Salicylic acid  energy   |   0.12      3.27     0.20  |   0.41      0.25     0.10
                    forces   |   0.28     23.21     0.85  |    –        3.72     0.19
    Aspirin         energy   |   0.27      4.20     0.37  |    –        0.25     0.12
                    forces   |   0.99     23.54     1.35  |    –        7.36     0.33
    Ethanol         energy   |   0.15      0.93     0.08  |    –        0.07     0.05
                    forces   |   0.79      6.56     0.39  |    –        0.76     0.05
    Uracil          energy   |   0.11      2.26     0.14  |    –        0.13     0.10
                    forces   |   0.24     20.08     0.56  |    –        3.28     0.11
    Naphthalene     energy   |   0.12      3.58     0.16  |    –        0.20     0.11
                    forces   |   0.23     25.36     0.58  |    –        2.58     0.11

5.2 MD17 – conformational degrees of freedom

MD17 is a collection of eight molecular dynamics simulations for small organic molecules. These data sets were introduced by Chmiela et al. [19] for the prediction of energy-conserving force fields using GDML. Each of these consists of a trajectory of a single molecule covering a large variety of conformations. Here, the task is to predict energies and forces using a separate model for each trajectory. This molecule-wise training is motivated by the need for highly accurate force predictions when doing molecular dynamics.

Table 2 shows the performance of SchNet using 1,000 and 50,000 training examples in comparison with GDML and DTNN. 
Using the smaller data set, GDML achieves remarkably accurate energy and force predictions despite being trained only on forces; the energies are only used to fit the integration constant. As mentioned before, GDML does not scale well with the number of atoms and training examples; therefore, it cannot be trained on 50,000 examples. The DTNN was evaluated only on four of these MD trajectories using the larger training set [20]. Note that enn-s2s cannot be used on this dataset due to discontinuities in its inferred potential energy surface.

We trained SchNet using just energies as well as using both energies and forces. While the energy-only model shows high errors for the small training set, the model including forces achieves energy predictions comparable to GDML. In particular, we observe that SchNet outperforms GDML on the more flexible molecules malonaldehyde and ethanol, while GDML reaches much lower force errors on the remaining MD trajectories, which all include aromatic rings.

The real strength of SchNet is its scalability: trained on 50,000 examples with energies only, it outperforms the DTNN on three of the four data sets. Including force information, SchNet consistently obtains accurate energies and forces with errors below 0.12 kcal/mol and 0.33 kcal/mol/Å, respectively. 
Remarkably, when training on energies and forces using 1,000 training examples, SchNet performs better than the same model trained on energies alone with 50,000 examples.

Table 3: Mean absolute errors on C7O2H10 isomers in kcal/mol.

                                                 mean predictor  SchNet (energy)  SchNet (energy+forces)
    known molecules /         energy                 14.89            0.52                0.36
    unknown conformation      forces                 19.56            4.13                1.00
    unknown molecules /       energy                 15.54            3.11                2.40
    unknown conformation      forces                 19.15            5.71                2.18

5.3 ISO17 – chemical and conformational degrees of freedom

As the next step towards quantum-chemical exploration, we demonstrate the capability of SchNet to represent a complex potential energy surface including conformational and chemical changes. We present a new dataset – ISO17 – where we consider short MD trajectories of 129 isomers, i.e., chemically different molecules with the same number and types of atoms. In contrast to MD17, we train a joint model across different molecules. We calculate energies and interatomic forces from short MD trajectories of 129 molecules drawn randomly from the largest set of isomers in QM9. While the composition of all included molecules is C7O2H10, the chemical structures are fundamentally different. With each trajectory consisting of 5,000 conformations, the data set consists of 645,000 labeled examples.

We consider two scenarios with this dataset: In the first variant, the molecular graph structures present in training are also present in the test data. This demonstrates how well our model is able to represent a complex potential energy surface with chemical and conformational changes. In the more challenging scenario, the test data contains a different subset of molecules. Here we evaluate the generalization of our model to previously unseen chemical structures. We predict forces and energies in both cases and compare to the mean predictor as a baseline. 
We draw a subset of 4,000 steps from 80% of the MD trajectories for training and validation. This leaves us with a separate test set for each scenario: (1) the unseen 1,000 conformations of molecule trajectories included in the training set and (2) all 5,000 conformations of the remaining 20% of molecules not included in training.

Table 3 shows the performance of SchNet on both test sets. Our proposed model reaches chemical accuracy for the prediction of energies and forces on the test set of known molecules. Including forces in the training improves the performance here as well as on the set of unseen molecules. This shows that using force information not only helps to accurately predict nearby conformations of a single molecule, but indeed helps to generalize across chemical compound space.

6 Conclusions

We have proposed continuous-filter convolutional layers as a novel building block for deep neural networks. In contrast to the usual convolutional layers, these can model unevenly spaced data as occurring in astronomy, climate research and, in particular, quantum chemistry. We have developed SchNet to demonstrate the capabilities of continuous-filter convolutional layers in the context of modeling quantum interactions in molecules. Our architecture respects quantum-chemical constraints such as rotationally invariant energy predictions as well as rotationally equivariant, energy-conserving force predictions.

We have evaluated our model in three increasingly challenging experimental settings. Each brings us one step closer to practical chemical exploration driven by machine learning. SchNet improves the state of the art in predicting energies for molecules in equilibrium of the QM9 benchmark. Beyond that, it achieves accurate predictions for energies and forces for all molecular dynamics trajectories in MD17. Finally, we have introduced ISO17, consisting of 645,000 conformations of various C7O2H10 isomers. 
While we achieve promising results on this new benchmark, modeling chemical and conformational variations remains difficult and needs further improvement. For this reason, we expect that ISO17 will become a new standard benchmark for modeling quantum interactions with machine learning.

Acknowledgments

This work was supported by the Federal Ministry of Education and Research (BMBF) for the Berlin Big Data Center BBDC (01IS14013A). Additional support was provided by the DFG (MU 987/20-1) and from the European Union's Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No. 657679. K.R.M. gratefully acknowledges the BK21 program funded by Korean National Research Foundation grant (No. 2012-005741) and the Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korea government (No. 2017-0-00451).

References

[1] M. Rupp, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld. Fast and accurate modeling of molecular atomization energies with machine learning. Phys. Rev. Lett., 108(5):058301, 2012.

[2] G. Montavon, M. Rupp, V. Gobre, A. Vazquez-Mayagoitia, K. Hansen, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld. Machine learning of molecular electronic properties in chemical compound space. New J. Phys., 15(9):095003, 2013.

[3] K. Hansen, G. Montavon, F. Biegler, S. Fazli, M. Rupp, M. Scheffler, O. A. von Lilienfeld, A. Tkatchenko, and K.-R. Müller. Assessment and validation of machine learning methods for predicting molecular atomization energies. J. Chem. Theory Comput., 9(8):3404–3419, 2013.

[4] K. T. Schütt, H. Glawe, F. Brockherde, A. Sanna, K.-R. Müller, and E. K. U. Gross. How to represent crystal structures for machine learning: Towards fast prediction of electronic properties. Phys. Rev. B, 89(20):205118, 2014.

[5] K. Hansen, F. Biegler, R. Ramakrishnan, W. Pronobis, O. A. von Lilienfeld, K.-R. 
Müller, and A. Tkatchenko. Machine learning predictions of molecular properties: Accurate many-body potentials and nonlocality in chemical space. J. Phys. Chem. Lett., 6:2326, 2015.

[6] F. A. Faber, L. Hutchison, B. Huang, J. Gilmer, S. S. Schoenholz, G. E. Dahl, O. Vinyals, S. Kearnes, P. F. Riley, and O. A. von Lilienfeld. Fast machine learning models of electronic and energetic properties consistently reach approximation errors better than DFT accuracy. arXiv preprint arXiv:1702.05532, 2017.

[7] F. Brockherde, L. Vogt, L. Li, M. E. Tuckerman, K. Burke, and K.-R. Müller. Bypassing the Kohn-Sham equations with machine learning. Nature Communications, 8(872), 2017.

[8] W. Boomsma and J. Frellsen. Spherical convolutions and their application in molecular modelling. In Advances in Neural Information Processing Systems 30, pages 3436–3446. 2017.

[9] M. Eickenberg, G. Exarchakis, M. Hirn, and S. Mallat. Solid harmonic wavelet scattering: Predicting quantum molecular energy from invariant descriptors of 3d electronic densities. In Advances in Neural Information Processing Systems 30, pages 6543–6552. 2017.

[10] R. Ramakrishnan, P. O. Dral, M. Rupp, and O. A. von Lilienfeld. Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data, 1, 2014.

[11] S. Manzhos and T. Carrington Jr. A random-sampling high dimensional model representation neural network for building potential energy surfaces. J. Chem. Phys., 125(8):084109, 2006.

[12] M. Malshe, R. Narulkar, L. M. Raff, M. Hagan, S. Bukkapatnam, P. M. Agrawal, and R. Komanduri. Development of generalized potential-energy surfaces using many-body expansions, neural networks, and moiety energy approximations. J. Chem. Phys., 130(18):184102, 2009.

[13] J. Behler and M. Parrinello. Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett., 98(14):146401, 2007.

[14] A. P. 
Bart\u00f3k, M. C. Payne, R. Kondor, and G. Cs\u00e1nyi. Gaussian approximation potentials: The\naccuracy of quantum mechanics, without the electrons. Phys. Rev. Lett., 104(13):136403, 2010.\n\n9\n\n\f[15] J. Behler. Atom-centered symmetry functions for constructing high-dimensional neural network\n\npotentials. J. Chem. Phys., 134(7):074106, 2011.\n\n[16] A. P. Bart\u00f3k, R. Kondor, and G. Cs\u00e1nyi. On representing chemical environments. Phys. Rev. B,\n\n87(18):184115, 2013.\n\n[17] D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik,\nand R. P. Adams. Convolutional networks on graphs for learning molecular \ufb01ngerprints. In\nC. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, NIPS, pages\n2224\u20132232, 2015.\n\n[18] S. Kearnes, K. McCloskey, M. Berndl, V. Pande, and P. F. Riley. Molecular graph convolutions:\nmoving beyond \ufb01ngerprints. Journal of Computer-Aided Molecular Design, 30(8):595\u2013608,\n2016.\n\n[19] S. Chmiela, A. Tkatchenko, H. E. Sauceda, I. Poltavsky, K. T. Sch\u00fctt, and K.-R. M\u00fcller.\nMachine learning of accurate energy-conserving molecular force \ufb01elds. Science Advances, 3(5):\ne1603015, 2017.\n\n[20] K. T. Sch\u00fctt, F. Arbabzadah, S. Chmiela, K.-R. M\u00fcller, and A. Tkatchenko. Quantum-chemical\n\ninsights from deep tensor neural networks. Nature Communications, 8(13890), 2017.\n\n[21] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl. Neural message passing for\nquantum chemistry. In Proceedings of the 34th International Conference on Machine Learning,\npages 1263\u20131272, 2017.\n\n[22] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel.\nBackpropagation applied to handwritten zip code recognition. Neural computation, 1(4):\n541\u2013551, 1989.\n\n[23] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classi\ufb01cation with deep convolutional\nneural networks. 
In Advances in neural information processing systems, pages 1097\u20131105,\n2012.\n\n[24] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video\nclassi\ufb01cation with convolutional neural networks. In Proceedings of the IEEE conference on\nComputer Vision and Pattern Recognition, pages 1725\u20131732, 2014.\n\n[25] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner,\nA. Senior, and K. Kavukcuoglu. Wavenet: A generative model for raw audio. In 9th ISCA\nSpeech Synthesis Workshop, pages 125\u2013125, 2016.\n\n[26] W. Max-Moerbeck, J. L. Richards, T. Hovatta, V. Pavlidou, T. J. Pearson, and A. C. S. Readhead.\nA method for the estimation of the signi\ufb01cance of cross-correlations in unevenly sampled\nred-noise time series. Monthly Notices of the Royal Astronomical Society, 445(1):437\u2013459,\n2014.\n\n[27] K. B. \u00d3lafsd\u00f3ttir, M. Schulz, and M. Mudelsee. Red\ufb01t-x: Cross-spectral analysis of unevenly\n\nspaced paleoclimate time series. Computers & Geosciences, 91:11\u201318, 2016.\n\n[28] L. E. Nieto-Barajas and T. Sinha. Bayesian interpolation of unequally spaced time series.\n\nStochastic environmental research and risk assessment, 29(2):577\u2013587, 2015.\n\n[29] J. C. Snyder, M. Rupp, K. Hansen, K.-R. M\u00fcller, and K. Burke. Finding density functionals\n\nwith machine learning. Physical review letters, 108(25):253002, 2012.\n\n[30] M. Hirn, S. Mallat, and N. Poilvert. Wavelet scattering regression of quantum chemical energies.\n\nMultiscale Modeling & Simulation, 15(2):827\u2013863, 2017.\n\n[31] J. Bruna, W. Zaremba, A. Szlam, and Y. Lecun. Spectral networks and locally connected\n\nnetworks on graphs. In ICLR, 2014.\n\n[32] M. Henaff, J. Bruna, and Y. LeCun. Deep convolutional networks on graph-structured data.\n\narXiv preprint arXiv:1506.05163, 2015.\n\n10\n\n\f[33] J. Masci, D. Boscaini, M. Bronstein, and P. Vandergheynst. 
Geodesic convolutional neural\nnetworks on riemannian manifolds. In Proceedings of the IEEE international conference on\ncomputer vision workshops, pages 37\u201345, 2015.\n\n[34] X. Jia, B. De Brabandere, T. Tuytelaars, and L. V. Gool. Dynamic \ufb01lter networks. In D. D. Lee,\nM. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information\nProcessing Systems 29, pages 667\u2013675. 2016.\n\n[35] F. Chollet. Xception: Deep learning with depthwise separable convolutions. arXiv preprint\n\narXiv:1610.02357, 2016.\n\n[36] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition.\n\nIn\nProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages\n770\u2013778, 2016.\n\n[37] D.-A. Clevert, T. Unterthiner, and S. Hochreiter. Fast and accurate deep network learning by\n\nexponential linear units (elus). arXiv preprint arXiv:1511.07289, 2015.\n\n[38] A Pukrittayakamee, M Malshe, M Hagan, LM Raff, R Narulkar, S Bukkapatnum, and R Ko-\nmanduri. Simultaneous \ufb01tting of a potential-energy surface and its corresponding force \ufb01elds\nusing feedforward neural networks. The Journal of chemical physics, 130(13):134101, 2009.\n\n[39] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In ICLR, 2015.\n\n[40] L. C. Blum and J.-L. Reymond. 970 million druglike small molecules for virtual screening in\n\nthe chemical universe database GDB-13. J. Am. Chem. Soc., 131:8732, 2009.\n\n[41] J.-L. Reymond. The chemical space project. Acc. Chem. 
Res., 48(3):722\u2013730, 2015.\n\n11\n\n\f", "award": [], "sourceid": 647, "authors": [{"given_name": "Kristof", "family_name": "Sch\u00fctt", "institution": "TU Berlin"}, {"given_name": "Pieter-Jan", "family_name": "Kindermans", "institution": "Google AI Resident"}, {"given_name": "Huziel Enoc", "family_name": "Sauceda Felix", "institution": "Fritz-Haber-Institut der Max-Planck-Gesellschaft"}, {"given_name": "Stefan", "family_name": "Chmiela", "institution": "Technische Universit\u00e4t Berlin"}, {"given_name": "Alexandre", "family_name": "Tkatchenko", "institution": "University of Luxembourg"}, {"given_name": "Klaus-Robert", "family_name": "M\u00fcller", "institution": "TU Berlin"}]}