{"title": "Approximate Inference Turns Deep Networks into Gaussian Processes", "book": "Advances in Neural Information Processing Systems", "page_first": 3094, "page_last": 3104, "abstract": "Deep neural networks (DNN) and Gaussian processes (GP) are two powerful models with several theoretical connections relating them, but the relationship between their training methods is not well understood. In this paper, we show that certain Gaussian posterior approximations for Bayesian DNNs are equivalent to GP posteriors. This enables us to relate solutions and iterations of a deep-learning algorithm to GP inference. As a result, we can obtain a GP kernel and a nonlinear feature map while training a DNN. Surprisingly, the resulting kernel is the neural tangent kernel. We show kernels obtained on real datasets and demonstrate the use of the GP marginal likelihood to tune hyperparameters of DNNs. Our work aims to facilitate further research on combining DNNs and GPs in practical settings.", "full_text": "Approximate Inference Turns Deep Networks into\n\nGaussian Processes\n\nMohammad Emtiyaz Khan\nRIKEN Center for AI Project\n\nTokyo, Japan\n\nemtiyaz.khan@riken.jp\n\nEhsan Abedi* \u2020\n\nEPFL\n\nLausanne, Switzerland\nehsan.abedi@epfl.ch\n\nAlexander Immer* \u2020\n\nEPFL\n\nLausanne, Switzerland\n\nalexander.immer@epfl.ch\n\nMaciej Korzepa* \u2020\n\nTechnical University of Denmark\n\nKgs. Lyngby, Denmark\n\nmjko@dtu.dk\n\nAbstract\n\nDeep neural networks (DNN) and Gaussian processes (GP) are two powerful\nmodels with several theoretical connections relating them, but the relationship\nbetween their training methods is not well understood. In this paper, we show that\ncertain Gaussian posterior approximations for Bayesian DNNs are equivalent to\nGP posteriors. This enables us to relate solutions and iterations of a deep-learning\nalgorithm to GP inference. As a result, we can obtain a GP kernel and a nonlinear\nfeature map while training a DNN. Surprisingly, the resulting kernel is the neural\ntangent kernel. We show kernels obtained on real datasets and demonstrate the use\nof the GP marginal likelihood to tune hyperparameters of DNNs. Our work aims\nto facilitate further research on combining DNNs and GPs in practical settings.\n\n1\n\nIntroduction\n\nDeep neural networks (DNN) and Gaussian processes (GP) models are both powerful models with\ncomplementary strengths and weaknesses. DNNs achieve state-of-the-art results on many real-world\nproblems providing scalable end-to-end learning, but they can over\ufb01t on small datasets and be\novercon\ufb01dent. In contrast, GPs are suitable for small datasets and compute con\ufb01dence estimates,\nbut they are not scalable and choosing a good kernel in practice is challenging [3]. Combining their\nstrengths to solve real-world problems is an important problem.\nTheoretically, the two models are closely related to each other. Previous work has shown that as the\nwidth of a DNN increases to in\ufb01nity, the DNN converges to a GP [4, 5, 13, 16, 22]. This relationship\nis surprising and gives us hope that a practical combination could be possible. Unfortunately, it is not\nclear how one can use such connections in practice, e.g., to perform fast inference in GPs by using\ntraining methods of DNNs, or to reduce over\ufb01tting in DNNs by using GP inference. We argue that, to\nsolve such practical problems, we need the relationship not only between the models but also between\ntheir training procedures. 
The purpose of this paper is to provide such a theoretical relationship.

We present theoretical results aimed at connecting the training methods of deep learning and GP models. We show that the Gaussian posterior approximations for Bayesian DNNs, such as those obtained by Laplace approximation and variational inference (VI), are equivalent to posterior distributions of GP regression models. This result enables us to relate the solutions and iterations of a deep-learning algorithm to GP inference. See Fig. 1 for our approach called DNN2GP. In addition, we can obtain GP kernels and nonlinear feature maps while training a DNN (see Fig. 2). Surprisingly, a GP kernel we derive is equivalent to the recently proposed neural tangent kernel (NTK) [8]. We present empirical results where we visualize the feature map obtained on benchmark datasets such as MNIST and CIFAR, and demonstrate their use for DNN hyperparameter tuning. The code to reproduce our results is available at https://github.com/team-approx-bayes/dnn2gp. The work presented in this paper aims to facilitate further research on combining the strengths of DNNs and GPs in practical settings.

†Equal contribution. *This work was performed during an internship at the RIKEN Center for AI Project.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Figure 1: A summary of our approach called DNN2GP in three steps. (Step A: find a Gaussian posterior approximation p(w|y, X) ≈ N(w|µ, Σ) given the DNN weights. Step B: find a linear model ỹ = φ(x)⊤w + ε, with outputs ỹ, features φ(x), and noise ε, whose posterior p(w|ỹ, X) = N(w|µ, Σ) equals the Gaussian approximation. Step C: find a GP whose predictions are the same as those of the linear model.)

Figure 2: Fig. (a) shows a 2D binary-classification problem along with the predictive distribution of a DNN using 513 parameters. The corresponding feature and kernel matrices obtained using our approach are shown in (b) and (c), respectively (the two classes are grouped, and marked with blue and orange color along the axes). Fig. (d) shows the GP posterior mean where we see a clear separation between the two classes. Surprisingly, the border points A and D in (a) are also at the boundary in (d). Panels: (a) 2D classification problem; (b) GP kernel feature φ(x); (c) GP kernel; (d) GP posterior mean.

1.1 Related Work

The equivalence between infinitely-wide neural networks and GPs was originally discussed by Neal [16]. Subsequently, many works derived explicit expressions for the GP kernel corresponding to neural networks [4, 7, 16] and their deep variants [5, 6, 13, 18]. These works use a prior distribution on weights and derive kernels by averaging over the prior. Our work differs in that we use posterior approximations to relate DNNs to GPs. Unlike these previous results, our results hold for DNNs of finite width.

A GP kernel we derive is equivalent to the recently proposed Neural Tangent Kernel (NTK) [8], which is obtained by using the Jacobian of the DNN outputs. For randomly initialized trajectories, as the DNN width goes to infinity, the NTK converges in probability to a deterministic kernel and remains asymptotically constant when training with gradient descent. Jacot et al. [8] motivate the NTK by using kernel gradient descent. Surprisingly, the NTK appears in our work with an entirely different approach where we consider approximations of the posterior distribution over weights. 
Due to connections to the NTK, we expect similar properties for our kernel. Our approach additionally shows that we can obtain other types of kernels by using different approximate inference methods.

In a recent work, Lee et al. [14] derive the mean and covariance function corresponding to the GP induced by the NTK. Unfortunately, the model does not correspond to inference in a GP model (see Section 2.3.1 in their paper). 
Our approach does not have this issue and we can express Gaussian posterior approximations on a Bayesian DNN as inference in a GP regression model.

2 Deep Neural Networks (DNNs) and Gaussian Processes (GPs)

The goal of this paper is to present a theoretical relationship between training methods of DNNs and GPs. DNNs are typically trained by minimizing an empirical loss between the data and the predictions. For example, in supervised learning with a dataset D := {(x_i, y_i)}_{i=1}^N of N examples of inputs x_i ∈ R^D and outputs y_i ∈ R^K, we can minimize a loss of the following form:

\bar{\ell}(\mathcal{D}, w) := \sum_{i=1}^{N} \ell_i(w) + \tfrac{1}{2}\,\delta\, w^\top w, \quad \text{where } \ell_i(w) := \ell(y_i, f_w(x_i)),   (1)

where f_w(x) ∈ R^K denotes the DNN outputs with weights w ∈ R^P, ℓ(y, f(x)) denotes a loss function between an output y and the function f(x), and δ is a small L2 regularizer.² We assume the loss function to be twice differentiable and strictly convex in f (e.g., squared loss and cross-entropy loss). An attractive feature of DNNs is that they can be trained using stochastic-gradient (SG) methods [11]. Such methods scale well to large data settings.

GP models use an entirely different modeling approach which is based on directly modeling the functions rather than the parameters. For example, for regression problems with scalar outputs y_i ∈ R, consider the following linear basis-function model with a nonlinear feature map φ(x) : R^D → R^P:

y = \phi(x)^\top w + \epsilon, \quad \text{with } \epsilon \sim \mathcal{N}(0, \sigma^2), \text{ and } w \sim \mathcal{N}(0, \delta^{-1} I_P),   (2)

where I_P is a P × P identity matrix and σ² is the output noise variance. Defining the function to be f(x) := φ(x)⊤w, the predictive distribution p(f(x∗)|x∗, D) at a new test input x∗ is equal to that of the following model directly defined with a GP prior over f(x) [23]:

y = f(x) + \epsilon, \quad \text{with } f(x) \sim \mathcal{GP}\big(0, \kappa(x, x')\big),   (3)

where κ(x, x') := E[f(x)f(x')] = δ⁻¹φ(x)⊤φ(x') is the covariance function or kernel of the GP. The function-space model is more general in the sense that it can also deal with infinite-dimensional feature maps φ(x), giving us a nonparametric model. This view has been used to show that as a DNN becomes infinitely wide it tends to a GP, essentially by showing that averaging over p(w) with the feature map induced by a DNN leads to a GP covariance function [16].

An attractive property of the function-space formulation, as opposed to the weight-space formulation such as (1), is that the posterior distribution has a closed-form expression. Another attractive property is that the posterior is usually unimodal, unlike the loss ℓ̄(D, w) which is typically nonconvex. Unfortunately, the computation of the posterior takes O(N³), which is infeasible for large datasets. GPs also require choosing a good kernel [23]. Compared to DNN training, inference in GPs remains much more difficult.

To summarize, despite the similarities between the two models, their training methods are fundamentally different. While DNNs employ stochastic optimization, GPs use closed-form updates. How can we relate these seemingly different training procedures in practical settings, e.g., without assuming infinite-width DNNs? In this paper, we provide an answer to this question. We derive theoretical results that relate the solutions and iterations of deep-learning algorithms to GP inference. We do so by first finding a Gaussian posterior approximation (Step A in Fig. 1), then using it to find a linear basis-function model (Step B in Fig. 1) and its corresponding GP (Step C in Fig. 1). We start in the next section with our first theoretical result.
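To make the equivalence between the weight-space model (2) and the function-space model (3) concrete, the following numerical sketch (our illustration, not part of the paper's released code; the sizes, random features, and noise level are arbitrary toy choices) checks that the Bayesian linear-model predictive and the GP predictive with kernel κ(x, x') = δ⁻¹φ(x)⊤φ(x') coincide.

import numpy as np

rng = np.random.default_rng(0)
N, P, delta, sigma2 = 20, 5, 0.1, 0.25           # data size, feature dim, prior precision, noise variance
Phi = rng.normal(size=(N, P))                    # feature matrix with rows phi(x_i)^T
y = rng.normal(size=N)
phi_star = rng.normal(size=P)                    # features of a test input x_*

# Weight-space view (Eq. 2): posterior over w, then predict f(x_*) = phi(x_*)^T w.
Sigma = np.linalg.inv(delta * np.eye(P) + Phi.T @ Phi / sigma2)
mu = Sigma @ Phi.T @ y / sigma2
mean_w = phi_star @ mu
var_w = phi_star @ Sigma @ phi_star

# Function-space view (Eq. 3): GP with kernel kappa(x, x') = delta^{-1} phi(x)^T phi(x').
K = Phi @ Phi.T / delta
k_star = Phi @ phi_star / delta
k_ss = phi_star @ phi_star / delta
A = np.linalg.inv(K + sigma2 * np.eye(N))
mean_f = k_star @ A @ y
var_f = k_ss - k_star @ A @ k_star

assert np.allclose(mean_w, mean_f) and np.allclose(var_w, var_f)   # the two views agree exactly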
²We can assume that δ is small enough that it does not affect the DNN's generalization.

3 Relating Minima of the Loss to GP Inference via Laplace Approximation

In this section, we present theoretical results relating minima of a deep-learning loss (1) to inference in GP models. A local minimizer w∗ of the loss (1) satisfies the following first-order and second-order conditions [17]: ∇_w ℓ̄(D, w∗) = 0 and ∇²_{ww} ℓ̄(D, w∗) ≻ 0. Deep-learning optimizers, such as RMSprop and Adam, aim to find such minimizers, and our goal is to relate them to GP inference.

Step A (Laplace Approximation): To do so, we will use an approximate inference method called the Laplace approximation [1]. A minimum of the loss (1) corresponds to a mode of the Bayesian model p(D, w) := ∏_{i=1}^N e^{−ℓ_i(w)} p(w) with prior distribution p(w) := N(w|0, δ⁻¹I_P), assuming that the posterior is well-defined. The posterior distribution p(w|D) = p(D, w)/p(D) is usually computationally intractable and requires computationally feasible approximation methods. The Laplace approximation uses the following Gaussian approximation for the posterior:

p(w|\mathcal{D}) \approx \mathcal{N}(w|\mu, \Sigma), \quad \text{where } \mu = w_* \text{ and } \Sigma^{-1} = \sum_{i=1}^{N} \nabla^2_{ww}\,\ell_i(w_*) + \delta I_P.   (4)

This approximation can be directly built using the solutions found by deep-learning optimizers.

Step B (Linear Model): The next step is to find a linear basis-function model whose posterior distribution is equal to the Gaussian approximation (4). We will now show that this is always possible whenever the gradient and Hessian of the loss³ can be approximated as follows:

\nabla_w \ell(w) \approx \phi_w(x)\, v_w(x, y), \qquad \nabla^2_{ww}\,\ell(w) \approx \phi_w(x)\, D_w(x, y)\, \phi_w(x)^\top,   (5)

where φ_w(x) is a P × Q feature matrix with Q a positive integer, v_w(x, y) is a vector of length Q, and D_w(x, y) is a Q × Q symmetric positive-definite matrix. We will now present results for a specific choice of φ_w, v_w, and D_w. Our proof trivially generalizes to arbitrary choices of these quantities.

For the loss of form (1), the gradient and Hessian take the following form [15, 17]:

\nabla_w \ell(w) = J_w(x)^\top r_w(x, y), \qquad \nabla^2_{ww}\,\ell(w) = J_w(x)^\top \Lambda_w(x, y)\, J_w(x) + H_f\, r_w(x, y),   (6)

where J_w(x) := ∇_w f_w(x)⊤ is a K × P Jacobian matrix, r_w(x, y) := ∇_f ℓ(y, f) is the residual vector evaluated at f := f_w(x), Λ_w(x, y) := ∇²_{ff} ℓ(y, f), referred to as the noise precision, is the K × K Hessian matrix of the loss evaluated at f := f_w(x), and H_f := ∇²_{ww} f_w(x). The similarity between (5) and (6) is striking. In fact, if we ignore the second term for the Hessian ∇²_{ww} ℓ(w) in (6), we get the well-known Generalized Gauss-Newton (GGN) approximation [15, 17]:

\nabla^2_{ww}\,\ell(w) \approx J_w(x)^\top \Lambda_w(x, y)\, J_w(x).   (7)

This gives us one choice for the approximation (5), where we can set φ_w(x) := J_w(x)⊤, v_w(x, y) := r_w(x, y), and D_w(x, y) := Λ_w(x, y).
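As an illustration of the quantities in (6) and (7), the sketch below (a toy example of ours, not the authors' implementation) computes the residual r_w, the noise precision Λ_w, the Jacobian J_w, and the GGN Hessian for a softmax cross-entropy loss. A linear "network" f_w(x) = Wx is assumed so that H_f = 0 and the Jacobian has a closed form; the gradient identity is checked against finite differences.

import numpy as np

def softmax(f):
    e = np.exp(f - f.max())
    return e / e.sum()

K, D = 3, 4                              # number of classes, input dimension
rng = np.random.default_rng(0)
x = rng.normal(size=D)
y = 1                                    # class label of this example
W = rng.normal(size=(K, D))              # "weights" w = vec(W), row-major

f = W @ x                                # model outputs (logits)
p = softmax(f)
r = p - np.eye(K)[y]                     # residual r_w(x, y) = grad of loss w.r.t. f
Lam = np.diag(p) - np.outer(p, p)        # noise precision Lambda_w(x, y) = Hessian of loss w.r.t. f
J = np.kron(np.eye(K), x)                # K x (K*D) Jacobian of f w.r.t. vec(W)

grad = J.T @ r                           # Eq. (6), first identity
ggn = J.T @ Lam @ J                      # Eq. (7), the GGN approximation of the Hessian

# Check the gradient identity against central finite differences of the per-example loss.
def loss(w_vec):
    return -np.log(softmax(w_vec.reshape(K, D) @ x)[y])
eps, g_fd = 1e-6, np.zeros(K * D)
for i in range(K * D):
    d = np.zeros(K * D); d[i] = eps
    g_fd[i] = (loss(W.ravel() + d) - loss(W.ravel() - d)) / (2 * eps)
assert np.allclose(grad, g_fd, atol=1e-5)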
We are now ready to present our first theoretical result. Consider a Laplace approximation (4) but with the GGN approximation (7) for the Hessian. We refer to this as Laplace-GGN, and denote it by N(w|µ, Σ̃), where Σ̃ is the covariance obtained by using the GGN approximation. We denote the Jacobian, noise precision, and residual at w = w∗ by J∗(x), Λ∗(x, y), and r∗(x, y). We construct a transformed dataset D̃ = {(x_i, ỹ_i)}_{i=1}^N where the outputs ỹ_i ∈ R^K are equal to ỹ_i := J∗(x_i)w∗ − Λ∗(x_i, y_i)⁻¹ r∗(x_i, y_i). We consider the following linear model for D̃:

\tilde{y} = J_*(x)\, w + \epsilon, \quad \text{with } \epsilon \sim \mathcal{N}\big(0, \Lambda_*(x, y)^{-1}\big) \text{ and } w \sim \mathcal{N}(0, \delta^{-1} I_P).   (8)

The following theorem states our result.

Theorem 1. The Laplace approximation N(w|µ, Σ̃) is equal to the posterior distribution p(w|D̃) of the linear model (8).

A proof is given in Appendix A.1. The linear model uses J∗(x) as the nonlinear feature map, and the noise precision Λ∗(x, y) is obtained using the Hessian of the loss evaluated at f_{w∗}(x). The model is constructed such that its posterior is equal to the Laplace approximation, and it exploits the quadratic approximation at w∗. We now describe the final step relating the linear model to GPs.

³For notational convenience, we sometimes use ℓ(w) to denote ℓ(y, f_w(x)).

Step C (GP Model): To get a GP model, we use the equivalence between the weight-space view shown in (2) and the function-space view shown in (3). With this, we get the following GP regression model whose predictive distribution p(f(x∗)|x∗, D̃) is equal to that of the linear model (8):

\tilde{y} = f(x) + \epsilon, \quad \text{with } f(x) \sim \mathcal{GP}\big(0,\; \delta^{-1} J_*(x)\, J_*(x')^\top\big).   (9)

Note that the kernel here is a multi-dimensional K × K kernel. The steps A, B, and C together convert a DNN defined in the weight-space to a GP defined in the function-space. We refer to this approach as "DNN2GP".

The resulting GP predicts in the space of outputs ỹ and therefore gives different predictions than the DNN, but it is connected to the DNN through the Laplace approximation as shown in Theorem 1. In Appendix B, we describe prediction of the outputs y (instead of ỹ) using this GP. Note that our approach leads to a heteroscedastic GP, which could be beneficial. Even though our derivation assumes a Gaussian prior and a DNN model, the approach holds for other types of priors and models.
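The following toy sketch (again ours, not the paper's code) verifies Theorem 1 in a setting where the GGN approximation is exact: binary logistic regression, where f_w(x) = w⊤x, so J_w(x) = x⊤ and the second term in (6) vanishes. The minimizer w∗ is found by Newton's method, the transformed outputs ỹ_i of (8) are constructed, and the posterior mean of the resulting linear model is checked to recover w∗ (its posterior covariance equals the Laplace-GGN covariance by construction).

import numpy as np

rng = np.random.default_rng(0)
N, P, delta = 50, 3, 1.0
X = rng.normal(size=(N, P))                       # rows are the features J_i = x_i^T
y = (rng.random(N) < 0.5).astype(float)           # random binary labels
sig = lambda f: 1.0 / (1.0 + np.exp(-f))

# Find the minimizer w_* of sum_i l_i(w) + 0.5 * delta * w^T w by Newton's method.
w = np.zeros(P)
for _ in range(50):
    p = sig(X @ w)
    grad = X.T @ (p - y) + delta * w
    hess = X.T @ ((p * (1 - p))[:, None] * X) + delta * np.eye(P)
    w = w - np.linalg.solve(hess, grad)
w_star = w

# Laplace-GGN quantities at w_* (here the GGN equals the exact Hessian).
p = sig(X @ w_star)
lam = p * (1 - p)                                 # noise precisions Lambda_*(x_i, y_i)
r = p - y                                         # residuals r_*(x_i, y_i)

# Linear model of Eq. (8): outputs y~_i = J_i w_* - Lambda_i^{-1} r_i, noise variances
# Lambda_i^{-1}, prior N(0, delta^{-1} I). Theorem 1 says its posterior mean recovers w_*.
y_tilde = X @ w_star - r / lam
post_prec = X.T @ (lam[:, None] * X) + delta * np.eye(P)
post_mean = np.linalg.solve(post_prec, X.T @ (lam * y_tilde))
assert np.allclose(post_mean, w_star, atol=1e-6)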
Relationship to NTK: The GP kernel in (9) is the Neural Tangent Kernel⁴ (NTK) [8], which has desirable theoretical properties. As the width of the DNN increases to infinity, the kernel converges in probability to a deterministic kernel and also remains asymptotically constant during training. Our kernel is the NTK defined at w∗ and is expected to have similar properties. It is also likely that, as the DNN width is increased, the Laplace-GGN approximation has similar properties as a GP posterior, and it can potentially be used to improve the performance of DNNs. For example, we can use GPs to tune hyperparameters of DNNs. The function-space view is also useful to understand relationships between data examples. Another advantage of our approach is that we can derive kernels other than the NTK. Any approximation of the form (5) will always result in a linear model similar to (8).

⁴The NTK corresponds to δ = 1, which implies a standard normal prior on the weights.

Accuracy of the GGN approximation: This approximation is accurate when the model f_w(x) can fit the data well, in which case the residuals r_w(x, y) are close to zero for all training examples and the second term in (6) goes to zero [2, 15, 17]. The GGN approximation is a convenient option to derive DNN2GP but, as is clear from (5), other types of approximations can also be used.

4 Relating Iterations of a Deep-Learning Algorithm to GP Inference via VI

In this section, we present theoretical results relating iterations of an RMSprop-like algorithm to GP inference. The RMSprop algorithm [21] uses the following updates (all operations are element-wise):

w_{t+1} \leftarrow w_t - \alpha_t\,\big(\sqrt{s_{t+1}} + \Delta\big)^{-1}\,\hat{g}(w_t), \qquad s_{t+1} \leftarrow (1-\beta_t)\, s_t + \beta_t\, \big(\hat{g}(w_t)\big)^2,   (10)

where t is the iteration, α_t > 0 and 0 < β_t < 1 are learning rates, ∆ > 0 is a small scalar, and ĝ(w) is a stochastic estimate of the gradient of ℓ̄(D, w) obtained using minibatches. Our goal is to relate the iterates w_t to GP inference using our DNN2GP approach, but this requires a posterior approximation defined at each w_t. We cannot use the Laplace approximation because it is only valid at w∗. We will instead use a version of RMSprop proposed in [10] for variational inference (VI), which enables us to construct a GP inference problem at each w_t.

Step A (Variational Inference): The variational online-Newton (VON) algorithm proposed in [10] optimizes the variational objective, but takes an algorithmic form similar to RMSprop (see a detailed discussion in [10]). Below, we show a batch version of VON, derived using Eq. (54) in [10]:

\mu_{t+1} \leftarrow \mu_t - \beta_t\,(S_{t+1} + \delta I_P)^{-1}\, \mathbb{E}_{q_t(w)}\big[\nabla_w \bar{\ell}(\mathcal{D}, w)\big],   (11)
S_{t+1} \leftarrow (1-\beta_t)\, S_t + \beta_t \sum_{i=1}^{N} \mathbb{E}_{q_t(w)}\big[\nabla^2_{ww}\,\ell_i(w)\big],   (12)

where S_t is a scaling matrix similar to the scaling vector s_t in RMSprop, and the Gaussian approximation at iteration t is defined as q_t(w) := N(w|µ_t, Σ_t) with Σ_t := (S_t + δI_P)⁻¹. Since there are no closed-form expressions for the expectations, the Monte Carlo (MC) approximation is used.
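As a minimal illustration of the recursion (11)-(12), the sketch below (a toy of ours, not the paper's experiments) runs batch VON on a linear-Gaussian model with unit noise variance; because the loss is quadratic in w, the expectations under q_t are available in closed form and no MC sampling is needed, and the fixed point recovers the exact Gaussian posterior.

import numpy as np

rng = np.random.default_rng(0)
N, P, delta, beta = 40, 3, 1.0, 0.3
X = rng.normal(size=(N, P))
y = X @ rng.normal(size=P) + 0.1 * rng.normal(size=N)

# loss_i(w) = 0.5 * (y_i - x_i^T w)^2, so grad is linear in w and Hess_i = x_i x_i^T.
mu, S = np.zeros(P), np.zeros((P, P))
for t in range(200):
    S = (1 - beta) * S + beta * (X.T @ X)                          # Eq. (12), exact expectation
    g = X.T @ (X @ mu - y) + delta * mu                            # E_q[grad of regularized loss]
    mu = mu - beta * np.linalg.solve(S + delta * np.eye(P), g)     # Eq. (11)

exact_prec = X.T @ X + delta * np.eye(P)
exact_mean = np.linalg.solve(exact_prec, X.T @ y)
assert np.allclose(mu, exact_mean, atol=1e-6)                      # fixed point = exact posterior mean
assert np.allclose(S + delta * np.eye(P), exact_prec)              # and exact posterior precision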
Step B (Linear Model): As before, we assume the choices for (5) obtained by using the GGN approximation (7). We consider the variant of VON where the GGN approximation is used for the Hessian and the MC approximation is used for the expectations with respect to q_t(w). We call this the Variational Online GGN or VOGGN algorithm. A similar algorithm has recently been used in [19], where it shows competitive performance to Adam and SGD.

We now present a theorem relating iterations of VOGGN to linear models. We denote the Gaussian approximation obtained at iteration t by q̃_t(w) := N(w|µ_t, Σ̃_t), where Σ̃_t is used to emphasize the GGN approximation. We present theoretical results for VOGGN with 1 MC sample, which is denoted by w_t ∼ q̃_t(w). Our proof in Appendix A.2 discusses a more general setting with multiple MC samples. Similarly to the previous section, we first define a transformed dataset D̃_t := {(x_i, ỹ_{i,t})}_{i=1}^N, where ỹ_{i,t} := J_{w_t}(x_i)w_t − Λ_{w_t}(x_i, y_i)⁻¹ r_{w_t}(x_i, y_i), and then a linear basis-function model:

\tilde{y}_t = J_{w_t}(x)\, w + \epsilon, \quad \text{with } \epsilon \sim \mathcal{N}\big(0, (\beta_t \Lambda_{w_t}(x, y))^{-1}\big) \text{ and } w \sim \mathcal{N}(m_t, V_t),   (13)

with V_t^{-1} := (1-\beta_t)\,\tilde{\Sigma}_t^{-1} + \beta_t\,\delta I_P and m_t := (1-\beta_t)\, V_t\, \tilde{\Sigma}_t^{-1} w_t. The model is very similar to the one obtained for the Laplace approximation, but it is now defined using the iterates w_t instead of the minimum w∗. The prior over w is not the standard Gaussian anymore, but rather a correlated Gaussian derived from q_t(w). The theorem below states the result (a proof is given in Appendix A.2).

Theorem 2. The Gaussian approximation N(w|w_{t+1}, Σ̃_{t+1}) at iteration t + 1 of the VOGGN update is equal to the posterior distribution p(w|D̃_t) of the linear model (13).

Step C (GP Model): The linear model (13) has the same predictive distribution as the GP below:

\tilde{y}_t = f_t(x) + \epsilon, \quad \text{with } f_t(x) \sim \mathcal{GP}\big(J_{w_t}(x)\, m_t,\; J_{w_t}(x)\, V_t\, J_{w_t}(x')^\top\big).   (14)

The kernel here is similar to the NTK, but now there is a covariance term V_t which incorporates the effect of the previous q_t(w) as a prior. Our DNN2GP approach shows that one iteration of VOGGN in the weight-space is equivalent to inference in a GP regression model defined in a transformed function-space with respect to a kernel similar to the NTK. This can be compared with the results in [8], where learning by plain gradient descent is shown to be equivalent to kernel gradient descent in function-space. Similarly to the Laplace case, the resulting GP predicts in the space of outputs ỹ_t, but predictions for y_t can be obtained using the method described in Appendix B.

A Deep-Learning Optimizer Derived from VOGGN: The VON algorithm, even though similar to RMSprop, does not converge to the minimum of the loss. This is because it optimizes the variational objective. Fortunately, a slight modification of this algorithm gives us a deep-learning optimizer which is similar to RMSprop but is guaranteed to converge to the minimum of the loss. For this, we approximate the expectations in the updates (11)-(12) at the mean µ_t. This is called the zeroth-order delta approximation; see Appendix A.6 in [9] for details of this method. Using this approximation and denoting the mean µ_t by w_t, we get the following update:

w_{t+1} \leftarrow w_t - \beta_t\,(\hat{S}_{t+1} + \delta I_P)^{-1}\, \nabla_w \bar{\ell}(\mathcal{D}, w_t), \qquad \hat{S}_{t+1} \leftarrow (1-\beta_t)\,\hat{S}_t + \beta_t \sum_{i=1}^{N} \nabla^2_{ww}\,\ell_i(w_t).

We refer to this as the Online GGN or OGGN method. A fixed point w∗ of this iteration is also a minimizer of the loss since we have ∇_w ℓ̄(D, w∗) = 0. Unlike RMSprop, at each iteration we still get a Gaussian approximation q̂_t(w) := N(w|w_t, Σ̂_t) with Σ̂_t := (Ŝ_t + δI_P)⁻¹. Therefore, the posterior of the linear model from Theorem 2 is equivalent to q̂_t when Σ̃_t is replaced by Σ̂_t (see Appendix A.3). In conclusion, by using VI in our DNN2GP approach, we are able to relate the iterations of a deep-learning optimizer to GP inference.
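A small sketch of the full-matrix OGGN recursion on a toy binary logistic regression problem (our assumption for illustration; the experiments below use the diagonal OGN variant instead). The check at the end confirms that the fixed point is a minimizer of the regularized loss.

import numpy as np

rng = np.random.default_rng(1)
N, P, delta, beta = 60, 4, 1.0, 0.5
X = rng.normal(size=(N, P))
y = (rng.random(N) < 0.5).astype(float)
sig = lambda f: 1.0 / (1.0 + np.exp(-f))

w, S = np.zeros(P), np.zeros((P, P))
for t in range(300):
    p = sig(X @ w)
    ggn = X.T @ ((p * (1 - p))[:, None] * X)              # sum_i J_i^T Lambda_i J_i at w_t
    S = (1 - beta) * S + beta * ggn                        # moving-average scaling matrix S_hat
    grad = X.T @ (p - y) + delta * w                       # gradient of the regularized loss
    w = w - beta * np.linalg.solve(S + delta * np.eye(P), grad)

assert np.linalg.norm(X.T @ (sig(X @ w) - y) + delta * w) < 1e-6   # fixed point is a minimizer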
Implementation of DNN2GP: In practice, both VOGGN and OGGN are computationally more expensive than RMSprop because they involve computation of full covariance matrices. To address this issue, we simply use the diagonal versions of these algorithms discussed in [10, 19]. Specifically, we use the VOGN and OGN algorithms discussed in [19]. This implies that V_t is a diagonal matrix and the GP kernel can be obtained without requiring any computation of large matrices. Only Jacobian computations are required. In our experiments, we also resort to computing the kernel over a subset of the data instead of the whole data, which further reduces the cost.

Figure 3: This figure shows a visualization of the predictive distributions on a modified version of the Snelson dataset [20]. The left figure shows Laplace and the right one shows VI. DNN2GP is our proposed method, elaborated upon in Appendix B, while DNN refers to a diagonal Gaussian approximation. We also compare to a GP with RBF kernel (GP-RBF). An MLP is used for DNN2GP and DNN. We see that, wherever the data is missing, the uncertainties are larger for our method than for the others. For classification, we give an example in Fig. 9 in the appendix.

5 Experimental Results

5.1 Comparison of DNN2GP Uncertainty

In this section, we visualize the quality of the uncertainty of the GP obtained with our DNN2GP approach on a simple regression task. To approximate predictive uncertainty for our approach, we use the method described in Appendix B. We use both Laplace and VI approximations, referred to as 'DNN2GP-Laplace' and 'DNN2GP-VI', respectively. We compare them to the uncertainty obtained using an MC approximation in the DNN (referred to as 'DNN-Laplace' and 'DNN-VI'). We also compare to a standard GP regression model with an RBF kernel (referred to as 'GP-RBF'), whose kernel hyperparameters are chosen by optimizing the GP marginal likelihood.

We consider a version of the Snelson dataset [20] where, to assess the 'in-between' uncertainty, we remove the data points between x = 1.5 and x = 3. We use a single-hidden-layer MLP with 32 units and sigmoidal transfer function. Fig. 3 shows the results for the Laplace (left) and VI (right) approximations. For Laplace, we use Adam [11], and, for VI, we use VOGN [10]. The uncertainty provided by DNN2GP is bigger than that of the other methods wherever the data is not observed.

5.2 GP Kernel and Predictive Distribution for Classification Datasets

In this section, we visualize the GP kernel and predictive distribution for DNNs trained on CIFAR-10 and MNIST. Our goal is to show that our GP kernel and its predictions enhance our understanding of a DNN's performance on classification tasks. We consider LeNet-5 [12] and compute both the Laplace and VI approximations. We show the visualization at the posterior mean.
The K × K GP kernel κ∗(x, x') := J∗(x)J∗(x')⊤ results in a kernel matrix of dimensionality NK × NK, which makes it difficult to visualize for our datasets. To simplify, we compute the sum of the diagonal entries of κ∗(x, x') to get an N × N matrix. This corresponds to modelling the output for each class with an individual GP and then summing the kernels of these GPs. We also visualize the GP posterior mean E[f(x)|D] = E[J∗(x)w|D] = J∗(x)w∗ ∈ R^K, and use the reparameterization that allows us to predict in the data space y instead of ỹ, which is explained in Appendix B.

Fig. 4a shows the GP kernel matrix and the posterior mean for the Laplace approximation on MNIST. The rows and columns containing 300 data examples are grouped according to the classes. The kernel matrix clearly shows the correlations learned by the DNN. As expected, each row in the posterior mean also reflects that the classes are correctly classified (DNN test accuracy is 99%). Fig. 4b shows the GP posterior mean after reparameterization for CIFAR-10, where we see a noisier pattern due to a lower accuracy of around 68% on this task.

Fig. 4d shows the two components of the predictive variances that can be interpreted as "aleatoric" and "epistemic" uncertainty. As shown in Eq. (48) in Appendix B.2, for a multiclass classification loss, the variance of the prediction of a label at an input x∗ is equal to Λ∗(x∗) + Λ∗(x∗)J∗(x∗)Σ̃J∗(x∗)⊤Λ∗(x∗). Similar to the linear basis-function model, the two terms here have an interpretation (e.g., see Eq. 3.59 in [1]). The first term can be interpreted as the aleatoric uncertainty (label noise), while the second term takes a form that resembles the epistemic uncertainty (model noise). Fig. 4d shows these for CIFAR-10, where we see that the uncertainty of the model is low (left) and the label noise rather high (right). This interpretation implies that the model is unable to flexibly model the data and instead explains it with high label noise.

Figure 4: DNN2GP kernels, posterior means, and uncertainties with LeNet-5 on 300 samples from binary MNIST in Fig. (c), MNIST in Fig. (a), and CIFAR-10 in Fig. (b,d). Panels: (a) MNIST: GP posterior mean (left) and GP kernel matrix (right); (b) CIFAR: GP posterior mean; (c) Binary-MNIST on digits 0 and 1; (d) epistemic (left) and aleatoric (right) uncertainties. The colored regions on the y-axis mark the classes. Fig. (a) shows the kernel and the predictive mean for the Laplace approximation, which gives 99% test accuracy. We see in the kernel that examples with the same class labels are correlated. Fig. (c) shows the same for binary MNIST trained only on digits 0 and 1 by using VI. The kernel clearly shows the out-of-class predictive behavior, where predictions are not certain. Fig. (b) and (d) show the Laplace-GP on the more complex CIFAR-10 dataset, where we obtain 68% accuracy. Fig. (d) shows the two components of the predictive variance for CIFAR-10 that can be interpreted as epistemic (left) and aleatoric (right) uncertainties. The estimated epistemic uncertainty is much lower than the aleatoric uncertainty, implying that the model is not flexible enough. This is plausible since the accuracy of the model is not too high (merely 68%).
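For concreteness, the sketch below computes the DNN2GP kernel κ(x, x') = δ⁻¹J(x)J(x')⊤ of (9) and the N × N matrix obtained by summing the K diagonal entries of each K × K block, as described above. It assumes a hypothetical one-hidden-layer tanh network with a hand-derived Jacobian; the actual experiments use LeNet-5 and a subset of the data.

import numpy as np

rng = np.random.default_rng(0)
D, H, K, delta = 4, 8, 3, 1.0                                  # input dim, hidden units, classes, regularizer
W1, W2 = rng.normal(size=(H, D)), rng.normal(size=(K, H))      # toy network f(x) = W2 tanh(W1 x)

def jacobian(x):
    # Jacobian of f(x) with respect to w = [vec(W1), vec(W2)] (row-major flattening).
    h = np.tanh(W1 @ x)
    J_W2 = np.kron(np.eye(K), h)                               # df_k/dW2_ij = delta_ki * h_j
    A = W2 * (1 - h ** 2)                                      # df_k/dW1_jm = W2_kj * (1 - h_j^2) * x_m
    J_W1 = np.einsum('kj,m->kjm', A, x).reshape(K, H * D)
    return np.concatenate([J_W1, J_W2], axis=1)                # K x P Jacobian J(x)

X = rng.normal(size=(5, D))                                    # a few inputs
Js = [jacobian(x) for x in X]
kernel_sum = np.array([[np.trace(Ji @ Jj.T) / delta for Jj in Js] for Ji in Js])
print(kernel_sum)                                              # N x N matrix summing the per-class kernels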
In Fig. 4c, we study the kernel for classes outside of the training dataset using VI. We train LeNet-5 on digits 0 and 1 with VOGN and visualize the predictive mean and kernel on all 10 classes, denoted by differently colored regions on the y-axis. We can see that there are slight correlations to the out-of-class samples but no overconfident predictions. In contrast, the pattern between 0 and 1 is quite strong. The kernel obtained with DNN2GP helps to interpret and visualize such correlations.

5.3 Tuning the Hyperparameters of a DNN Using the GP Marginal Likelihood

In this section, we demonstrate the tuning of DNN hyperparameters by using the GP marginal likelihood on a real and a synthetic regression dataset. In the deep-learning literature, this is usually done using cross-validation. Our goal is to demonstrate that with DNN2GP we can do this by simply computing the marginal likelihood on the training set.

We generate a synthetic regression dataset (N = 100; see Fig. 5) where there are a few data points around x = 0 but plenty away from it. We fit the data by using a neural network with a single hidden layer of 20 units and tanh nonlinearity. Our goal is to tune the regularization parameter δ to trade off underfitting vs. overfitting. Fig. 5b and 5c show the train log marginal-likelihood obtained with the GP obtained by DNN2GP, along with the test and train mean-square error (MSE) obtained using a point estimate. Black stars indicate the hyperparameters chosen by using the test loss and log marginal likelihood, respectively. We clearly see that the train marginal-likelihood chooses hyperparameters that give low test error. The train MSE, on the other hand, overfits as δ is reduced.

Figure 5: This figure demonstrates the use of the GP marginal likelihood to tune hyperparameters of a DNN. We tune the regularization parameter δ on a synthetic dataset shown in (a). Fig. (b) and (c) show train and test MSE along with the log marginal likelihoods on training data obtained with Laplace and VI, respectively. We show the standard error over 10 runs. The optimal hyperparameters according to test loss and marginal likelihood (shown with black stars) match well. Panels: (a) model fits; (b) Laplace approximation; (c) variational inference.

Figure 6: This is the same as Fig. 5 but on a real dataset: UCI Red Wine Quality. All the plots use the Laplace approximation, and the standard errors are estimated over 20 splits. We tune the following hyperparameters: the regularization parameter δ (left), the noise variance σ (middle), and the DNN width (right). The train log marginal-likelihood chooses hyperparameters that give a low test error.
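The sketch below illustrates the mechanism on a toy Bayesian linear model (our illustration, not the DNN experiments above): the log marginal likelihood of the training data, which is available in closed form for the GP/linear model that DNN2GP associates with a trained DNN, is evaluated on a grid of δ values and its maximizer is selected.

import numpy as np

rng = np.random.default_rng(0)
N, P, sigma2 = 30, 10, 0.1
Phi = rng.normal(size=(N, P))
y = Phi[:, :3] @ np.ones(3) + np.sqrt(sigma2) * rng.normal(size=N)   # only 3 of 10 features matter

def log_marglik(delta):
    # log N(y | 0, delta^{-1} Phi Phi^T + sigma2 I) for the prior w ~ N(0, delta^{-1} I)
    C = Phi @ Phi.T / delta + sigma2 * np.eye(N)
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (logdet + y @ np.linalg.solve(C, y) + N * np.log(2 * np.pi))

deltas = np.logspace(-3, 3, 25)
best = deltas[np.argmax([log_marglik(d) for d in deltas])]
print("delta chosen by the train marginal likelihood:", best)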
Next, we discuss results for a real dataset: UCI Red Wine Quality (N = 1599) with an input dimensionality of 12 and a scalar output. We use an MLP with 2 hidden layers of 20 units each and a tanh transfer function. We consider tuning the regularizer δ, the noise variance σ, and the DNN width. We use the Laplace approximation and tune one parameter at a time while keeping the others fixed (respectively, we fix σ = 0.64; δ = 30; and σ = 0.64, δ = 3 with 1 hidden layer). Similarly to the synthetic-data case, the train marginal likelihood selects hyperparameters that give low test error. These experiments show that the DNN2GP framework can be useful for tuning DNN hyperparameters, although this needs to be confirmed for larger networks than the ones we used here.

6 Discussion and Future Work

In this paper, we present theoretical results connecting approximate inference on DNNs to GP posteriors. Our work enables the extraction of feature maps and GP kernels by simply training DNNs. It provides a natural way to combine the two different models.

Our hope is that our theoretical results will facilitate further research on combining the strengths of DNNs and GPs. A computational bottleneck is the Jacobian computation, which prohibits application to large problems. There are several ways to reduce this computation, e.g., by choosing a different type of GGN approximation that uses gradients instead of the Jacobians. Exploration of such methods is a future direction that needs to be pursued.

Exact inference on the GP model we derive is still computationally infeasible for large problems. However, further approximations could enable inference on bigger datasets. Finally, our work opens many other interesting avenues where a combination of GPs and DNNs can be useful, such as model selection, deep reinforcement learning, Bayesian optimization, active learning, and interpretation. We hope that our work enables the community to conduct further research on such problems.

Acknowledgements

We would like to thank Kazuki Osawa (Tokyo Institute of Technology), Anirudh Jain (RIKEN), and Runa Eschenhagen (RIKEN) for their help with the experiments. We would also like to thank Matthias Bauer (DeepMind) for discussions and useful feedback. Many thanks to Roman Bachmann (RIKEN) for helping with the visualization in Fig. 1. We also thank Stephan Mandt (UCI) for suggesting the marginal likelihood experiment. We thank the reviewers and the area chair for their feedback as well. We are also thankful for the RAIDEN computing system and its support team at the RIKEN Center for Advanced Intelligence Project, which we used extensively for our experiments.

References
[1] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

[2] L. Bottou, F. Curtis, and J. Nocedal. Optimization methods for large-scale machine learning. SIAM Review, 60(2):223–311, 2018. doi: 10.1137/16M1080173.

[3] John Bradshaw, Alexander G. de G. Matthews, and Zoubin Ghahramani. Adversarial examples, uncertainty, and transfer testing robustness in Gaussian process hybrid deep networks. arXiv preprint arXiv:1707.02476, 2017.

[4] Youngmin Cho and Lawrence K. Saul. Kernel methods for deep learning. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 342–350. Curran Associates, Inc., 2009.

[5] Alexander G. de G. Matthews, Jiri Hron, Mark Rowland, Richard E. Turner, and Zoubin Ghahramani. Gaussian process behaviour in wide deep neural networks. In International Conference on Learning Representations, 2018.

[6] Adrià Garriga-Alonso, Carl Edward Rasmussen, and Laurence Aitchison. Deep convolutional networks as shallow Gaussian processes. In International Conference on Learning Representations, 2019.

[7] Tamir Hazan and Tommi S. Jaakkola. Steps toward deep kernel methods from infinite neural networks. CoRR, abs/1508.05133, 2015.

[8] Arthur Jacot, Franck Gabriel, and Clement Hongler. Neural tangent kernel: Convergence and generalization in neural networks. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 8571–8580. Curran Associates, Inc., 2018.

[9] Mohammad Khan. Variational learning for latent Gaussian model of discrete data. PhD thesis, University of British Columbia, 2012.

[10] Mohammad Emtiyaz Khan, Didrik Nielsen, Voot Tangkaratt, Wu Lin, Yarin Gal, and Akash Srivastava. Fast and scalable Bayesian deep learning by weight-perturbation in Adam. In International Conference on Machine Learning, pages 2616–2625, 2018.

[11] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[12] Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[13] Jaehoon Lee, Yasaman Bahri, Roman Novak, Samuel S. Schoenholz, Jeffrey Pennington, and Jascha Sohl-Dickstein. Deep neural networks as Gaussian processes. In International Conference on Learning Representations, 2018.

[14] Jaehoon Lee, Lechao Xiao, Samuel S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Sohl-Dickstein, and Jeffrey Pennington. Wide neural networks of any depth evolve as linear models under gradient descent. arXiv preprint arXiv:1902.06720, 2019.

[15] James Martens. New perspectives on the natural gradient method. CoRR, abs/1412.1193, 2014.

[16] Radford M. Neal. Bayesian Learning for Neural Networks. Springer-Verlag, Berlin, Heidelberg, 1996. ISBN 0387947248.

[17] Jorge Nocedal and Stephen Wright. Numerical Optimization. Springer Science & Business Media, 2006.

[18] Roman Novak, Lechao Xiao, Yasaman Bahri, Jaehoon Lee, Greg Yang, Daniel A. Abolafia, Jeffrey Pennington, and Jascha Sohl-Dickstein. Bayesian deep convolutional networks with many channels are Gaussian processes. In International Conference on Learning Representations, 2019.
[19] Kazuki Osawa, Siddharth Swaroop, Anirudh Jain, Runa Eschenhagen, Richard Turner, Rio Yokota, and Mohammad Emtiyaz Khan. Practical deep learning with Bayesian principles. In Advances in Neural Information Processing Systems, 2019.

[20] Edward Snelson and Zoubin Ghahramani. Sparse Gaussian processes using pseudo-inputs. In Y. Weiss, B. Schölkopf, and J. C. Platt, editors, Advances in Neural Information Processing Systems 18, pages 1257–1264. MIT Press, 2006.

[21] Tijmen Tieleman and Geoffrey Hinton. Lecture 6.5-RMSprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 4, 2012.

[22] Christopher K. I. Williams. Computing with infinite networks. In Advances in Neural Information Processing Systems, pages 295–301, 1997.

[23] Christopher K. I. Williams and Carl Edward Rasmussen. Gaussian Processes for Machine Learning, volume 2. MIT Press, Cambridge, MA, 2006.