{"title": "Learning Attractor Dynamics for Generative Memory", "book": "Advances in Neural Information Processing Systems", "page_first": 9379, "page_last": 9388, "abstract": "A central challenge faced by memory systems is the robust retrieval of a stored pattern in the presence of interference due to other stored patterns and noise. A theoretically well-founded solution to robust retrieval is given by attractor dynamics, which iteratively cleans up patterns during recall. However, incorporating attractor dynamics into modern deep learning systems poses difficulties: attractor basins are characterised by vanishing gradients, which are known to make training neural networks difficult. In this work, we exploit recent advances in variational inference and avoid the vanishing gradient problem by training a generative distributed memory with a variational lower-bound-based Lyapunov function. The model is minimalistic with surprisingly few parameters. Experiments show that it converges to correct patterns upon iterative retrieval and achieves competitive performance as both a memory model and a generative model.", "full_text": "Learning Attractor Dynamics for Generative Memory

Yan Wu, Greg Wayne, Karol Gregor, Timothy Lillicrap

DeepMind

{yanwu,gregwayne,karolg,countzero}@google.com

Abstract

A central challenge faced by memory systems is the robust retrieval of a stored pattern in the presence of interference due to other stored patterns and noise. A theoretically well-founded solution to robust retrieval is given by attractor dynamics, which iteratively clean up patterns during recall. However, incorporating attractor dynamics into modern deep learning systems poses difficulties: attractor basins are characterised by vanishing gradients, which are known to make training neural networks difficult. In this work, we avoid the vanishing gradient problem by training a generative distributed memory without simulating the attractor dynamics.
Based on the idea of memory writing as inference, as proposed in the Kanerva Machine, we show that a likelihood-based Lyapunov function emerges from maximising the variational lower-bound of a generative memory. Experiments show that it converges to correct patterns upon iterative retrieval and achieves competitive performance as both a memory model and a generative model.

1 Introduction

Memory plays an important role in both artificial and biological learning systems [4]. Various forms of external memory have been used to augment neural networks [5, 14, 25, 29, 31, 32]. Most of these approaches use attention-based reading mechanisms that compute a weighted average of memory contents. These mechanisms typically retrieve items in a single step and are fixed after training. While external memory offers the potential of quickly adapting to new data after training, it is unclear whether these previously proposed attention-based mechanisms can fully exploit this potential. For example, when inputs are corrupted by noise that is unseen during training, are such one-step attention processes always optimal?

In contrast, experimental and theoretical studies of neural systems suggest that memory retrieval is a dynamic and iterative process: memories are retrieved over a potentially varying period of time, rather than in a single step, during which information can be continuously integrated [3, 7, 20]. In particular, attractor dynamics are hypothesised to support the robust performance of various forms of memory via their self-stabilising property [8, 12, 16, 28, 33]. For example, point attractors eventually converge to a set of fixed points even from noisy initial states. Memories stored at such fixed points can thus be retrieved robustly.
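The point-attractor behaviour described above can be illustrated with the textbook example, a Hopfield network with Hebbian weights. This is a generic sketch for intuition only, not the memory model developed in this paper; the network size, number of patterns and corruption level are arbitrary choices, and the helper names `store` and `recall` are hypothetical. A corrupted copy of a stored pattern is fed in, and unit-wise sign updates are iterated until no unit changes, that is, until the state sits at a fixed point.

```python
import numpy as np

def store(patterns):
    # Hebbian weights of a classical Hopfield network, with no self-connections.
    P = np.asarray(patterns, dtype=float)            # shape (num_patterns, num_units)
    W = P.T @ P / P.shape[1]
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, x, max_sweeps=20):
    # Asynchronous sign updates, one unit at a time. With symmetric W and a zero
    # diagonal these never increase the Hopfield energy, so the state settles
    # (the loop is also capped at max_sweeps).
    x = np.array(x, dtype=float)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(x)):
            new_value = 1.0 if W[i] @ x >= 0 else -1.0
            if new_value != x[i]:
                x[i] = new_value
                changed = True
        if not changed:                               # no unit flipped: x is a fixed point
            break
    return x

rng = np.random.default_rng(0)
patterns = rng.choice([-1.0, 1.0], size=(3, 100))    # three random +/-1 patterns
W = store(patterns)

noisy = patterns[0].copy()
flip = rng.choice(100, size=15, replace=False)       # corrupt 15 of the 100 units
noisy[flip] *= -1
restored = recall(W, noisy)
print('bits wrong before:', int(np.sum(noisy != patterns[0])),
      'after:', int(np.sum(restored != patterns[0])))
```

With only a few stored patterns, the corrupted state lies inside the basin of attraction of the stored one, so iterating the dynamics cleans it up; with many more stored patterns, spurious fixed points start to capture the state instead.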
To our knowledge, only the Kanerva Machine (KM) incorporates iterative reconstruction of a retrieved pattern within a modern deep learning model, but it offers no guarantee of convergence [32].

Incorporating attractor dynamics into modern neural networks is not straightforward. Although recurrent neural networks can in principle learn any dynamics, they face the problem of vanishing gradients. This problem is aggravated when directly training for attractor dynamics, which by definition imply vanishing gradients [23] (see also Section 2.2). In this work, we avoid vanishing gradients by constructing our model to dynamically optimise a variational lower-bound. After training, the stored patterns serve as attractive fixed points to which even random patterns will converge. Thanks to the underlying probabilistic model, we do not need to simulate the attractor dynamics during training, thus avoiding the vanishing gradient problem. We apply our approach to a generative distributed memory. In this context we focus on demonstrating high capacity and robustness, though the framework may be used for any other memory model with a well-defined likelihood.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

To confirm that the emerging attractor dynamics help memory retrieval, we experiment with the Omniglot dataset [22] and images from DMLab [6], showing that the attractor dynamics consistently improve images corrupted by noise unseen during training, as well as low-quality prior samples. The improvement in sampling quality tracks the decrease of an energy that we define based on the variational lower-bound.

2 Background and Notation

All vectors are assumed to be column vectors. Samples from a dataset $\mathcal{D}$, as well as other variables, are indexed with the subscript $t$ when the temporal order is specified.
We use the shorthand subscripts $<t$ and $\le t$ to indicate all elements with indices “less than” and “less than or equal to” $t$, respectively. $\langle f(x) \rangle_{p(x)}$ denotes the expectation of the function $f(x)$ over the distribution $p(x)$.

2.1 Kanerva Machines

Our model shares the same essential structure as the Kanerva Machine (figure 1, left) [32], which views memory as a global latent variable in a generative model. Underlying the inference process is the assumption of exchangeability of the observations: an episode of observations $x_1, x_2, \ldots, x_T$ is exchangeable if shuffling the indices within the episode does not affect its probability [2]. This ensures that a pattern $x_t$ can be retrieved regardless of the order in which it was stored in the memory; there is no forgetting of earlier patterns. Formally, exchangeability implies that all the patterns in an episode are conditionally independent given the memory: $p(x_1, x_2, \ldots, x_T \mid M) = \prod_{t=1}^{T} p(x_t \mid M)$.

More specifically, $p(M; R, U)$ defines the distribution over the $K \times C$ memory matrix $M$, where $K$ is the number of rows and $C$ is the code size used by the memory. The statistical structure of the memory is summarised in its mean and covariance through the parameters $R$ and $U$. Intuitively, while the mean provides material for the memory to synthesise observations, the covariance coordinates memory reads and writes. $R$ is the mean matrix of $M$, which has the same $K \times C$ shape. The columns of $M$ are independent, with the same variance for all elements in a given row. The covariance between rows of $M$ is encoded in the $K \times K$ covariance matrix $U$. The vectorised form of $M$ has the multivariate distribution $p(\mathrm{vec}(M)) = \mathcal{N}(\mathrm{vec}(M) \mid \mathrm{vec}(R), I \otimes U)$, where $\mathrm{vec}(\cdot)$ is the vectorisation operator and $\otimes$ denotes the Kronecker product. Equivalently, the memory can be summarised as the matrix variate normal distribution $p(M) = \mathcal{MN}(R, U, I)$. Reading from memory is achieved via a sum over the rows of $M$, weighted by the addressing weights $w$:

$$z = \sum_{k=1}^{K} w(k) \cdot M(k) + \xi \qquad (1)$$

where $k$ indexes the elements of $w$ and the rows of $M$, and $\xi$ is observation noise with fixed variance that ensures the model's likelihood is well defined (Appendix A). The memory interfaces with data inputs $x$ via neural network encoders and decoders.

Since the memory is a linear Gaussian model, its posterior distribution $p(M \mid z_{\le t}, w_{\le t})$ is analytically tractable and online Bayesian inference can be performed efficiently. The KM [32] interprets inferring the posterior of the memory as a writing process that optimally balances previously stored patterns and new patterns. To infer $w$, however, the KM uses an amortised inference model $q(w \mid x)$, similar to the encoder of a variational autoencoder (VAE) [19, 27], which does not access the memory. Although this model can distil information about the memory into its parameters during training, such parameterised information cannot easily be adapted to test-time data. This can damage performance during testing, for example when the memory is loaded with a different number of patterns, as we shall demonstrate in the experiments.

Figure 1: Variables: $M$, the memory; $x$, inputs (e.g., images); $w$, addressing weights; $z$, embedding of $x$. Left: The probabilistic graphical model shared by the Kanerva Machine and our model. The memory is a latent variable shared by all patterns in an episode, and provides exchangeability within the episode. $z$ is omitted since the deterministic embedding of $x$ does not affect the graphical model. Right: Schematic structure of our model.
The memory is a Gaussian random matrix.

2.2 Attractor Dynamics

A theoretically well-founded approach for robust memory retrieval is to employ attractor dynamics [3, 15, 18, 33]. In this paper, we focus on point attractors, although other types of attractor may also support memory systems [12]. For a discrete-time dynamical system with state $x$ and dynamics specified by the function $f(\cdot)$, the state evolves as $x_{n+1} = f(x_n)$. A fixed point $x^\ast$ satisfies $f(x^\ast) = x^\ast$, so that once $x_n = x^\ast$ we have $x_{n+1} = x_n = x^\ast$. A fixed point $x^\ast$ is attractive if, for any point near $x^\ast$, iterative application of $f(\cdot)$ converges to $x^\ast$. Near an attractive fixed point the one-step Jacobian $\partial x_{n+1} / \partial x_n = \partial f(x) / \partial x \,\rvert_{x = x_n}$ is typically contractive (its spectral radius is below one), and in the idealised case it vanishes as $x_n \to x^\ast$. A more formal definition of a point attractor is given in Appendix E, along with a proof of attractor dynamics for our model.

Gradient-based training of attractors with parametrised models $f(\cdot; \theta)$, such as neural networks, is difficult: for any loss function $\mathcal{L}$ that depends on the $n$-th state $x_n$, the gradient

$$\frac{\partial \mathcal{L}}{\partial \theta} = \sum_{t=1}^{n} \frac{\partial \mathcal{L}}{\partial x_n} \cdot \frac{\partial x_n}{\partial x_t} \cdot \frac{\partial x_t}{\partial \theta}, \qquad \frac{\partial x_n}{\partial x_t} = \prod_{u=t}^{n-1} \frac{\partial x_{u+1}}{\partial x_u} \qquad (2)$$

(where $\partial x_t / \partial \theta$ denotes the direct dependence of $x_t$ on $\theta$) vanishes once the trajectory approaches a fixed point: there, each factor $\partial x_{u+1} / \partial x_u$ is contractive (and tends to zero in the idealised case above), so the product $\partial x_n / \partial x_t$ shrinks towards zero as $n - t$ grows. This is the “vanishing gradients” problem, which makes backpropagating gradients through the attractor settling dynamics difficult [23, 24].

3 Dynamic Kanerva Machines

We call our model the Dynamic Kanerva Machine (DKM) because it optimises the addressing weights $w$ at each step via dynamic addressing. We depart from both Kanerva's original sparse distributed memory [18] and the KM by removing the static addresses that are fixed after training. The DKM is illustrated in figure 1 (right).
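The Gaussian memory of Section 2.1, which the DKM inherits, and the read operation of eq. 1 can be made concrete with a short NumPy sketch. It is an illustration under our own assumptions (made-up sizes and the hypothetical helper names `sample_memory` and `read_memory`), not the authors' code. It draws $M \sim \mathcal{MN}(R, U, I)$ by giving each column an independent $\mathcal{N}(R(:, c), U)$ distribution, checks by Monte Carlo that the covariance of $\mathrm{vec}(M)$ matches $I \otimes U$, and reads with a one-hot weight vector.

```python
import numpy as np

def sample_memory(R, U, rng, n=None):
    # M ~ MN(R, U, I): each column c is an independent N(R[:, c], U) draw,
    # which is equivalent to vec(M) ~ N(vec(R), kron(I, U)).
    shape = R.shape if n is None else (n,) + R.shape
    L = np.linalg.cholesky(U)                    # U = L L^T
    return R + L @ rng.standard_normal(shape)    # matmul broadcasts over a leading batch axis

def read_memory(M, w, noise_std, rng):
    # Eq. 1: z = sum_k w(k) * M(k) + xi, i.e. a weighted sum over the rows of M.
    return w @ M + noise_std * rng.standard_normal(M.shape[1])

rng = np.random.default_rng(0)
K, C = 3, 2                                      # K memory rows, code size C
R = rng.normal(size=(K, C))
A = rng.normal(size=(K, K))
U = A @ A.T / K + 0.5 * np.eye(K)                # an arbitrary positive-definite row covariance

# Monte Carlo check of the vec identity (vec stacks the columns of M).
N = 200_000
samples = sample_memory(R, U, rng, n=N)                  # shape (N, K, C)
vecs = samples.transpose(0, 2, 1).reshape(N, K * C)      # column-major vec(M) per sample
emp_cov = np.cov(vecs, rowvar=False)
max_dev = np.abs(emp_cov - np.kron(np.eye(C), U)).max()

# With a one-hot weight vector and no noise, the read returns a single row of M.
M = sample_memory(R, U, rng)
z = read_memory(M, np.array([0.0, 1.0, 0.0]), 0.0, rng)
print(max_dev, np.allclose(z, M[1]))
```

Sampling column by column with a shared Cholesky factor avoids ever forming the $KC \times KC$ covariance $I \otimes U$, which is why the matrix-variate form is convenient in practice.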
Following the KM, we use a Gaussian random matrix M for the memory, and\napproximate samples of M using its mean R. We use subscripts t for Mt, Rt and Ut to distinguish\nthe memory or parameters after the online update at the t\u2019th step when necessary. Therefore,\np (Mt|x(cid:54)t) = p (M|x(cid:54)t).\nWe use a neural network encoder e(x) \u2192 z to deterministically map an external input x to embedding\nz. To obtain a valid likelihood function, the decoder is a parametrised distribution d(z) \u2192 p (x|z)\nthat transforms an embedding z to a distribution in the input space, similar to the decoder in the VAE.\nTogether the pair forms an autoencoder.\n\nSimilar to eq. 1, we construct z from the memory and addressing weights via z =(cid:80)K\n(cid:0)\u00b5wt, \u03c32\n\nk=1 w(k) \u00b7\nM(k). Since both mappings e(\u00b7) and d(\u00b7) are deterministic, we hereafter omit all dependencies of\ndistributions on z for brevity. For a Bayesian treatment of the addressing weights, we assume they\nhave the Gaussian prior p(wt) = N (0, 1). The posterior distribution q (wt) = N\nvariance that is trained as a parameter and a mean that is optimised analytically at each step (Section\n3.1). All parameters of the model and their initialisations are summarised in Appendix B.\nTo train the model in a maximum-likelihood setting, we update the model parameters to maximise\nthe log-likelihood of episodes x(cid:54)T sampled from the training set (summarised in Algorithm 1). 
As\n\n(cid:1) has a\n\nw\n\n3\n\n\u2026x1<latexit sha1_base64=\"XKRpkkbtZtFMa4xQyLBNO+caSI4=\">AAAB83icbVDLSsNAFL2pr1pfVZduBovgqiQi6LLoxmUF+4CmlMl00g6dTMLMjVhCf8ONC0Xc+jPu/BsnbRbaemDgcM693DMnSKQw6LrfTmltfWNzq7xd2dnd2z+oHh61TZxqxlsslrHuBtRwKRRvoUDJu4nmNAok7wST29zvPHJtRKwecJrwfkRHSoSCUbSS70cUx0GYPc0G3qBac+vuHGSVeAWpQYHmoPrlD2OWRlwhk9SYnucm2M+oRsEkn1X81PCEsgkd8Z6likbc9LN55hk5s8qQhLG2TyGZq783MhoZM40CO5lnNMteLv7n9VIMr/uZUEmKXLHFoTCVBGOSF0CGQnOGcmoJZVrYrISNqaYMbU0VW4K3/OVV0r6oe5bfX9YaN0UdZTiBUzgHD66gAXfQhBYwSOAZXuHNSZ0X5935WIyWnGLnGP7A+fwBKmKRwA==</latexit><latexit sha1_base64=\"XKRpkkbtZtFMa4xQyLBNO+caSI4=\">AAAB83icbVDLSsNAFL2pr1pfVZduBovgqiQi6LLoxmUF+4CmlMl00g6dTMLMjVhCf8ONC0Xc+jPu/BsnbRbaemDgcM693DMnSKQw6LrfTmltfWNzq7xd2dnd2z+oHh61TZxqxlsslrHuBtRwKRRvoUDJu4nmNAok7wST29zvPHJtRKwecJrwfkRHSoSCUbSS70cUx0GYPc0G3qBac+vuHGSVeAWpQYHmoPrlD2OWRlwhk9SYnucm2M+oRsEkn1X81PCEsgkd8Z6likbc9LN55hk5s8qQhLG2TyGZq783MhoZM40CO5lnNMteLv7n9VIMr/uZUEmKXLHFoTCVBGOSF0CGQnOGcmoJZVrYrISNqaYMbU0VW4K3/OVV0r6oe5bfX9YaN0UdZTiBUzgHD66gAXfQhBYwSOAZXuHNSZ0X5935WIyWnGLnGP7A+fwBKmKRwA==</latexit><latexit sha1_base64=\"XKRpkkbtZtFMa4xQyLBNO+caSI4=\">AAAB83icbVDLSsNAFL2pr1pfVZduBovgqiQi6LLoxmUF+4CmlMl00g6dTMLMjVhCf8ONC0Xc+jPu/BsnbRbaemDgcM693DMnSKQw6LrfTmltfWNzq7xd2dnd2z+oHh61TZxqxlsslrHuBtRwKRRvoUDJu4nmNAok7wST29zvPHJtRKwecJrwfkRHSoSCUbSS70cUx0GYPc0G3qBac+vuHGSVeAWpQYHmoPrlD2OWRlwhk9SYnucm2M+oRsEkn1X81PCEsgkd8Z6likbc9LN55hk5s8qQhLG2TyGZq783MhoZM40CO5lnNMteLv7n9VIMr/uZUEmKXLHFoTCVBGOSF0CGQnOGcmoJZVrYrISNqaYMbU0VW4K3/OVV0r6oe5bfX9YaN0UdZTiBUzgHD66gAXfQhBYwSOAZXuHNSZ0X5935WIyWnGLnGP7A+fwBKmKRwA==</latexit><latexit 
sha1_base64=\"XKRpkkbtZtFMa4xQyLBNO+caSI4=\">AAAB83icbVDLSsNAFL2pr1pfVZduBovgqiQi6LLoxmUF+4CmlMl00g6dTMLMjVhCf8ONC0Xc+jPu/BsnbRbaemDgcM693DMnSKQw6LrfTmltfWNzq7xd2dnd2z+oHh61TZxqxlsslrHuBtRwKRRvoUDJu4nmNAok7wST29zvPHJtRKwecJrwfkRHSoSCUbSS70cUx0GYPc0G3qBac+vuHGSVeAWpQYHmoPrlD2OWRlwhk9SYnucm2M+oRsEkn1X81PCEsgkd8Z6likbc9LN55hk5s8qQhLG2TyGZq783MhoZM40CO5lnNMteLv7n9VIMr/uZUEmKXLHFoTCVBGOSF0CGQnOGcmoJZVrYrISNqaYMbU0VW4K3/OVV0r6oe5bfX9YaN0UdZTiBUzgHD66gAXfQhBYwSOAZXuHNSZ0X5935WIyWnGLnGP7A+fwBKmKRwA==</latexit>x2<latexit sha1_base64=\"+fmTm2ZVR20F4bY6bvBdYsdn+/4=\">AAAB83icbVDLSsNAFL3xWeur6tLNYBFclaQIuiy6cVnBPqApZTKdtEMnkzBzI5bQ33DjQhG3/ow7/8ZJm4W2Hhg4nHMv98wJEikMuu63s7a+sbm1Xdop7+7tHxxWjo7bJk414y0Wy1h3A2q4FIq3UKDk3URzGgWSd4LJbe53Hrk2IlYPOE14P6IjJULBKFrJ9yOK4yDMnmaD+qBSdWvuHGSVeAWpQoHmoPLlD2OWRlwhk9SYnucm2M+oRsEkn5X91PCEsgkd8Z6likbc9LN55hk5t8qQhLG2TyGZq783MhoZM40CO5lnNMteLv7n9VIMr/uZUEmKXLHFoTCVBGOSF0CGQnOGcmoJZVrYrISNqaYMbU1lW4K3/OVV0q7XPMvvL6uNm6KOEpzCGVyAB1fQgDtoQgsYJPAMr/DmpM6L8+58LEbXnGLnBP7A+fwBK+aRwQ==</latexit><latexit sha1_base64=\"+fmTm2ZVR20F4bY6bvBdYsdn+/4=\">AAAB83icbVDLSsNAFL3xWeur6tLNYBFclaQIuiy6cVnBPqApZTKdtEMnkzBzI5bQ33DjQhG3/ow7/8ZJm4W2Hhg4nHMv98wJEikMuu63s7a+sbm1Xdop7+7tHxxWjo7bJk414y0Wy1h3A2q4FIq3UKDk3URzGgWSd4LJbe53Hrk2IlYPOE14P6IjJULBKFrJ9yOK4yDMnmaD+qBSdWvuHGSVeAWpQoHmoPLlD2OWRlwhk9SYnucm2M+oRsEkn5X91PCEsgkd8Z6likbc9LN55hk5t8qQhLG2TyGZq783MhoZM40CO5lnNMteLv7n9VIMr/uZUEmKXLHFoTCVBGOSF0CGQnOGcmoJZVrYrISNqaYMbU1lW4K3/OVV0q7XPMvvL6uNm6KOEpzCGVyAB1fQgDtoQgsYJPAMr/DmpM6L8+58LEbXnGLnBP7A+fwBK+aRwQ==</latexit><latexit 
sha1_base64=\"+fmTm2ZVR20F4bY6bvBdYsdn+/4=\">AAAB83icbVDLSsNAFL3xWeur6tLNYBFclaQIuiy6cVnBPqApZTKdtEMnkzBzI5bQ33DjQhG3/ow7/8ZJm4W2Hhg4nHMv98wJEikMuu63s7a+sbm1Xdop7+7tHxxWjo7bJk414y0Wy1h3A2q4FIq3UKDk3URzGgWSd4LJbe53Hrk2IlYPOE14P6IjJULBKFrJ9yOK4yDMnmaD+qBSdWvuHGSVeAWpQoHmoPLlD2OWRlwhk9SYnucm2M+oRsEkn5X91PCEsgkd8Z6likbc9LN55hk5t8qQhLG2TyGZq783MhoZM40CO5lnNMteLv7n9VIMr/uZUEmKXLHFoTCVBGOSF0CGQnOGcmoJZVrYrISNqaYMbU1lW4K3/OVV0q7XPMvvL6uNm6KOEpzCGVyAB1fQgDtoQgsYJPAMr/DmpM6L8+58LEbXnGLnBP7A+fwBK+aRwQ==</latexit><latexit sha1_base64=\"+fmTm2ZVR20F4bY6bvBdYsdn+/4=\">AAAB83icbVDLSsNAFL3xWeur6tLNYBFclaQIuiy6cVnBPqApZTKdtEMnkzBzI5bQ33DjQhG3/ow7/8ZJm4W2Hhg4nHMv98wJEikMuu63s7a+sbm1Xdop7+7tHxxWjo7bJk414y0Wy1h3A2q4FIq3UKDk3URzGgWSd4LJbe53Hrk2IlYPOE14P6IjJULBKFrJ9yOK4yDMnmaD+qBSdWvuHGSVeAWpQoHmoPLlD2OWRlwhk9SYnucm2M+oRsEkn5X91PCEsgkd8Z6likbc9LN55hk5t8qQhLG2TyGZq783MhoZM40CO5lnNMteLv7n9VIMr/uZUEmKXLHFoTCVBGOSF0CGQnOGcmoJZVrYrISNqaYMbU1lW4K3/OVV0q7XPMvvL6uNm6KOEpzCGVyAB1fQgDtoQgsYJPAMr/DmpM6L8+58LEbXnGLnBP7A+fwBK+aRwQ==</latexit>xT<latexit sha1_base64=\"Lv0eXnBKX961VeUHLOca/6YomRA=\">AAAB83icbVDLSsNAFL3xWeur6tLNYBFclUQEXRbduKzQFzShTKaTduhkEmZuxBL6G25cKOLWn3Hn3zhts9DWAwOHc+7lnjlhKoVB1/121tY3Nre2Szvl3b39g8PK0XHbJJlmvMUSmehuSA2XQvEWCpS8m2pO41DyTji+m/mdR66NSFQTJykPYjpUIhKMopV8P6Y4CqP8adpv9itVt+bOQVaJV5AqFGj0K1/+IGFZzBUySY3peW6KQU41Cib5tOxnhqeUjemQ9yxVNOYmyOeZp+TcKgMSJdo+hWSu/t7IaWzMJA7t5CyjWfZm4n9eL8PoJsiFSjPkii0ORZkkmJBZAWQgNGcoJ5ZQpoXNStiIasrQ1lS2JXjLX14l7cuaZ/nDVbV+W9RRglM4gwvw4BrqcA8NaAGDFJ7hFd6czHlx3p2PxeiaU+ycwB84nz9fbpHj</latexit><latexit 
sha1_base64=\"Lv0eXnBKX961VeUHLOca/6YomRA=\">AAAB83icbVDLSsNAFL3xWeur6tLNYBFclUQEXRbduKzQFzShTKaTduhkEmZuxBL6G25cKOLWn3Hn3zhts9DWAwOHc+7lnjlhKoVB1/121tY3Nre2Szvl3b39g8PK0XHbJJlmvMUSmehuSA2XQvEWCpS8m2pO41DyTji+m/mdR66NSFQTJykPYjpUIhKMopV8P6Y4CqP8adpv9itVt+bOQVaJV5AqFGj0K1/+IGFZzBUySY3peW6KQU41Cib5tOxnhqeUjemQ9yxVNOYmyOeZp+TcKgMSJdo+hWSu/t7IaWzMJA7t5CyjWfZm4n9eL8PoJsiFSjPkii0ORZkkmJBZAWQgNGcoJ5ZQpoXNStiIasrQ1lS2JXjLX14l7cuaZ/nDVbV+W9RRglM4gwvw4BrqcA8NaAGDFJ7hFd6czHlx3p2PxeiaU+ycwB84nz9fbpHj</latexit><latexit sha1_base64=\"Lv0eXnBKX961VeUHLOca/6YomRA=\">AAAB83icbVDLSsNAFL3xWeur6tLNYBFclUQEXRbduKzQFzShTKaTduhkEmZuxBL6G25cKOLWn3Hn3zhts9DWAwOHc+7lnjlhKoVB1/121tY3Nre2Szvl3b39g8PK0XHbJJlmvMUSmehuSA2XQvEWCpS8m2pO41DyTji+m/mdR66NSFQTJykPYjpUIhKMopV8P6Y4CqP8adpv9itVt+bOQVaJV5AqFGj0K1/+IGFZzBUySY3peW6KQU41Cib5tOxnhqeUjemQ9yxVNOYmyOeZp+TcKgMSJdo+hWSu/t7IaWzMJA7t5CyjWfZm4n9eL8PoJsiFSjPkii0ORZkkmJBZAWQgNGcoJ5ZQpoXNStiIasrQ1lS2JXjLX14l7cuaZ/nDVbV+W9RRglM4gwvw4BrqcA8NaAGDFJ7hFd6czHlx3p2PxeiaU+ycwB84nz9fbpHj</latexit><latexit sha1_base64=\"Lv0eXnBKX961VeUHLOca/6YomRA=\">AAAB83icbVDLSsNAFL3xWeur6tLNYBFclUQEXRbduKzQFzShTKaTduhkEmZuxBL6G25cKOLWn3Hn3zhts9DWAwOHc+7lnjlhKoVB1/121tY3Nre2Szvl3b39g8PK0XHbJJlmvMUSmehuSA2XQvEWCpS8m2pO41DyTji+m/mdR66NSFQTJykPYjpUIhKMopV8P6Y4CqP8adpv9itVt+bOQVaJV5AqFGj0K1/+IGFZzBUySY3peW6KQU41Cib5tOxnhqeUjemQ9yxVNOYmyOeZp+TcKgMSJdo+hWSu/t7IaWzMJA7t5CyjWfZm4n9eL8PoJsiFSjPkii0ORZkkmJBZAWQgNGcoJ5ZQpoXNStiIasrQ1lS2JXjLX14l7cuaZ/nDVbV+W9RRglM4gwvw4BrqcA8NaAGDFJ7hFd6czHlx3p2PxeiaU+ycwB84nz9fbpHj</latexit>wT<latexit 
sha1_base64=\"RLT7ZikDAglq0wJwzAf2M9NDR/4=\">AAAB83icbVDLSsNAFL3xWeur6tLNYBFclUQEXRbduKzQFzShTKaTduhkEmZulBL6G25cKOLWn3Hn3zhts9DWAwOHc+7lnjlhKoVB1/121tY3Nre2Szvl3b39g8PK0XHbJJlmvMUSmehuSA2XQvEWCpS8m2pO41DyTji+m/mdR66NSFQTJykPYjpUIhKMopV8P6Y4CqP8adpv9itVt+bOQVaJV5AqFGj0K1/+IGFZzBUySY3peW6KQU41Cib5tOxnhqeUjemQ9yxVNOYmyOeZp+TcKgMSJdo+hWSu/t7IaWzMJA7t5CyjWfZm4n9eL8PoJsiFSjPkii0ORZkkmJBZAWQgNGcoJ5ZQpoXNStiIasrQ1lS2JXjLX14l7cuaZ/nDVbV+W9RRglM4gwvw4BrqcA8NaAGDFJ7hFd6czHlx3p2PxeiaU+ycwB84nz9d55Hi</latexit><latexit sha1_base64=\"RLT7ZikDAglq0wJwzAf2M9NDR/4=\">AAAB83icbVDLSsNAFL3xWeur6tLNYBFclUQEXRbduKzQFzShTKaTduhkEmZulBL6G25cKOLWn3Hn3zhts9DWAwOHc+7lnjlhKoVB1/121tY3Nre2Szvl3b39g8PK0XHbJJlmvMUSmehuSA2XQvEWCpS8m2pO41DyTji+m/mdR66NSFQTJykPYjpUIhKMopV8P6Y4CqP8adpv9itVt+bOQVaJV5AqFGj0K1/+IGFZzBUySY3peW6KQU41Cib5tOxnhqeUjemQ9yxVNOYmyOeZp+TcKgMSJdo+hWSu/t7IaWzMJA7t5CyjWfZm4n9eL8PoJsiFSjPkii0ORZkkmJBZAWQgNGcoJ5ZQpoXNStiIasrQ1lS2JXjLX14l7cuaZ/nDVbV+W9RRglM4gwvw4BrqcA8NaAGDFJ7hFd6czHlx3p2PxeiaU+ycwB84nz9d55Hi</latexit><latexit sha1_base64=\"RLT7ZikDAglq0wJwzAf2M9NDR/4=\">AAAB83icbVDLSsNAFL3xWeur6tLNYBFclUQEXRbduKzQFzShTKaTduhkEmZulBL6G25cKOLWn3Hn3zhts9DWAwOHc+7lnjlhKoVB1/121tY3Nre2Szvl3b39g8PK0XHbJJlmvMUSmehuSA2XQvEWCpS8m2pO41DyTji+m/mdR66NSFQTJykPYjpUIhKMopV8P6Y4CqP8adpv9itVt+bOQVaJV5AqFGj0K1/+IGFZzBUySY3peW6KQU41Cib5tOxnhqeUjemQ9yxVNOYmyOeZp+TcKgMSJdo+hWSu/t7IaWzMJA7t5CyjWfZm4n9eL8PoJsiFSjPkii0ORZkkmJBZAWQgNGcoJ5ZQpoXNStiIasrQ1lS2JXjLX14l7cuaZ/nDVbV+W9RRglM4gwvw4BrqcA8NaAGDFJ7hFd6czHlx3p2PxeiaU+ycwB84nz9d55Hi</latexit><latexit 
sha1_base64=\"RLT7ZikDAglq0wJwzAf2M9NDR/4=\">AAAB83icbVDLSsNAFL3xWeur6tLNYBFclUQEXRbduKzQFzShTKaTduhkEmZulBL6G25cKOLWn3Hn3zhts9DWAwOHc+7lnjlhKoVB1/121tY3Nre2Szvl3b39g8PK0XHbJJlmvMUSmehuSA2XQvEWCpS8m2pO41DyTji+m/mdR66NSFQTJykPYjpUIhKMopV8P6Y4CqP8adpv9itVt+bOQVaJV5AqFGj0K1/+IGFZzBUySY3peW6KQU41Cib5tOxnhqeUjemQ9yxVNOYmyOeZp+TcKgMSJdo+hWSu/t7IaWzMJA7t5CyjWfZm4n9eL8PoJsiFSjPkii0ORZkkmJBZAWQgNGcoJ5ZQpoXNStiIasrQ1lS2JXjLX14l7cuaZ/nDVbV+W9RRglM4gwvw4BrqcA8NaAGDFJ7hFd6czHlx3p2PxeiaU+ycwB84nz9d55Hi</latexit>w2<latexit sha1_base64=\"t6lFtd8cCHTCg5TWlOjAY0C6ISs=\">AAAB83icbVDLSsNAFL3xWeur6tLNYBFclaQIuiy6cVnBPqApZTKdtEMnkzBzo5TQ33DjQhG3/ow7/8ZJm4W2Hhg4nHMv98wJEikMuu63s7a+sbm1Xdop7+7tHxxWjo7bJk414y0Wy1h3A2q4FIq3UKDk3URzGgWSd4LJbe53Hrk2IlYPOE14P6IjJULBKFrJ9yOK4yDMnmaD+qBSdWvuHGSVeAWpQoHmoPLlD2OWRlwhk9SYnucm2M+oRsEkn5X91PCEsgkd8Z6likbc9LN55hk5t8qQhLG2TyGZq783MhoZM40CO5lnNMteLv7n9VIMr/uZUEmKXLHFoTCVBGOSF0CGQnOGcmoJZVrYrISNqaYMbU1lW4K3/OVV0q7XPMvvL6uNm6KOEpzCGVyAB1fQgDtoQgsYJPAMr/DmpM6L8+58LEbXnGLnBP7A+fwBKl+RwA==</latexit><latexit sha1_base64=\"t6lFtd8cCHTCg5TWlOjAY0C6ISs=\">AAAB83icbVDLSsNAFL3xWeur6tLNYBFclaQIuiy6cVnBPqApZTKdtEMnkzBzo5TQ33DjQhG3/ow7/8ZJm4W2Hhg4nHMv98wJEikMuu63s7a+sbm1Xdop7+7tHxxWjo7bJk414y0Wy1h3A2q4FIq3UKDk3URzGgWSd4LJbe53Hrk2IlYPOE14P6IjJULBKFrJ9yOK4yDMnmaD+qBSdWvuHGSVeAWpQoHmoPLlD2OWRlwhk9SYnucm2M+oRsEkn5X91PCEsgkd8Z6likbc9LN55hk5t8qQhLG2TyGZq783MhoZM40CO5lnNMteLv7n9VIMr/uZUEmKXLHFoTCVBGOSF0CGQnOGcmoJZVrYrISNqaYMbU1lW4K3/OVV0q7XPMvvL6uNm6KOEpzCGVyAB1fQgDtoQgsYJPAMr/DmpM6L8+58LEbXnGLnBP7A+fwBKl+RwA==</latexit><latexit 
sha1_base64=\"t6lFtd8cCHTCg5TWlOjAY0C6ISs=\">AAAB83icbVDLSsNAFL3xWeur6tLNYBFclaQIuiy6cVnBPqApZTKdtEMnkzBzo5TQ33DjQhG3/ow7/8ZJm4W2Hhg4nHMv98wJEikMuu63s7a+sbm1Xdop7+7tHxxWjo7bJk414y0Wy1h3A2q4FIq3UKDk3URzGgWSd4LJbe53Hrk2IlYPOE14P6IjJULBKFrJ9yOK4yDMnmaD+qBSdWvuHGSVeAWpQoHmoPLlD2OWRlwhk9SYnucm2M+oRsEkn5X91PCEsgkd8Z6likbc9LN55hk5t8qQhLG2TyGZq783MhoZM40CO5lnNMteLv7n9VIMr/uZUEmKXLHFoTCVBGOSF0CGQnOGcmoJZVrYrISNqaYMbU1lW4K3/OVV0q7XPMvvL6uNm6KOEpzCGVyAB1fQgDtoQgsYJPAMr/DmpM6L8+58LEbXnGLnBP7A+fwBKl+RwA==</latexit><latexit sha1_base64=\"t6lFtd8cCHTCg5TWlOjAY0C6ISs=\">AAAB83icbVDLSsNAFL3xWeur6tLNYBFclaQIuiy6cVnBPqApZTKdtEMnkzBzo5TQ33DjQhG3/ow7/8ZJm4W2Hhg4nHMv98wJEikMuu63s7a+sbm1Xdop7+7tHxxWjo7bJk414y0Wy1h3A2q4FIq3UKDk3URzGgWSd4LJbe53Hrk2IlYPOE14P6IjJULBKFrJ9yOK4yDMnmaD+qBSdWvuHGSVeAWpQoHmoPLlD2OWRlwhk9SYnucm2M+oRsEkn5X91PCEsgkd8Z6likbc9LN55hk5t8qQhLG2TyGZq783MhoZM40CO5lnNMteLv7n9VIMr/uZUEmKXLHFoTCVBGOSF0CGQnOGcmoJZVrYrISNqaYMbU1lW4K3/OVV0q7XPMvvL6uNm6KOEpzCGVyAB1fQgDtoQgsYJPAMr/DmpM6L8+58LEbXnGLnBP7A+fwBKl+RwA==</latexit>w1<latexit sha1_base64=\"r74evW3svIyiOgkGXGS8tPTvtKg=\">AAAB83icbVDLSsNAFL2pr1pfVZduBovgqiQi6LLoxmUF+4CmlMl00g6dTMLMjVJCf8ONC0Xc+jPu/BsnbRbaemDgcM693DMnSKQw6LrfTmltfWNzq7xd2dnd2z+oHh61TZxqxlsslrHuBtRwKRRvoUDJu4nmNAok7wST29zvPHJtRKwecJrwfkRHSoSCUbSS70cUx0GYPc0G3qBac+vuHGSVeAWpQYHmoPrlD2OWRlwhk9SYnucm2M+oRsEkn1X81PCEsgkd8Z6likbc9LN55hk5s8qQhLG2TyGZq783MhoZM40CO5lnNMteLv7n9VIMr/uZUEmKXLHFoTCVBGOSF0CGQnOGcmoJZVrYrISNqaYMbU0VW4K3/OVV0r6oe5bfX9YaN0UdZTiBUzgHD66gAXfQhBYwSOAZXuHNSZ0X5935WIyWnGLnGP7A+fwBKNuRvw==</latexit><latexit 
sha1_base64=\"r74evW3svIyiOgkGXGS8tPTvtKg=\">AAAB83icbVDLSsNAFL2pr1pfVZduBovgqiQi6LLoxmUF+4CmlMl00g6dTMLMjVJCf8ONC0Xc+jPu/BsnbRbaemDgcM693DMnSKQw6LrfTmltfWNzq7xd2dnd2z+oHh61TZxqxlsslrHuBtRwKRRvoUDJu4nmNAok7wST29zvPHJtRKwecJrwfkRHSoSCUbSS70cUx0GYPc0G3qBac+vuHGSVeAWpQYHmoPrlD2OWRlwhk9SYnucm2M+oRsEkn1X81PCEsgkd8Z6likbc9LN55hk5s8qQhLG2TyGZq783MhoZM40CO5lnNMteLv7n9VIMr/uZUEmKXLHFoTCVBGOSF0CGQnOGcmoJZVrYrISNqaYMbU0VW4K3/OVV0r6oe5bfX9YaN0UdZTiBUzgHD66gAXfQhBYwSOAZXuHNSZ0X5935WIyWnGLnGP7A+fwBKNuRvw==</latexit><latexit sha1_base64=\"r74evW3svIyiOgkGXGS8tPTvtKg=\">AAAB83icbVDLSsNAFL2pr1pfVZduBovgqiQi6LLoxmUF+4CmlMl00g6dTMLMjVJCf8ONC0Xc+jPu/BsnbRbaemDgcM693DMnSKQw6LrfTmltfWNzq7xd2dnd2z+oHh61TZxqxlsslrHuBtRwKRRvoUDJu4nmNAok7wST29zvPHJtRKwecJrwfkRHSoSCUbSS70cUx0GYPc0G3qBac+vuHGSVeAWpQYHmoPrlD2OWRlwhk9SYnucm2M+oRsEkn1X81PCEsgkd8Z6likbc9LN55hk5s8qQhLG2TyGZq783MhoZM40CO5lnNMteLv7n9VIMr/uZUEmKXLHFoTCVBGOSF0CGQnOGcmoJZVrYrISNqaYMbU0VW4K3/OVV0r6oe5bfX9YaN0UdZTiBUzgHD66gAXfQhBYwSOAZXuHNSZ0X5935WIyWnGLnGP7A+fwBKNuRvw==</latexit><latexit sha1_base64=\"r74evW3svIyiOgkGXGS8tPTvtKg=\">AAAB83icbVDLSsNAFL2pr1pfVZduBovgqiQi6LLoxmUF+4CmlMl00g6dTMLMjVJCf8ONC0Xc+jPu/BsnbRbaemDgcM693DMnSKQw6LrfTmltfWNzq7xd2dnd2z+oHh61TZxqxlsslrHuBtRwKRRvoUDJu4nmNAok7wST29zvPHJtRKwecJrwfkRHSoSCUbSS70cUx0GYPc0G3qBac+vuHGSVeAWpQYHmoPrlD2OWRlwhk9SYnucm2M+oRsEkn1X81PCEsgkd8Z6likbc9LN55hk5s8qQhLG2TyGZq783MhoZM40CO5lnNMteLv7n9VIMr/uZUEmKXLHFoTCVBGOSF0CGQnOGcmoJZVrYrISNqaYMbU0VW4K3/OVV0r6oe5bfX9YaN0UdZTiBUzgHD66gAXfQhBYwSOAZXuHNSZ0X5935WIyWnGLnGP7A+fwBKNuRvw==</latexit>M<latexit 
sha1_base64=\"qSAOqPjcFyl69u3u5Tz7jN6qjRo=\">AAAB8XicbVBNS8NAFHypX7V+VT16WSyCp5KIoMeiFy9CBVuLbSib7Uu7dLMJuxuhhP4LLx4U8eq/8ea/cdPmoK0DC8PMe+y8CRLBtXHdb6e0srq2vlHerGxt7+zuVfcP2jpOFcMWi0WsOgHVKLjEluFGYCdRSKNA4EMwvs79hydUmsfy3kwS9CM6lDzkjBorPfYiakZBmN1O+9WaW3dnIMvEK0gNCjT71a/eIGZphNIwQbXuem5i/Iwqw5nAaaWXakwoG9Mhdi2VNELtZ7PEU3JilQEJY2WfNGSm/t7IaKT1JArsZJ5QL3q5+J/XTU146WdcJqlByeYfhakgJib5+WTAFTIjJpZQprjNStiIKsqMLaliS/AWT14m7bO6Z/ndea1xVdRRhiM4hlPw4AIacANNaAEDCc/wCm+Odl6cd+djPlpyip1D+APn8we75JDx</latexit><latexit sha1_base64=\"qSAOqPjcFyl69u3u5Tz7jN6qjRo=\">AAAB8XicbVBNS8NAFHypX7V+VT16WSyCp5KIoMeiFy9CBVuLbSib7Uu7dLMJuxuhhP4LLx4U8eq/8ea/cdPmoK0DC8PMe+y8CRLBtXHdb6e0srq2vlHerGxt7+zuVfcP2jpOFcMWi0WsOgHVKLjEluFGYCdRSKNA4EMwvs79hydUmsfy3kwS9CM6lDzkjBorPfYiakZBmN1O+9WaW3dnIMvEK0gNCjT71a/eIGZphNIwQbXuem5i/Iwqw5nAaaWXakwoG9Mhdi2VNELtZ7PEU3JilQEJY2WfNGSm/t7IaKT1JArsZJ5QL3q5+J/XTU146WdcJqlByeYfhakgJib5+WTAFTIjJpZQprjNStiIKsqMLaliS/AWT14m7bO6Z/ndea1xVdRRhiM4hlPw4AIacANNaAEDCc/wCm+Odl6cd+djPlpyip1D+APn8we75JDx</latexit><latexit sha1_base64=\"qSAOqPjcFyl69u3u5Tz7jN6qjRo=\">AAAB8XicbVBNS8NAFHypX7V+VT16WSyCp5KIoMeiFy9CBVuLbSib7Uu7dLMJuxuhhP4LLx4U8eq/8ea/cdPmoK0DC8PMe+y8CRLBtXHdb6e0srq2vlHerGxt7+zuVfcP2jpOFcMWi0WsOgHVKLjEluFGYCdRSKNA4EMwvs79hydUmsfy3kwS9CM6lDzkjBorPfYiakZBmN1O+9WaW3dnIMvEK0gNCjT71a/eIGZphNIwQbXuem5i/Iwqw5nAaaWXakwoG9Mhdi2VNELtZ7PEU3JilQEJY2WfNGSm/t7IaKT1JArsZJ5QL3q5+J/XTU146WdcJqlByeYfhakgJib5+WTAFTIjJpZQprjNStiIKsqMLaliS/AWT14m7bO6Z/ndea1xVdRRhiM4hlPw4AIacANNaAEDCc/wCm+Odl6cd+djPlpyip1D+APn8we75JDx</latexit><latexit 
sha1_base64=\"qSAOqPjcFyl69u3u5Tz7jN6qjRo=\">AAAB8XicbVBNS8NAFHypX7V+VT16WSyCp5KIoMeiFy9CBVuLbSib7Uu7dLMJuxuhhP4LLx4U8eq/8ea/cdPmoK0DC8PMe+y8CRLBtXHdb6e0srq2vlHerGxt7+zuVfcP2jpOFcMWi0WsOgHVKLjEluFGYCdRSKNA4EMwvs79hydUmsfy3kwS9CM6lDzkjBorPfYiakZBmN1O+9WaW3dnIMvEK0gNCjT71a/eIGZphNIwQbXuem5i/Iwqw5nAaaWXakwoG9Mhdi2VNELtZ7PEU3JilQEJY2WfNGSm/t7IaKT1JArsZJ5QL3q5+J/XTU146WdcJqlByeYfhakgJib5+WTAFTIjJpZQprjNStiIKsqMLaliS/AWT14m7bO6Z/ndea1xVdRRhiM4hlPw4AIacANNaAEDCc/wCm+Odl6cd+djPlpyip1D+APn8we75JDx</latexit>RUneural networkencoder/decoderlinear Gaussianmodeldynamicaddressingwt<latexit sha1_base64=\"S8icjISybvxDgv0M5fbtxGwAO3M=\">AAAB83icbVDLSsNAFL2pr1pfVZduBovgqiQi6LLoxmUF+4CmlMl00g6dTMLMjVJCf8ONC0Xc+jPu/BsnbRbaemDgcM693DMnSKQw6LrfTmltfWNzq7xd2dnd2z+oHh61TZxqxlsslrHuBtRwKRRvoUDJu4nmNAok7wST29zvPHJtRKwecJrwfkRHSoSCUbSS70cUx0GYPc0GOKjW3Lo7B1klXkFqUKA5qH75w5ilEVfIJDWm57kJ9jOqUTDJZxU/NTyhbEJHvGepohE3/WyeeUbOrDIkYaztU0jm6u+NjEbGTKPATuYZzbKXi/95vRTD634mVJIiV2xxKEwlwZjkBZCh0JyhnFpCmRY2K2FjqilDW1PFluAtf3mVtC/qnuX3l7XGTVFHGU7gFM7BgytowB00oQUMEniGV3hzUufFeXc+FqMlp9g5hj9wPn8AjmeSAg==</latexit><latexit sha1_base64=\"S8icjISybvxDgv0M5fbtxGwAO3M=\">AAAB83icbVDLSsNAFL2pr1pfVZduBovgqiQi6LLoxmUF+4CmlMl00g6dTMLMjVJCf8ONC0Xc+jPu/BsnbRbaemDgcM693DMnSKQw6LrfTmltfWNzq7xd2dnd2z+oHh61TZxqxlsslrHuBtRwKRRvoUDJu4nmNAok7wST29zvPHJtRKwecJrwfkRHSoSCUbSS70cUx0GYPc0GOKjW3Lo7B1klXkFqUKA5qH75w5ilEVfIJDWm57kJ9jOqUTDJZxU/NTyhbEJHvGepohE3/WyeeUbOrDIkYaztU0jm6u+NjEbGTKPATuYZzbKXi/95vRTD634mVJIiV2xxKEwlwZjkBZCh0JyhnFpCmRY2K2FjqilDW1PFluAtf3mVtC/qnuX3l7XGTVFHGU7gFM7BgytowB00oQUMEniGV3hzUufFeXc+FqMlp9g5hj9wPn8AjmeSAg==</latexit><latexit 
sha1_base64=\"S8icjISybvxDgv0M5fbtxGwAO3M=\">AAAB83icbVDLSsNAFL2pr1pfVZduBovgqiQi6LLoxmUF+4CmlMl00g6dTMLMjVJCf8ONC0Xc+jPu/BsnbRbaemDgcM693DMnSKQw6LrfTmltfWNzq7xd2dnd2z+oHh61TZxqxlsslrHuBtRwKRRvoUDJu4nmNAok7wST29zvPHJtRKwecJrwfkRHSoSCUbSS70cUx0GYPc0GOKjW3Lo7B1klXkFqUKA5qH75w5ilEVfIJDWm57kJ9jOqUTDJZxU/NTyhbEJHvGepohE3/WyeeUbOrDIkYaztU0jm6u+NjEbGTKPATuYZzbKXi/95vRTD634mVJIiV2xxKEwlwZjkBZCh0JyhnFpCmRY2K2FjqilDW1PFluAtf3mVtC/qnuX3l7XGTVFHGU7gFM7BgytowB00oQUMEniGV3hzUufFeXc+FqMlp9g5hj9wPn8AjmeSAg==</latexit><latexit sha1_base64=\"S8icjISybvxDgv0M5fbtxGwAO3M=\">AAAB83icbVDLSsNAFL2pr1pfVZduBovgqiQi6LLoxmUF+4CmlMl00g6dTMLMjVJCf8ONC0Xc+jPu/BsnbRbaemDgcM693DMnSKQw6LrfTmltfWNzq7xd2dnd2z+oHh61TZxqxlsslrHuBtRwKRRvoUDJu4nmNAok7wST29zvPHJtRKwecJrwfkRHSoSCUbSS70cUx0GYPc0GOKjW3Lo7B1klXkFqUKA5qH75w5ilEVfIJDWm57kJ9jOqUTDJZxU/NTyhbEJHvGepohE3/WyeeUbOrDIkYaztU0jm6u+NjEbGTKPATuYZzbKXi/95vRTD634mVJIiV2xxKEwlwZjkBZCh0JyhnFpCmRY2K2FjqilDW1PFluAtf3mVtC/qnuX3l7XGTVFHGU7gFM7BgytowB00oQUMEniGV3hzUufFeXc+FqMlp9g5hj9wPn8AjmeSAg==</latexit>zt<latexit sha1_base64=\"TjI0qHTFYc+sRP+O2BXd2hFW9NE=\">AAAB83icbVDLSsNAFL2pr1pfVZduBovgqiQi6LLoxmUF+4CmlMl00g6dTMLMjVBDf8ONC0Xc+jPu/BsnbRbaemDgcM693DMnSKQw6LrfTmltfWNzq7xd2dnd2z+oHh61TZxqxlsslrHuBtRwKRRvoUDJu4nmNAok7wST29zvPHJtRKwecJrwfkRHSoSCUbSS70cUx0GYPc0GOKjW3Lo7B1klXkFqUKA5qH75w5ilEVfIJDWm57kJ9jOqUTDJZxU/NTyhbEJHvGepohE3/WyeeUbOrDIkYaztU0jm6u+NjEbGTKPATuYZzbKXi/95vRTD634mVJIiV2xxKEwlwZjkBZCh0JyhnFpCmRY2K2FjqilDW1PFluAtf3mVtC/qnuX3l7XGTVFHGU7gFM7BgytowB00oQUMEniGV3hzUufFeXc+FqMlp9g5hj9wPn8AkvySBQ==</latexit><latexit 
\n\nAlgorithm 1 Training the Dynamic Kanerva Machine (single training step)\n\nsample an episode $x_1, x_2, \\ldots, x_T$ from $\\mathcal{D}$\ninitialise memory $q(M_0) = p(M_0; R_0, U_0)$ // begin writing\nfor $t = 1 : T$ (in arbitrary order) do\n    compute embedding $z_t = e(x_t)$\n    compute weight distribution $q(w_t)$ by solving for $\\mu_{w_t}$ in eq. 6 using $q(M_{t-1})$\n    update memory (Appendix A): $q(M_t; R_t, U_t) \\leftarrow q(M_{t-1} \\mid \\mu_{w_t}, z_t; R_{t-1}, U_{t-1})$\n    (optional) set $q(M_{t-1}) = q(M_t)$ and repeat the previous 2 steps\nend for // end of writing\nfor $t = 1 : T$ (in arbitrary order) do // begin reading\n    compute embedding $z_t = e(x_t)$\n    compute weight distribution $q(w_t)$ by solving for $\\mu_{w_t}$ in eq. 6 using $q(M_T)$\n    compute read-out embedding $\\hat{z}_t \\leftarrow \\sum_{k=1}^{K} w_t(k) \\cdot R_T(k)$ using a sample $w_t \\sim q(w_t)$\nend for // end of reading\ncompute the objective $O \\leftarrow \\mathcal{L}_T + \\mathcal{L}_{AE}$ (eq. 4 and eq. 11) using the previously obtained terms\nupdate parameters via gradient ascent to maximise $O$\n\nis common for latent variable models, we achieve this by maximising a variational lower-bound of the likelihood. To avoid cluttered notation we assume all training episodes have the same length $T$; nothing in our algorithm depends on this assumption. 
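The two loops of Algorithm 1 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the released implementation. The encoder is omitted: rows of Z stand in for the embeddings z_t = e(x_t). Addressing uses the posterior mean R of q(M) rather than the full distribution. np.linalg.solve on the regularised normal equations replaces TensorFlow's matrix_solve_ls. The read-out uses mu_w instead of a sample w_t ~ q(w_t). bayes_update is the standard conjugate linear-Gaussian (Kalman-style) update, which we assume corresponds to the KM rule in Appendix A. The names address, bayes_update and write_then_read are hypothetical, and the gradient step on O is not shown.

```python
import numpy as np


def address(M, z, sigma_xi2):
    # Dynamic addressing (eq. 6): solve (M M^T + sigma_xi^2 I) mu_w = M z.
    # M is K x C (rows are memory slots); z is a length-C embedding.
    K = M.shape[0]
    A = M @ M.T + sigma_xi2 * np.eye(K)
    return np.linalg.solve(A, M @ z)


def bayes_update(R, U, w, z, sigma_xi2):
    # Assumed conjugate update of q(M) with mean R (K x C) and row covariance
    # U (K x K) after observing z = M^T w + noise; a stand-in for Appendix A.
    delta = z - R.T @ w            # prediction error, length C
    c = U @ w                      # length K
    s = w @ c + sigma_xi2          # scalar predictive variance
    return R + np.outer(c, delta) / s, U - np.outer(c, c) / s


def write_then_read(Z, R0, U0, sigma_xi2=0.1):
    # Z holds the embeddings z_t = e(x_t) as rows; the encoder itself is omitted.
    R, U = R0.copy(), U0.copy()
    for z in Z:                                    # writing phase
        mu_w = address(R, z, sigma_xi2)            # address with the current posterior mean
        R, U = bayes_update(R, U, mu_w, z, sigma_xi2)
    # Reading phase: read-out z_hat = R_T^T mu_w, using the mean of q(w_t)
    # instead of a sample.
    Z_hat = np.stack([R.T @ address(R, z, sigma_xi2) for z in Z])
    return R, U, Z_hat


rng = np.random.default_rng(0)
K, C, T = 4, 16, 6
R0 = rng.normal(size=(K, C))   # prior mean (a learned parameter in the DKM)
U0 = np.eye(K)                 # prior row covariance
Z = rng.normal(size=(T, C))    # stand-in embeddings
R_T, U_T, Z_hat = write_then_read(Z, R0, U0)
print(Z_hat.shape)  # (6, 16)
```

The np.linalg.solve call on a K x K system is the source of the extra O(K^3) addressing cost. The memory update itself is only a rank-one correction of R and U.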
Given an approximated memory distribution $q(M)$, the log-likelihood of an episode can be decomposed as (see full derivation in Appendix C):\n\n$$\\ln p(x_{\\le T}) = \\mathcal{L}_T + \\sum_{t=1}^{T} \\big\\langle D_{KL}\\big(q(w_t) \\,\\|\\, p(w_t \\mid x_t, M)\\big) \\big\\rangle_{q(M)} + D_{KL}\\big(q(M) \\,\\|\\, p(M \\mid x_{\\le T})\\big) \\tag{3}$$\n\nwith its variational lower-bound:\n\n$$\\mathcal{L}_T = \\sum_{t=1}^{T} \\Big( \\big\\langle \\ln p(x_t \\mid w_t, M) \\big\\rangle_{q(w_t)\\, q(M)} - D_{KL}\\big(q(w_t) \\,\\|\\, p(w_t)\\big) \\Big) - D_{KL}\\big(q(M) \\,\\|\\, p(M)\\big) \\tag{4}$$\n\nFor consistency, we write $p(w_t) = p(w) = \\mathcal{N}(0, 1)$. From the perspective of the EM algorithm [11], the lower-bound can be maximised in two ways:\n1. By tightening the bound while keeping the likelihood unchanged. This can be achieved by minimising the KL-divergences in eq. 3, so that $q(w_t)$ approximates the posterior distribution $p(w_t \\mid x_t, M)$ and $q(M)$ approximates the posterior distribution $p(M \\mid x_{\\le T})$.\n2. By directly maximising the lower-bound $\\mathcal{L}_T$ as an evidence lower-bound objective (ELBO), for example by gradient ascent on the parameters of $\\mathcal{L}_T$ (footnote 1). This may both improve the quality of the posterior approximation by squeezing the bound and increase the likelihood of the generative model.\n\nWe develop an algorithm analogous to the two-step EM algorithm: it first analytically tightens the lower-bound by minimising the KL-divergence terms in eq. 3 via inference of tractable parameters, and then maximises the lower-bound by slowly updating the remaining model parameters via backpropagation. The analytic inference in the first step is fast and does not require training, allowing the model to adapt to new data at test time.\n\nFootnote 1: This differs from the original EM algorithm, which fixes the approximated posterior in the M step.\n\n3.1 Dynamic Addressing\nRecall that the approximate posterior distribution of $w_t$ has the form $q(w_t) = \\mathcal{N}(\\mu_{w_t}, \\sigma_w^2)$. While the variance parameter is trained using gradient-based updates, dynamic addressing is used to find the $\\mu_{w_t}$ that minimises $D_{KL}(q(w) \\,\\|\\, p(w \\mid x, M))$. Dropping the subscript when it applies to any given $x$ and $M$, it can be shown that the KL-divergence can be approximated by the following quadratic form (see Appendix D for derivation):\n\n$$D_{KL}\\big(q(w) \\,\\|\\, p(w \\mid x, M)\\big) \\approx \\frac{\\| e(x) - M^{\\top} \\mu_w \\|^2}{2 \\sigma_\\xi^2} + \\frac{1}{2} \\| \\mu_w \\|^2 + \\ldots \\tag{5}$$\n\nwhere the terms that are independent of $\\mu_w$ are omitted. Then, the optimal $\\mu_w$ can be found by solving the (regularised) least-squares problem:\n\n$$\\mu_w \\leftarrow \\big( M M^{\\top} + \\sigma_\\xi^2 \\cdot I \\big)^{-1} M \\, e(x) \\tag{6}$$\n\nThis operation can be implemented efficiently via an off-the-shelf least-squares solver, such as TensorFlow\u2019s matrix_solve_ls function, which we used in experiments. Intuitively, dynamic addressing finds the combination of memory rows that minimises the squared error between the read-out $M^{\\top} \\mu_w$ and the embedding $z = e(x)$, subject to the constraint from the prior $p(w)$.\n\n3.2 Bayesian Memory Update\nWe now turn to the more challenging problem of minimising $D_{KL}(q(M) \\,\\|\\, p(M \\mid x_{\\le T}))$. We tackle this minimisation via a sequential update algorithm. To motivate this algorithm we begin by considering $T = 1$. In this case, eq. 3 can be simplified to:\n\n$$\\ln p(x_1) = \\mathcal{L}_1 + \\big\\langle D_{KL}\\big(q(w_1) \\,\\|\\, p(w_1 \\mid x_1, M)\\big) \\big\\rangle_{q(M)} + D_{KL}\\big(q(M_1) \\,\\|\\, p(M_1 \\mid x_1)\\big) \\tag{7}$$\n\nWhile it is still unclear how to minimise $D_{KL}(q(M_1) \\,\\|\\, p(M_1 \\mid x_1))$, if a suitable weight distribution $q(w_1)$ were given, a slightly different term $D_{KL}(q(M_1 \\mid w_1) \\,\\|\\, p(M_1 \\mid w_1, x_1))$ can be minimised to 0. 
To achieve this, we can set $q(M_1 \\mid w_1) = p(M_1 \\mid x_1, w_1)$ by updating the parameters of $q(M)$ from $(R_0, U_0)$ to $(R_1, U_1)$ using the same Bayesian update rule as in the KM (Appendix A). We may then marginalise out $w_1$ to obtain\n\n$$q(M_1) = \\int q(M_1 \\mid w_1)\\, q(w_1)\\, dw_1 \\tag{8}$$\n\nA reasonable guess of $w_1$ can be obtained by solving\n\n$$q(w_1) \\leftarrow \\arg\\min_{q'(w_1)} D_{KL}\\big(q'(w_1) \\,\\|\\, p(w_1 \\mid x_1, M_0)\\big) \\tag{9}$$\n\nas in section 3.1, but using the prior memory $M_0$. To continue, we treat the current posterior $q(M_1)$ as the next prior, and compute $q(M_2)$ using $x_2$ following the same procedure until we obtain $q(M_T)$ using all $x_{\\le T}$.\nMore formally, Appendix C shows that this heuristic online update procedure maximises another lower-bound of the log-likelihood. In addition, the marginalisation in eq. 8 can be approximated by using $\\mu_{w_t}$ instead of sampling $w_t$ for each memory update:\n\n$$q(M_t) \\approx p(M_t \\mid x_t, \\mu_{w_t}) \\tag{10}$$\n\nAlthough this lower-bound is looser than $\\mathcal{L}_T$ (eq. 4), Appendix C suggests it can be tightened by iteratively using the updated memory for addressing (e.g., replacing $M_0$ in eq. 9 by the updated $M_1$, the \u201coptional\u201d step in Algorithm 1) and updating the memory with the refined $q(w_t)$. We found that extra iterations yielded only marginal improvement in our setting, so we did not use them in our experiments.\n\n3.3 Gradient-Based Training\n\nHaving inferred $q(w_t)$ and $q(M) = q(M_T)$, we now focus on gradient-based optimisation of the lower-bound $\\mathcal{L}_T$ (eq. 4). To ensure that the likelihood $\\ln p(x \\mid w, M)$ in eq. 4 can be produced from the likelihood given by the memory, $\\ln p(z \\mid w, M)$, we ideally need a bijective pair of encoder and decoder $x \\Longleftrightarrow z$ (see Appendix D for more discussion). 
This is difficult to guarantee, but we can approximate this condition by maximising the autoencoder log-likelihood:\n\n$$\\mathcal{L}_{AE} = \\big\\langle \\ln d(x \\mid e(x)) \\big\\rangle_{x \\sim \\mathcal{D}} \\tag{11}$$\n\nTaken together, we maximise the following joint objective using backpropagation:\n\n$$O = \\mathcal{L}_T + \\mathcal{L}_{AE} \\tag{12}$$\n\nWe note that dynamic addressing during online memory updates introduces order dependence, since $q(w_t)$ always depends on the previous memory. This violates the model\u2019s exchangeable structure (order-independence). Nevertheless, gradient ascent on $\\mathcal{L}_T$ mitigates this effect by adjusting the model so that $D_{KL}(q(w) \\,\\|\\, p(w \\mid x, M))$ remains close to a minimum even for previous $q(w_t)$. Appendix C explains this in more detail.\n\n3.4 Prediction / Reading\n\nThe predictive distribution of our model is the posterior distribution of $x$ given a query $x_q$ and memory $M$: $p(x \\mid x_q, M) = \\int p(x \\mid w, M)\\, p(w \\mid x_q, M)\\, dw$. This posterior distribution does not have an analytic form in general (unless $d(x \\mid z)$ is Gaussian). We therefore approximate the integral using the maximum a posteriori (MAP) estimator $w^*$:\n\n$$p(\\hat{x} \\mid x_q, M) \\approx p(\\hat{x} \\mid w^*, M), \\qquad w^* = \\arg\\max_w p(w \\mid x_q, M) \\tag{13}$$\n\nThus, $w^*$ can be computed by solving the same least-squares problem as in eq. 6 and choosing $w^* = \\mu_w$ (see Appendix D for details).\n\n3.5 Attractor Dynamics\n\nTo understand the model\u2019s attractor dynamics, we define the energy of a configuration $(x, w)$ with a given memory $M$ as:\n\n$$E(x, w) = -\\big\\langle \\ln p(x \\mid w, M) \\big\\rangle_{q(M)} + D_{KL}\\big(q(w) \\,\\|\\, p(w)\\big) \\tag{14}$$\n\nFor a well-trained model, with $x$ fixed, $E(x, w)$ is at a minimum with respect to $w$ after minimising $D_{KL}(q(w) \\,\\|\\, p(w \\mid x, M))$ (eq. 6). To see this, note that the negative of $E(x, w)$ consists of just the terms in $\\mathcal{L}_T$ (eq. 4) that depend on a specific $x$ and $w$, which are maximised during training. Now we can minimise $E(x, w)$ further by fixing $w$ and optimising $x$. 
Since only the first term depends on $x$, $E(x, w)$ is further minimised by choosing the mode of the likelihood function $\\langle \\ln p(x \\mid w, M) \\rangle_{q(M)}$. For example, we take the mean for the Gaussian likelihood, and round the sigmoid outputs for the Bernoulli likelihood. Each step can be viewed as coordinate descent over the energy $E(x, w)$, as illustrated in Figure 2 (left).\nThe step of optimising $w$ followed by taking the mode of $x$ is exactly the same as taking the mode of the predictive distribution, $x_{\\mathrm{mode}} = \\arg\\max_{\\hat{x}} p(\\hat{x} \\mid x_q, M)$ (eq. 13). Therefore, we can simulate the attractor dynamics by repeatedly feeding back the predictive mode as the next query: $x_1 = \\arg\\max_{\\hat{x}} p(\\hat{x} \\mid x_0, M)$, $x_2 = \\arg\\max_{\\hat{x}} p(\\hat{x} \\mid x_1, M)$, $\\ldots$, $x_n = \\arg\\max_{\\hat{x}} p(\\hat{x} \\mid x_{n-1}, M)$. This sequence converges to a stored pattern in the memory: each iteration minimises the energy $E(x, w)$, so that $E(x_n, w_n) < E(x_{n-1}, w_{n-1})$ unless the sequence has already converged, in which case $E(x_n, w_n) = E(x_{n-1}, w_{n-1})$. Therefore, the sequence converges to some $x^*$, a local minimum in the energy landscape, which in a well-trained memory model corresponds to a stored pattern.\nViewing $x_n = \\arg\\max_{\\hat{x}} p(\\hat{x} \\mid x_{n-1}, M)$ as a dynamical system, the stored patterns correspond to point attractors in this system. See Appendix C for a formal treatment. We used deterministic dynamics in our experiments, which also simplifies the analysis. Alternatively, sampling from $q(w)$ and the predictive distribution would give stochastic dynamics that simulate Markov chain Monte Carlo (MCMC). We leave this direction for future investigation.\n\n4 Experiments\n\nWe tested our model on Omniglot [22] and frames from DMLab tasks [6]. 
Both datasets have images from a large number of classes, making them well suited to testing fast-adapting external memory: 1200 different characters in Omniglot, and infinitely many procedurally generated 3-D maze environments from DMLab. We treat Omniglot as binary data, while DMLab has larger real-valued colour images. We demonstrate that the same model structure with identical hyperparameters (except for the number of filters, the predictive distribution, and the memory size) can readily handle these different types of data.\nTo compare with the KM, we followed [32] to prepare the Omniglot dataset, and employed the same convolutional encoder and decoder structure. We trained all models using the Adam optimiser [19] with learning rate $1 \\times 10^{-4}$. We used 16 filters in the convnet and a $32 \\times 100$ memory for Omniglot, and 256 filters and a $64 \\times 200$ memory for DMLab. We used the Bernoulli likelihood function for Omniglot, and the Gaussian likelihood function for DMLab data. Uniform noise $\\mathcal{U}(0, \\frac{1}{128})$ was added to the DMLab (labyrinth) data to prevent the Gaussian likelihood from collapsing.\nFollowing [32], we report the lower-bound on the conditional log-likelihood $\\ln p(x_{\\le T} \\mid M)$, obtained by removing $D_{KL}(q(M) \\,\\|\\, p(M))$ from $\\mathcal{L}_T$ (eq. 4). This is the negative energy $-E$, and we obtained the per-image bound (i.e., the conditional ELBO) by dividing it by the episode size. We trained the model for Omniglot for approximately $3 \\times 10^5$ steps; the test conditional ELBO reached 77.2, which is worse than the 68.3 reported for the KM [32]. However, we show below that the DKM generalises much better to unseen long episodes. We trained the model for DMLab for $1.1 \\times 10^5$ steps; the test conditional ELBO reached $-9046.5$, which corresponds to 2.75 bits per pixel. 
After training, we used the same testing protocol as [32], first computing the posterior distribution of memory (writing) given an episode, and then performing tasks using the memory\u2019s posterior mean. For reference, our implementation of the memory module is provided at https://github.com/deepmind/dynamic-kanerva-machines.\n\nCapacity\n\nWe investigated memory capacity using the Omniglot dataset, and compared our model with the KM and the DNC. To account for the additional $O(K^3)$ cost of the proposed dynamic addressing, our model in the Omniglot experiments used significantly fewer memory parameters ($32 \\times 100 + 32 \\times 32 = 4224$) than the DNC ($64 \\times 100 = 6400$), and less than half of the number used for the KM in [32]. Moreover, our model does not have additional parametrised structure, like the memory controllers in the DNC or the amortised addressing module in the KM. As in [32], we train our model using episodes with 32 patterns randomly sampled from all classes, and test it using episodes with lengths ranging from 10 to 200, drawn from 2, 4, or 8 classes of characters (i.e., varying the redundancy of the observed data). We report retrieval error as the negative of the conditional ELBO. The results are shown in Figure 2 (right), with results for the KM and DNC adapted from [32].\n\nFigure 2: Left: Illustration of the attractor dynamics that converge to a local minimum of the energy $E(x, w)$. The circles show contours of the energy. Black arrows show the results from optimising $w$ by solving the least-squares problem; blue arrows depict optimisation of $x$ by taking the mode of the predictive distribution. Right: Comparing the capacity of our model (diamond lines) with the KM (solid lines) and the DNC (dashed lines). 
Our model compresses and generalises significantly better for long episodes.\n\nThe capacity curves for our model are strikingly flat compared with both the DNC and the KM; we believe that this is because the parameter-free addressing (section 3.1) generalises to longer episodes much better than the parametrised addressing modules in the DNC or the KM. The errors are larger than those of the KM for small numbers of patterns (fewer than approximately 60), possibly because the KM over-fits to shorter episodes that were more similar to the training episodes.\n\nAttractor Dynamics: Denoising and Sampling\n\nWe next verified the attractor dynamics through denoising and sampling tasks. These tasks demonstrate how low-quality patterns, either from noise corruption or imperfect priors, can be corrected using the attractor dynamics. Figure 3 (a) and Figure 4 (a) show the results of denoising. We added salt-and-pepper noise to Omniglot images by randomly flipping 15% of the bits, and independent Gaussian noise $\\mathcal{N}(0, 0.15)$ to all pixels in DMLab images. Such noise was never presented during training. We ran the attractor dynamics (section 3.5) for 15 iterations from the noise-corrupted images. Despite the significant corruption of images by different types of noise, the image quality improved steadily for both datasets. 
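The denoising procedure can be reproduced in miniature. The sketch below is a toy and does not reproduce the experimental setup. It assumes an identity encoder and decoder, and stores plus/minus-one patterns directly as the rows of a fixed memory M, so q(M) is a point mass. It also assumes a Gaussian memory likelihood with x restricted to {-1, +1}^C; under that restriction, taking the sign of M^T w is the exact minimiser of the energy in x. The sketch flips 15% of the entries of one stored pattern and runs 15 iterations, mirroring the Omniglot protocol described above. The names energy and attractor_denoise are hypothetical.

```python
import numpy as np


def energy(x, w, M, sigma_xi2):
    # Toy stand-in for eq. 14: Gaussian memory likelihood around M^T w and an
    # N(0, I) prior on w (treating w as the mean mu_w); constants dropped.
    r = x - M.T @ w
    return r @ r / (2.0 * sigma_xi2) + 0.5 * (w @ w)


def attractor_denoise(x0, M, sigma_xi2=0.1, n_iter=15):
    # Coordinate descent on the toy energy, with x restricted to {-1, +1}^C.
    K = M.shape[0]
    A = M @ M.T + sigma_xi2 * np.eye(K)
    x, energies = x0.copy(), []
    for _ in range(n_iter):
        w = np.linalg.solve(A, M @ x)            # w-step: eq. 6, exact minimiser in w
        x = np.where(M.T @ w >= 0, 1.0, -1.0)    # x-step: exact minimiser over {-1, +1}^C
        energies.append(energy(x, w, M, sigma_xi2))
    return x, energies


rng = np.random.default_rng(0)
K, C = 5, 400
M = rng.choice([-1.0, 1.0], size=(K, C))         # stored patterns as memory rows
target = M[0].copy()
x0 = target.copy()
flip = rng.choice(C, size=round(0.15 * C), replace=False)
x0[flip] *= -1.0                                 # flip 15% of the entries
x_final, energies = attractor_denoise(x0, M)
print(int(np.sum(x0 != target)), int(np.sum(x_final != target)))
```

Both steps are exact minimisers of this toy energy, so the recorded energies cannot increase. This is the coordinate-descent argument of section 3.5 in its simplest form.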
Interestingly, the denoised Omniglot patterns are even cleaner and smoother than the original patterns. The trajectories of the energy during denoising for 20 examples (including those we plotted as images) are shown in Figure 3 (c) and Figure 4 (c), demonstrating that the system states were attracted to points with lower energy.\n\nFigure 3: a: Denoising of Omniglot patterns. Patterns in the second column are obtained by adding 15% salt-and-pepper noise to the patterns in the first column. The following columns show samples from consecutive iterations. b: Sampling of Omniglot patterns. Patterns inside the top box were written into memory. Patterns in the first column are reconstructed using $w \\sim p(w)$, which are then improved through iterations in the following columns. c: Energy as a function of iterations during denoising. d: Energy as a function of iterations during sampling.\n\nFigure 4: Experiment results for DMLab. Each panel corresponds to the matching panel in Figure 3.\n\nSampling from the models\u2019 prior distributions provides another application of the attractor dynamics. Generative models trained with stochastic variational inference often suffer from low sample quality, because the asymmetric KL-divergence they minimise usually results in priors broader than the posteriors used to train the decoders. While different approaches exist to improve sample quality, including more elaborate posteriors [26] and different training objectives [13], our model addresses this problem by moving samples to regions of higher likelihood via the attractor dynamics. As illustrated in Figure 3 (b, d) and Figure 4 (b, d), the initial samples have relatively low quality, but they improved steadily through the iterations. This improvement is correlated with the decrease in energy. We do observe fluctuations in energy in all experiments, especially for DMLab. These may be caused by saddle points, which are more common in larger models [9]. Although saddle points violate our assumption of local minima (section 3.5), our model still worked well and the energy generally dropped after temporarily rising.\n\n5 Discussion\n\nHere we have presented a novel approach to robust attractor dynamics inside a generative distributed memory. 
Other than the neural network encoder and decoder, our model has only a small number of statistically well-defined parameters. Despite its simplicity, we have demonstrated its high capacity by efficiently compressing episodes online, and have shown its robustness in retrieving patterns corrupted by unseen noise.\nOur model can trade increased computation for higher-precision retrieval by running the attractor dynamics for more iterations. The idea of using attractors for memory retrieval and cleanup dates back to Hopfield nets [15] and Kanerva\u2019s sparse distributed memory [18]. Zemel and Mozer proposed a generative model for memory [33] that pioneered the use of variational free energy to construct attractors for memory. Because it is restricted to a localist representation, their model is easy to train without backpropagation, though this choice constrains its capacity. On the other hand, Boltzmann machines [1] are high-capacity generative models with distributed representations which obey stochastic attractor dynamics. However, writing memories into the weights of Boltzmann machines is typically slow and difficult. In comparison, the DKM trains quickly via a low-variance gradient estimator and allows fast memory writing as inference. Notably, Saul and Jordan [30] discussed the limitations of undirected graphical models compared with directed models in learning, and proposed mean-field-based attractor dynamics for iterative inference in feed-forward belief networks. Our method can also be seen as an extension along this line.\nAs a principled probabilistic model, the linear Gaussian memory of the DKM can be seen as a special case of the Kalman filter (KF) [17] without the drift-diffusion dynamics of the latent state. This more stable structure captures the statistics of entire episodes during sequential updates with minimal interference. 
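The Kalman-filter view can be checked numerically. The sketch below is an illustration under stated assumptions, and its function names are hypothetical. The memory has prior mean R0 and row covariance U0, each observation is z_t = M^T w_t plus Gaussian noise, and the addressing weights w_t are held fixed. Under these conditions, sequential Kalman-style updates without drift-diffusion reproduce the batch posterior exactly, in any order. This is one way to read the claim that the structure captures the statistics of entire episodes. When the weights are instead inferred by dynamic addressing from the current memory, as in section 3.2, this order-independence no longer holds exactly, as noted in section 3.3.

```python
import numpy as np


def sequential_update(R, U, W, Z, sigma_xi2):
    # Kalman measurement updates with an identity transition and no process
    # noise (no drift-diffusion), one observation z_t = M^T w_t + noise at a time.
    for w, z in zip(W, Z):
        c = U @ w
        s = w @ c + sigma_xi2
        R = R + np.outer(c, z - R.T @ w) / s
        U = U - np.outer(c, c) / s
    return R, U


def batch_posterior(R0, U0, W, Z, sigma_xi2):
    # Closed-form posterior of the same linear-Gaussian model given all T observations.
    P0 = np.linalg.inv(U0)
    U = np.linalg.inv(P0 + W.T @ W / sigma_xi2)
    R = U @ (P0 @ R0 + W.T @ Z / sigma_xi2)
    return R, U


rng = np.random.default_rng(1)
K, C, T, sigma_xi2 = 6, 10, 20, 0.5
R0, U0 = rng.normal(size=(K, C)), np.eye(K)
W, Z = rng.normal(size=(T, K)), rng.normal(size=(T, C))   # fixed addressing weights
R_seq, U_seq = sequential_update(R0, U0, W, Z, sigma_xi2)
R_rev, U_rev = sequential_update(R0, U0, W[::-1], Z[::-1], sigma_xi2)
R_batch, U_batch = batch_posterior(R0, U0, W, Z, sigma_xi2)
print(np.allclose(R_seq, R_batch), np.allclose(R_rev, R_batch))  # prints: True True
```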
The idea of using the latent state of the KF as memory is closely related to the hetero-associative novelty filter suggested in [10]. The DKM can also be contrasted with recently proposed nonlinear generalisations of the KF, such as [21], in that we preserve the higher-level linearity for efficient analytic inference over a very large latent state ($M$). Combined with deep neural networks and variational inference, this allows our model to store associations between a large number of patterns and to generalise to large-scale non-Gaussian datasets.\n\nReferences\n[1] David H Ackley, Geoffrey E Hinton, and Terrence J Sejnowski. A learning algorithm for Boltzmann machines. In Readings in Computer Vision, pages 522\u2013533. Elsevier, 1987.\n\n[2] David J Aldous. Exchangeability and related topics. In \u00c9cole d\u2019\u00c9t\u00e9 de Probabilit\u00e9s de Saint-Flour XIII\u20141983, pages 1\u2013198. Springer, 1985.\n\n[3] Daniel J Amit. Modeling Brain Function: The World of Attractor Neural Networks. Cambridge University Press, 1992.\n\n[4] John Robert Anderson. Learning and Memory: An Integrated Approach. John Wiley & Sons Inc, 2000.\n\n[5] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.\n\n[6] Charles Beattie, Joel Z Leibo, Denis Teplyashin, Tom Ward, Marcus Wainwright, Heinrich K\u00fcttler, Andrew Lefrancq, Simon Green, V\u00edctor Vald\u00e9s, Amir Sadik, et al. DeepMind Lab. arXiv preprint arXiv:1612.03801, 2016.\n\n[7] Jonathan D Cohen, William M Perlstein, Todd S Braver, Leigh E Nystrom, Douglas C Noll, John Jonides, and Edward E Smith. Temporal dynamics of brain activation during a working memory task. Nature, 386(6625):604, 1997.\n\n[8] John Conklin and Chris Eliasmith. 
A controlled attractor network model of path integration in the rat. Journal of Computational Neuroscience, 18(2):183\u2013203, 2005.\n\n[9] Yann N Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, and Yoshua Bengio. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2933\u20132941. Curran Associates, Inc., 2014.\n\n[10] Peter Dayan and Sham Kakade. Explaining away in weight space. In Advances in Neural Information Processing Systems, pages 451\u2013457, 2001.\n\n[11] Arthur P Dempster, Nan M Laird, and Donald B Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), pages 1\u201338, 1977.\n\n[12] Surya Ganguli, Dongsung Huh, and Haim Sompolinsky. Memory traces in dynamical systems. Proceedings of the National Academy of Sciences, 105(48):18970\u201318975, 2008.\n\n[13] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2672\u20132680. Curran Associates, Inc., 2014.\n\n[14] Alex Graves, Greg Wayne, Malcolm Reynolds, Tim Harley, Ivo Danihelka, Agnieszka Grabska-Barwi\u0144ska, Sergio G\u00f3mez Colmenarejo, Edward Grefenstette, Tiago Ramalho, John Agapiou, et al. Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626):471, 2016.\n\n[15] John J Hopfield. 
Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8):2554\u20132558, 1982.\n\n[16] Auke Jan Ijspeert, Jun Nakanishi, Heiko Hoffmann, Peter Pastor, and Stefan Schaal. Dynamical movement primitives: learning attractor models for motor behaviors. Neural Computation, 25(2):328\u2013373, 2013.\n\n[17] Rudolph Emil Kalman. A new approach to linear filtering and prediction problems. Transactions of the ASME\u2013Journal of Basic Engineering, 82(Series D):35\u201345, 1960.\n\n[18] Pentti Kanerva. Sparse Distributed Memory. MIT Press, 1988.\n\n[19] Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. In Proceedings of the 2nd International Conference on Learning Representations (ICLR), 2013.\n\n[20] Konrad P Kording, Joshua B Tenenbaum, and Reza Shadmehr. The dynamics of memory as a consequence of optimal adaptation to a changing body. Nature Neuroscience, 10(6):779, 2007.\n\n[21] Rahul G Krishnan, Uri Shalit, and David Sontag. Deep Kalman filters. arXiv preprint arXiv:1511.05121, 2015.\n\n[22] Brenden M Lake, Ruslan Salakhutdinov, and Joshua B Tenenbaum. Human-level concept learning through probabilistic program induction. Science, 350(6266):1332\u20131338, 2015.\n\n[23] Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. On the difficulty of training recurrent neural networks. In International Conference on Machine Learning, pages 1310\u20131318, 2013.\n\n[24] Barak A Pearlmutter. Learning state space trajectories in recurrent neural networks. Neural Computation, 1(2):263\u2013269, 1989.\n\n[25] Alexander Pritzel, Benigno Uria, Sriram Srinivasan, Adri\u00e0 Puigdom\u00e8nech, Oriol Vinyals, Demis Hassabis, Daan Wierstra, and Charles Blundell. Neural episodic control. arXiv preprint arXiv:1703.01988, 2017.\n\n[26] Danilo Jimenez Rezende and Shakir Mohamed. Variational inference with normalizing flows. arXiv
arXiv\n\npreprint arXiv:1505.05770, 2015.\n\n[27] Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic backpropagation and approxi-\nmate inference in deep generative models. In The 31st International Conference on Machine Learning\n(ICML), 2014.\n\n[28] Alexei Samsonovich and Bruce L McNaughton. Path integration and cognitive mapping in a continuous\n\nattractor neural network model. Journal of Neuroscience, 17(15):5900\u20135920, 1997.\n\n[29] Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. One-shot\n\nlearning with memory-augmented neural networks. arXiv preprint arXiv:1605.06065, 2016.\n\n[30] Lawrence K Saul and Michael I Jordan. Attractor dynamics in feedforward neural networks. Neural\n\nComputation, 12(6):1313\u20131335, 2000.\n\n[31] Jason Weston, Sumit Chopra, and Antoine Bordes. Memory networks. arXiv preprint arXiv:1410.3916,\n\n2014.\n\n[32] Yan Wu, Greg Wayne, Alex Graves, and Timothy Lillicrap. The kanerva machine: A generative distributed\n\nmemory. In International Conference on Learning Representations, 2018.\n\n[33] Richard S Zemel and Michael C Mozer. A generative model for attractor dynamics. In Advances in neural\n\ninformation processing systems, pages 80\u201388, 2000.\n\n10\n\n\f", "award": [], "sourceid": 5715, "authors": [{"given_name": "Yan", "family_name": "Wu", "institution": "DeepMind"}, {"given_name": "Gregory", "family_name": "Wayne", "institution": "Google DeepMind"}, {"given_name": "Karol", "family_name": "Gregor", "institution": "DeepMind"}, {"given_name": "Timothy", "family_name": "Lillicrap", "institution": "Google DeepMind"}]}