{"title": "Modeling Memory Transfer and Savings in Cerebellar Motor Learning", "book": "Advances in Neural Information Processing Systems", "page_first": 859, "page_last": 866, "abstract": null, "full_text": "Modeling Memory Transfer and Savings in Cerebellar Motor Learning\n\nNaoki Masuda RIKEN Brain Science Institute Wako, Saitama 351-0198, Japan masuda@brain.riken.jp\n\nShun-ichi Amari RIKEN Brain Science Institute Wako, Saitama 351-0198, Japan amari@brain.riken.jp\n\nAbstract\nThere is a long-standing controversy over the site of cerebellar motor learning. Different theories and experimental results suggest that either the cerebellar flocculus or the brainstem learns the task and stores the memory. With a dynamical systems approach, we clarify the mechanism by which the memory generated in the flocculus is transferred to the brainstem, as well as the mechanism of the so-called savings phenomena. The brainstem learning must comply with a sort of Hebbian rule depending on Purkinje-cell activities. In contrast to earlier numerical models, our model is simple, yet it accommodates explanations and predictions of experimental situations as qualitative features of trajectories in the phase space of synaptic weights, without fine parameter tuning.\n\n1 Introduction\n\nThe cerebellum is involved in various types of motor learning. As schematically shown in Fig. 1, the cerebellum is composed of the cerebellar cortex and the cerebellar nuclei (we depict the vestibular nucleus VN in Fig. 1). There are two main pathways linking external input from the mossy fibers (mf) to motor outputs, which originate from the cerebellar nuclei. The pathway that relays the mossy fibers directly to the cerebellar nuclei is called the direct pathway. Each nucleus cell receives about $10^4$ mossy fiber synapses. The pathway involving the mossy fibers, the granule cells (gr), the parallel fibers (pl), and the Purkinje cells (Pr) in the flocculo-nodular lobes of the cerebellar cortex is called the indirect pathway. 
Because the Purkinje cells, which are the sole source of output from the cerebellar cortex, are GABAergic, firing rates of the nuclei are suppressed when this pathway is active. The indirect pathway also includes recurrent collaterals terminating on various types of inhibitory cells. Another anatomical feature of the indirect pathway is that climbing fibers (Cm in Fig. 1) from the inferior olive (IO) innervate Purkinje cells. Taking into account the huge mass of intermediate computational units in the indirect pathway, namely the granule cells, Marr conjectured that the cerebellum operates as a perceptron with high computational power [8]. The climbing fibers were thought to induce long-term potentiation (LTP) of pl-Pr synapses to reinforce the signal transduction. Albus claimed that long-term depression (LTD) rather than LTP should occur so that the Purkinje cells inhibit the nuclei [2]. The climbing fibers were thought to serve as teaching lines that convey error-correcting signals.\n\nFigure 1: Architecture of the VOR model. [Diagram labels: input $u$; granule cells (gr) with $x = Au$; parallel fibers (pl) with weights $w$; Purkinje cell (Pr) with $y = wx = wAu$; inferior olive (IO) and climbing fiber (Cm) carrying $e = ru - z$; mossy fibers (mf) with weights $v$; vestibular nucleus (VN) with $z = vu - y$.]\n\nThe vestibulo-ocular reflex (VOR) is a standard benchmark for exploring synaptic substrates of cerebellar motor learning. The VOR is a short-latency reflex eye movement that stabilizes images on the retina during head movement. Motion of the head drives eye movements in the opposite direction. When a subject wears a prism, adaptation of the VOR gain occurs for image stabilization. In this context, in vivo experiments confirmed that the LTD hypothesis is correct (reviewed in [6]). However, the cerebellum is not the only site of convergence of visual and vestibular signals. The learning scheme depending only on the indirect pathway is called the flocculus hypothesis. An alternative is the brainstem hypothesis, in which synaptic plasticity is assumed to occur in the direct pathway (mf $\\to$ VN) [12]. 
This idea is supported by experimental evidence that flocculus shutdown after 3 days of VOR adaptation does not impair the motor memory [7]. Moreover, in other experiments, plasticity of the Purkinje cells in response to vestibular inputs, as required in the flocculus hypothesis, does occur, but in the direction opposite to that predicted by the flocculus hypothesis [5, 12]. Also, LTP of the mf-VN synapses, which is necessary to implement the brainstem hypothesis [3], has been suggested in experiments [14]. The relative contributions of the flocculus mechanism and the brainstem mechanism to motor learning remain elusive [3, 5, 9]. The same controversy exists regarding the mechanism of associative eyelid conditioning [9, 10, 11]. A related issue is the distinction between short-term and long-term plasticity. Many of the experiments in favor of the flocculus hypothesis are concerned with short-term learning, whereas plasticity involving the vestibular nuclei is suggested to be functional in the long term. Short-term motor memory in the flocculus may eventually be transferred to the brainstem. This is termed the memory transfer hypothesis [9]. Medina and Mauk proposed a numerical model and examined what types of brainstem learning rules are compatible with memory transfer [10]. They concluded that the brainstem plasticity should be driven by coincident activities of the Purkinje cells and the mossy fibers. The necessity of a Hebbian type of learning in the direct pathway is also supported by another numerical model [13]. We propose a much simpler model to understand the essential mechanism of memory transfer without fine parameter manipulations. Another goal of this work is to explain savings of learning. Savings are observed in natural learning tasks. Because animals can be trained for only a limited amount of time per day, the task period and the rest period alternate, e.g., with a cycle of 1 day. 
Performance is improved during the task period, and it degrades during the rest period (in the dark). However, when the alternation is repeated, the performance is enhanced more rapidly and progressively in later sessions [7] (also, S. Nagao, private communication). The flocculus may be responsible for daily rapid learning and forgetting, and the brainstem may underlie gradual memory consolidation [11]. While our target phenomenon of interest is the VOR, the proposed model is fairly general.\n\n2 Model\n\nReferring to Fig. 1, let us denote by $u \\in \\mathbb{R}^m$ the external input to the mossy fibers. It is propagated to the granule cells via synaptic connectivity represented by an $n \\times m$ matrix $A$, where presumably $n \\gg m$. The output of the granule cells, $x \\equiv Au \\in \\mathbb{R}^n$, is received by the Purkinje-cell layer. For simplicity, we assume just one Purkinje cell whose output is written as $y \\equiv wx$, where $w \\in \\mathbb{R}^n$ is a row vector. Since pl-Pr synapses are excitatory, the elements of $w$ are positive. The direct pathway (mf $\\to$ VN) is defined by a plastic connection matrix $v \\in \\mathbb{R}^m$. The output to the VOR actuator is given by $z = vu - y = vu - wAu$, which is the output of the sole neuron of the cerebellar nuclei. This form of $z$ takes into account that the contribution of the indirect pathway is inhibitory and that of the direct pathway is excitatory. The animal learns to bring $z$ as close as possible to the desirable motor output $ru$. For a large (resp. small) desirable gain $r$, the correct direction of synaptic changes is a decrease (resp. increase) in $w$ and an increase (resp. decrease) in $v$ [5]. The learning error $e \\equiv ru - z$ is carried by the climbing fibers and projects onto the Purkinje cell, which enables supervised learning [6]. The LTD of $w$ occurs when the parallel-fiber input and the climbing-fiber input are simultaneously large [6, 9]. Since we can write\n\n$$\\dot{w} = -\\eta_1 e x = -\\frac{\\eta_1}{2} \\frac{\\partial e^2}{\\partial w}, \\qquad (1)$$\n\nwhere $\\eta_1$ is the learning rate, $w$ evolves to minimize $e^2$. Equation (1) is a type of Widrow-Hoff rule [4, p. 320]. 
With spontaneous inputs only, that is, in the presence of $x$ and the absence of $e$, $w$ experiences LTP [6, 9]. We model this effect by adding $\\eta_2 x$ to Eq. (1). This term provides subtractive normalization that counteracts the use-dependent LTD [4, p. 290]. However, subtractive normalization cannot prohibit $w$ from running away when the error signal is turned off. Therefore, we additionally assume a multiplicative normalization term $\\eta_3 w$ to limit the magnitude of $w$ [4, p. 290, 314]. In the end, Eq. (1) is modified to\n\n$$\\dot{w} = -\\eta_1 (ru - vu + wAu) Au + \\eta_2 Au - \\eta_3 w, \\qquad (2)$$\n\nwhere $\\eta_2$ and $\\eta_3$ are rates of memory decay satisfying $\\eta_2, \\eta_3 \\ll \\eta_1$.\n\nIn the dark, the VOR gain, which might have changed via adaptation, tends back to a value close to unity [5]. Let us represent this reference gain by $r = r_0$. With the synaptic strengths in this null condition denoted by $(w, v) = (w_0, v_0)$, we obtain $r_0 u = v_0 u - w_0 Au$. By setting $\\dot{w} = 0$ in Eq. (2), we derive\n\n$$\\eta_2 Au = \\eta_1 (r_0 u - v_0 u + w_0 Au) Au + \\eta_3 w_0 = \\eta_3 w_0. \\qquad (3)$$\n\nSubstituting Eq. (3) into Eq. (2) results in\n\n$$\\dot{w} = -\\eta_1 (ru - vu + wAu) Au - \\eta_3 (w - w_0). \\qquad (4)$$\n\nExperiments show that $v$ can be potentiated [14]. Enhancement of the excitability of the nucleus output ($z$) in response to tetanic stimulation, or sustained $u$, is also in line with the LTP of $v$ [1]. In contrast, LTD of $v$ is biologically unknown. Numerical models suggest that LTP in the nuclei should be driven by $y$ [10, 11]. However, the mechanism and the specificity underlying plasticity of $v$ are not well understood [9]. Therefore, we assume that both LTP and LTD of $v$ occur in an associative manner, and we represent the LTP effect by a general function $F$. In parallel to the learning rule of $w$, we assume a subtractive normalization term $-\\eta_5 u$ [10]. We also add a multiplicative normalization term $\\eta_6 v$ to constrain $v$. Finally, we obtain\n\n$$\\dot{v} = \\eta_4 F(u, y, z, e) - \\eta_5 u - \\eta_6 v. \\qquad (5)$$\n\nPresumably, $v$ changes much more slowly (on a time scale of 8–12 hr) than $w$ changes (0.5 hr) [10, 13]. 
Therefore, we assume $\\eta_1 \\gg \\eta_4 \\gg \\eta_5, \\eta_6$.\n\n3 Analysis of Memory Transfer\n\nLet us examine a couple of learning rules in the direct pathway to identify robust learning mechanisms.\n\n3.1 Supervised learning\n\nAlthough the climbing fibers carrying $e$ send excitatory collaterals to the cerebellar nuclei, supervised learning there has very little experimental support [5]. Here we show that supervised learning in the direct pathway is theoretically unlikely. Let us assume that modification of $v$ decreases $|e|$. Accordingly, we set $F = -\\frac{1}{2} \\partial e^2 / \\partial v = eu$. Then, Eq. (5) becomes\n\n$$\\dot{v} = \\eta_4 (ru - vu + wAu) u - \\eta_5 u - \\eta_6 v. \\qquad (6)$$\n\nIn the natural situation, $r = r_0$. Hence,\n\n$$\\eta_5 u = \\eta_4 (r_0 u - v_0 u + w_0 Au) u - \\eta_6 v_0 = -\\eta_6 v_0. \\qquad (7)$$\n\nInserting Eq. (7) into Eq. (6) yields\n\n$$\\dot{v} = \\eta_4 (ru - vu + wAu) u - \\eta_6 (v - v_0). \\qquad (8)$$\n\nFor further analysis, let us assume $m = n = 1$ (for which we drop the bold notation) and perform the slow-fast analysis based on $\\eta_1 \\gg \\eta_3$ and $\\eta_4 \\gg \\eta_6$. Equations (4) and (8) define the nullclines $\\dot{w} = 0$ and $\\dot{v} = 0$, which are represented respectively by\n\n$$v = v_0 + r - r_0 + \\frac{\\eta_1 A^2 u^2 + \\eta_3}{\\eta_1 A u^2} (w - w_0) \\qquad (9)$$\n\nand\n\n$$v = v_0 + \\frac{\\eta_4 u^2}{\\eta_4 u^2 + \\eta_6} (r - r_0) + \\frac{\\eta_4 A u^2}{\\eta_4 u^2 + \\eta_6} (w - w_0). \\qquad (10)$$\n\nSince $\\dot{w} = O(\\eta_1) \\gg O(\\eta_4) = \\dot{v}$ in an early stage, a trajectory in the $w$-$v$ plane initially approaches the fast manifold (Eq. (9)) and moves along it toward the equilibrium given by\n\n$$w^* = w_0 - \\frac{\\eta_1 \\eta_6 A u^2 (r - r_0)}{\\eta_1 \\eta_6 A^2 u^2 + \\eta_3 \\eta_4 u^2 + \\eta_3 \\eta_6}, \\qquad v^* = v_0 + \\frac{\\eta_3 \\eta_4 u^2 (r - r_0)}{\\eta_1 \\eta_6 A^2 u^2 + \\eta_3 \\eta_4 u^2 + \\eta_3 \\eta_6}. \\qquad (11)$$\n\nLTD of $w$ and LTP of $v$ are expected for adaptation to a larger gain ($r > r_0$), and LTP of $w$ and LTD of $v$ are expected for $r < r_0$. The results are consistent with both the flocculus hypothesis and the brainstem hypothesis as far as the direction of learning is concerned [5]. When $r > r_0$ (resp. $r < r_0$), LTD (resp. LTP) of $w$ first occurs to decrease the learning error. Then, the motor memory stored in $w$ is gradually transferred by LTP (resp. LTD) of $v$ replacing LTD (resp. LTP) of $w$. In the long run, the memory is stored mainly in $v$, not in $w$. 
However, the memory transfer based on supervised learning has fundamental deficiencies. First, since $\\eta_1 \\gg \\eta_3$ and $\\eta_4 \\gg \\eta_6$, both nullclines, Eqs. (9) and (10), have a slope close to $A$ in the $w$-$v$ plane. This means that the relative position of the equilibrium depends heavily on the parameter values, especially on the learning rates, the choice of which is rather arbitrary. Then, $(w^*, v^*)$ may be located so that, for example, LTP of $w$ or LTD of $v$ results from $r > r_0$. Also, the degree of transfer, $|w^* - w_0| / |v^* - v_0|$, is not robust against parameter changes. This may underlie the fact that LTD of $w$ was not followed by partial LTP in the numerical simulations in [10]. Even if the position of $(w^*, v^*)$ happens to support LTD of $w$ and LTP of $v$, memory transfer takes a long time. This is because the nullclines Eqs. (9) and (10) are fairly close to each other, which means that $\\dot{v}$ is small on the fast manifold ($\\dot{w} = 0$). We can also imagine a type of Hebbian rule with $F = \\frac{1}{2} \\partial z^2 / \\partial v = zu$. Similar calculations show that this rule also realizes memory transfer only in an unreliable manner.\n\nFigure 2: Dynamics of the synaptic weights in the Purkinje cell-dependent learning. (A) $r > r_0$ and (B) $r < r_0$. [Each panel shows the $w$-$v$ plane with the nullclines $\\dot{w} = 0$ and $\\dot{v} = 0$, the line $e = 0$, and the points $(w_0, v_0)$ and $(w^*, v^*)$.]\n\n3.2 Purkinje cell-dependent learning\n\nResults of numerical studies support that $v$ should be subject to a type of Hebbian learning depending on two afferents to the vestibular nuclei, namely, $u$ and $y$ [10, 11, 13]. Changes in the VOR gain are signaled by $y$. Since LTP should logically occur when $y$ is small and $u$ is large, we set $F = (y_{\\max} - y) u$, where $y_{\\max}$ is the maximum firing rate of the Purkinje cell. Then, we obtain\n\n$$\\dot{v} = \\eta_4 (y_{\\max} - wAu) u - \\eta_5 u - \\eta_6 v. \\qquad (12)$$\n\nThe subtractive normalization is determined from the equilibrium condition:\n\n$$\\eta_5 u = \\eta_4 (y_{\\max} - w_0 Au) u - \\eta_6 v_0. \\qquad (13)$$\n\nSubstituting Eq. (13) into Eq. (12) yields\n\n$$\\dot{v} = \\eta_4 (w_0 - w) A u^2 + \\eta_6 (v_0 - v). \\qquad (14)$$\n\nWhen $m = n = 1$, the nullclines are given by Eq. (9) and\n\n$$v = v_0 - \\frac{\\eta_4 A u^2}{\\eta_6} (w - w_0), \\qquad (15)$$\n\nwhich are depicted in Fig. 2(A) and (B) for $r > r_0$ and $r < r_0$, respectively. As shown by arrows in Fig. 2, trajectories in the $w$-$v$ space first approach the fast manifold Eq. (9) and then move along it toward the equilibrium given by\n\n$$w^* = w_0 - \\frac{\\eta_1 \\eta_6 A u^2 (r - r_0)}{\\eta_1 \\eta_4 A^2 u^4 + \\eta_1 \\eta_6 A^2 u^2 + \\eta_3 \\eta_6}, \\qquad v^* = v_0 + \\frac{\\eta_1 \\eta_4 A^2 u^4 (r - r_0)}{\\eta_1 \\eta_4 A^2 u^4 + \\eta_1 \\eta_6 A^2 u^2 + \\eta_3 \\eta_6}. \\qquad (16)$$\n\nEquation (15) has a large negative slope because $\\eta_4 \\gg \\eta_6$. Consequently, setting $r > r_0$ (resp. $r < r_0$) duly results in LTD (resp. LTP) of $w$ and LTP (resp. LTD) of $v$. At the same time, LTD (resp. LTP) of $w$ in an early stage of learning is partially compensated by subsequent LTP (resp. LTD) of $w$, which agrees with previously reported numerical results [10]. In contrast to the supervised and Hebbian learning rules, this learning is robust against parameter changes since the positions and the slopes of the two nullclines are well separated. Owing to this property, in the long term, the memory is transferred more rapidly along the $w$-nullcline than for the other two learning rules. Another benefit of the large negative slope of Eq. (15) is that $|v^* - v_0| \\gg |w^* - w_0|$ holds, which means efficient memory transfer from $w$ to $v$.\n\nThe error at the equilibrium state is\n\n$$e = \\frac{\\eta_3 \\eta_6 (r - r_0) u}{\\eta_1 \\eta_4 A^2 u^4 + \\eta_1 \\eta_6 A^2 u^2 + \\eta_3 \\eta_6}. \\qquad (17)$$\n\nEquation (17) guarantees that the $e = 0$ line is located as shown in Fig. 2, and the learning proceeds so as to decrease $|e|$. The performance overshoot, which is unrealistic, does not occur.\n\n4 Numerical Simulations of Savings\n\nThe learning rule proposed in Sec. 3.2 explains savings as well. To show this, we mimic a situation of savings by periodically alternating the task period and the rest period. Specifically, we start with $r = r_0 = 1$, $w = w_0$, $v = v_0$, and the learning condition ($r = 2$ or $r = 0.5$) is applied for 4 hours a day. 
During the rest of the day (20 hours), the dark condition is simulated by giving no teaching signal to the model. Changes in the VOR gains for 8 consecutive days are shown in Fig. 3(A) and (C) for $r = 2$ and $r = 0.5$, respectively. The numerical results are consistent with the savings found in other reported experiments [7] and models [11]; the animal forgets much of the acquired gain in the dark, while a small fraction is transferred each day to the cerebellar nuclei. The time-dependent synaptic weights are shown in Fig. 3(B) ($r = 2$) and (D) ($r = 0.5$) and suggest that $v$ is really responsible for savings and that its plasticity needs guidance from the short-term learning of $w$. The memory transfer occurs even in the dark condition, as indicated by the increase (resp. decrease) of $v$ in the dark shown in Fig. 3(B) (resp. (D)). This happens because the decay of the short-term memory of $w$ drives the learning of $v$ for some time even after the daily training has finished. For the indirect pathway, a dark condition defines an off-task period during which $w$ gradually loses its associations. For comparison, let us deal with the case in which $v$ is fixed. Then, the learning rule Eq. (4) is reduced to\n\n$$\\dot{w} = -\\eta_1 [(r - r_0) u + (w - w_0) Au] Au - \\eta_3 (w - w_0). \\qquad (18)$$\n\nThe VOR adaptation with this rule is shown in Fig. 4(A) ($r = 2$) and (B) ($r = 0.5$). Long-term retention of the acquired gain is now impossible, whereas the short-term learning, or the adaptation within a day, deteriorates little. Since savings do not occur, the ultimate learning error is larger than when $v$ is plastic. However, if $w$ is fixed and $v$ is plastic, the VOR gain is not adaptive, since $y$ does not carry teaching signals any longer. In this case, we must implement supervised learning of $v$ for learning to occur. Then, the VOR gain adapts only gradually on the slow time scale of $\\eta_4$, and the short-term learning is lost.\n\n5 Discussion\n\nOur model explains how the flocculus and the brainstem cooperate in motor learning. 
Presumably, the indirect pathway involving the flocculus is computationally powerful because of a huge number of intermediate granule cells, but its memory is of a short-term nature. The direct pathway relaying the mossy fibers to the cerebellar nuclei is likely to have less computational power but stores motor memory for a long period. A part of the motor memory is expected to be passed from the flocculus to the nuclei. This happens in a robust manner if the direct pathway is equipped with the learning rule dependent on correlation between the Purkinje-cell firing and the mossy-fiber firing. Exploring whether associative LTP/LTD in the cerebellar nuclei really exists will be a subject of future experimental work. Our model is also applicable to savings.\n\nFigure 3: Numerical simulations of savings with the Purkinje cell-dependent learning rule. We set $A = 0.4$, $u = 1$, $w_0 = 2$, $r_0 = 1$, $v_0 = r_0 + A w_0$, $\\eta_1 = 7$, $\\eta_3 = 0.3$, $\\eta_4 = 0.05$, $\\eta_6 = 0.002$. The target gains are (A, B) $r = 2$ and (C, D) $r = 0.5$. (A) and (C) show VOR gains as a function of time [hr]. (B) and (D) show trajectories in the $w$-$v$ space (thin solid lines) together with the nullclines (thick solid lines) and $e = 0$ (thick dotted lines).\n\nFigure 4: Numerical simulations of savings with fixed $v$. The parameter values are the same as those used in Fig. 3. The target gains are (A) $r = 2$ and (B) $r = 0.5$. Both panels show VOR gains as a function of time [hr].\n\nIn the earlier models [10, 11], quantitative meanings were given to the equilibrium synaptic weights. Actually, they are solely determined from non-experimentally determined parameters, namely, the balance between the learning rates (in our terminology, $\\eta_1$, $\\eta_2$, $\\eta_4$, and $\\eta_5$). Also, the balance seems to play a role in preventing runaway of synaptic weights. 
In contrast, our model uses the ratio of learning rates (and values of other parameters) just for qualitative purposes and is capable of explaining and predicting experimental settings without parameter tuning. For example, the earlier arguments negating the flocculus hypothesis are based on the fact that the plasticity of the flocculus ($w$) responding to vestibular inputs occurs but in the direction opposite to the expectation of the flocculus hypothesis [5, 12]. However, this experimental observation is not necessarily contradictory to either the flocculus hypothesis or the two-site hypothesis. As shown in Fig. 2(A), when adapting to a large VOR gain, $w$ experiences LTD in the initial stage [6]. Then, partial LTP ensues as the motor memory is transferred to the nuclei. Another prediction is about adaptation to a small gain. Figure 2(B) predicts that, in this case, LTP in the indirect pathway is gradually transferred to LTD in the direct pathway. Partial LTD following LTP is anticipated in the flocculus. This implies savings in unlearning.\n\nAcknowledgments\nWe thank S. Nagao for helpful discussions. This work was supported by the Special Postdoctoral Researchers Program of RIKEN.\n\nReferences\n[1] C. D. Aizenman, D. J. Linden. Rapid, synaptically driven increases in the intrinsic excitability of cerebellar deep nuclear neurons. Nat. Neurosci., 3, 109–111 (2000).\n[2] J. S. Albus. A theory of cerebellar function. Math. Biosci., 10, 25–61 (1971).\n[3] E. S. Boyden, A. Katoh, J. L. Raymond. Cerebellum-dependent learning: the role of multiple plasticity mechanisms. Annu. Rev. Neurosci., 27, 581–609 (2004).\n[4] P. Dayan, L. F. Abbott. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. MIT (2001).\n[5] S. du Lac, J. L. Raymond, T. J. Sejnowski, S. G. Lisberger. Learning and memory in the vestibulo-ocular reflex. Annu. Rev. Neurosci., 18, 409–441 (1995).\n[6] M. Ito. Long-term depression. Annu. Rev. Neurosci., 12, 85–102 (1989).\n[7] A. E. Luebke, D. A. Robinson. Gain changes of the cat's vestibulo-ocular reflex after flocculus deactivation. Exp. Brain Res., 98, 379–390 (1994).\n[8] D. Marr. A theory of cerebellar cortex. J. Physiol., 202, 437–470 (1969).\n[9] M. D. Mauk. Roles of cerebellar cortex and nuclei in motor learning: contradictions or clues? Neuron, 18, 343–346 (1997).\n[10] J. F. Medina, M. D. Mauk. Simulations of cerebellar motor learning: computational analysis of plasticity at the mossy fiber to deep nucleus synapse. J. Neurosci., 19, 7140–7151 (1999).\n[11] J. F. Medina, K. S. Garcia, M. D. Mauk. A mechanism for savings in the cerebellum. J. Neurosci., 21, 4081–4089 (2001).\n[12] F. A. Miles, D. J. Braitman, B. M. Dow. Long-term adaptive changes in primate vestibuloocular reflex. IV. Electrophysiological observations in flocculus of adapted monkeys. J. Neurophysiol., 43, 1477–1493 (1980).\n[13] B. W. Peterson, J. F. Baker, J. C. Houk. A model of adaptive control of vestibuloocular reflex based on properties of cross-axis adaptation. Ann. New York Acad. Sci., 627, 319–337 (1991).\n[14] R. J. Racine, D. A. Wilson, R. Gingell, D. Sunderland. Long-term potentiation in the interpositus and vestibular nuclei in the rat. Exp. Brain Res., 63, 158–162 (1986).\n", "award": [], "sourceid": 2873, "authors": [{"given_name": "Naoki", "family_name": "Masuda", "institution": null}, {"given_name": "Shun-ichi", "family_name": "Amari", "institution": null}]}