{"title": "Multiple Paired Forward-Inverse Models for Human Motor Learning and Control", "book": "Advances in Neural Information Processing Systems", "page_first": 31, "page_last": 37, "abstract": null, "full_text": "Multiple Paired Forward-Inverse Models \nfor Human Motor Learning and Control \n\nMasahiko Haruno* \nmharuno@hip.atr.co.jp \n\nDaniel M. Wolpert t \nwolpert@hera.ucl.ac.uk \n\nMitsuo Kawato* o \nkawato@hip.atr.co.jp \n\n* ATR Human Information Processing Research Laboratories, 2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-02, Japan. \nt Sobell Department of Neurophysiology, Institute of Neurology, Queen Square, London WC1N 3BG, United Kingdom. \no Dynamic Brain Project, ERATO, JST, Kyoto, Japan. \n\nAbstract \n\nHumans demonstrate a remarkable ability to generate accurate and appropriate motor behavior under many different and often uncertain environmental conditions. This paper describes a new modular approach to human motor learning and control, based on multiple pairs of inverse (controller) and forward (predictor) models. This architecture simultaneously learns the multiple inverse models necessary for control as well as how to select the inverse models appropriate for a given environment. Simulations of object manipulation demonstrate the ability to learn multiple objects, appropriate generalization to novel objects and the inappropriate activation of motor programs based on visual cues, followed by on-line correction, seen in the \"size-weight illusion\". \n\n1 Introduction \n\nGiven the multitude of contexts within which we must act, there are two qualitatively distinct strategies for motor control and learning. The first is to use a single controller, which would need to be highly complex to allow for all possible scenarios. 
If this controller were unable to encapsulate all the contexts, it would need to adapt every time the context of the movement changed before it could produce appropriate motor commands -- this would produce transient and possibly large performance errors. Alternatively, a modular approach can be used in which multiple controllers co-exist, with each controller suitable for one or a small set of contexts. Such a modular strategy has been introduced in the \"mixture of experts\" architecture for supervised learning [6]. This architecture comprises a set of expert networks and a gating network which performs classification by combining each expert's output. These networks are trained simultaneously so that the gating network splits the input space into regions in which particular experts can specialize. \n\nTo apply such a modular strategy to motor control, two problems must be solved. First, how are the set of inverse models (controllers) learned to cover the contexts which might be experienced -- the module learning problem. Second, given a set of inverse modules (controllers), how is the correct subset selected for the current context -- the module selection problem. From human psychophysical data we know that such a selection process must be driven by two distinct processes: feedforward switching based on sensory signals, such as the perceived size of an object, and switching based on feedback of the outcome of a movement. For example, on picking up an object which appears heavy, feedforward switching may activate controllers responsible for generating a large motor impulse. However, feedback processes, based on contact with the object, can indicate that it is in fact light, thereby switching control to inverse models appropriate for a light object. 
\n\nIn the context of motor control and learning, Gomi and Kawato [4] combined the feedback-error-learning [7] approach and the mixture of experts architecture to learn multiple inverse models for different manipulated objects. They used both the visual shapes of the manipulated objects and intrinsic signals, such as somatosensory feedback and efference copy of the motor command, as the inputs to the gating network. Using this architecture it was quite difficult to acquire multiple inverse models. This difficulty arose because a single gating network needed to divide up, based solely on control error, the large input space into complex regions. Furthermore, Gomi and Kawato's model could not demonstrate feedforward controller selection prior to movement execution. Here we describe a model of human motor control which addresses these problems and can solve the module learning and selection problems in a computationally coherent manner. The basic idea of the model is that the brain contains multiple pairs (modules) of forward (predictor) and inverse (controller) models (MPFIM) [10]. Within each module, the forward and inverse models are tightly coupled both during their acquisition and use, in which the forward models determine the contribution (responsibility) of each inverse model's output to the final motor command. This architecture can simultaneously learn the multiple inverse models necessary for control as well as how to select the inverse models appropriate for a given environment in both a feedforward and a feedback manner. \n\n2 Multiple paired forward-inverse models \n\n
Figure 1: A schematic diagram showing how the MPFIM architecture is used to control arm movement while manipulating different objects. Parenthesized numbers in the figure relate to the equations in the text. \n\n2.1 Motor learning and feedback selection \n\nFigure 1 illustrates how the MPFIM architecture can be used to learn and control arm movements when the hand manipulates different objects. Central to the multiple paired forward-inverse model is the notion of dividing up experience using predictive forward models. We consider n undifferentiated forward models which each receive the current state, x_t, and motor command, u_t, as input. The output of the ith forward model is x̂_{t+1}^i, the prediction of the next state at time t: \n\nx̂_{t+1}^i = φ(w_t^i, x_t, u_t) (1) \n\nwhere w_t^i are the parameters of a function approximator φ (e.g. neural network weights) used to model the forward dynamics. These predicted next states are compared to the actual next state to provide the responsibility signal, which represents the extent to which each forward model presently accounts for the behavior of the system. Based on the prediction errors of the forward models, the responsibility signal λ_t^i for the i-th forward-inverse model pair (module) is calculated by the soft-max function \n\nλ_t^i = e^{-|x_t - x̂_t^i|^2 / 2σ^2} / Σ_{j=1}^n e^{-|x_t - x̂_t^j|^2 / 2σ^2} (2) \n\nwhere x_t is the true state of the system and σ is a scaling constant. The soft-max transforms the errors using the exponential function and then normalizes these values across the modules, so that the responsibilities lie between 0 and 1 and sum to 1 over the modules. Those forward models which capture the current behavior, and therefore produce small prediction errors, will have high responsibilities^1. 
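As a concrete illustration of the soft-max in Equation (2), the following sketch computes responsibility signals from forward-model prediction errors. This is a minimal NumPy rendering; the function and variable names are ours and do not come from the simulations reported later.

```python
import numpy as np

def responsibilities(x_true, x_pred, sigma=1.0):
    """Soft-max responsibility signals as in Equation (2).

    x_true : actual next state, shape (d,)
    x_pred : predictions of the n forward models, shape (n, d)
    sigma  : scaling constant of the exponential transform
    """
    # Squared prediction error of each forward model
    err = np.sum((x_pred - x_true) ** 2, axis=1)
    # Exponentiate and normalize across the modules
    lik = np.exp(-err / (2.0 * sigma ** 2))
    return lik / lik.sum()

# The module whose prediction is closest to the true state
# receives the largest responsibility.
lam = responsibilities(np.array([1.0]), np.array([[0.9], [2.0], [5.0]]))
```

The responsibilities are guaranteed to lie in [0, 1] and sum to 1, so they can directly weight both learning and control, as described next.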
The responsibilities are then used to control the learning of the forward models in a competitive manner, with those models with high responsibilities receiving proportionally more of their error signal than modules with low responsibility. The competitive learning among forward models is similar in spirit to the \"annealed competition of experts\" architecture [9]. \n\nΔw_t^i = ε λ_t^i (dφ/dw_t^i)(x_t - x̂_t^i) (3) \n\nFor each forward model there is a paired inverse model whose inputs are the desired next state x*_{t+1} and the current state x_t. The ith inverse model produces a motor command u_t^i as output: \n\nu_t^i = ψ(α_t^i, x*_{t+1}, x_t) (4) \n\nwhere α_t^i are the parameters of some function approximator ψ. \n\nThe total motor command is the summation of the outputs from these inverse models, using the responsibilities λ_t^i to weight the contributions: \n\nu_t = Σ_{i=1}^n λ_t^i u_t^i = Σ_{i=1}^n λ_t^i ψ(α_t^i, x*_{t+1}, x_t) (5) \n\nOnce again, the responsibilities are used to weight the learning of each inverse model. This ensures that inverse models learn only when their paired forward models make accurate predictions. Although for supervised learning the desired control command u_t* is needed (but is generally not available), we can approximate (u_t* - u_t) with the feedback motor command signal u_fb [7]: \n\nΔα_t^i = ε λ_t^i (dψ/dα_t^i)(u_t* - u_t) ≈ ε λ_t^i (dψ/dα_t^i) u_fb (6) \n\n^1 Because selecting modules can be regarded as a hidden state estimation problem, an alternative way to determine appropriate forward models is to use the EM algorithm [3]. \n\nIn summary, the responsibility signals are used in three ways -- first to gate the learning of the forward models (Equation 3), second to gate the learning of the inverse models (Equation 6), and third to gate the contribution of the inverse models to the final motor command (Equation 5). 
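The three uses of the responsibility signal summarized above can be sketched in a single update step. The toy version below is our own simplification, with linear forward and inverse models and hypothetical dimensions: it gates forward-model learning (Equation 3), blends the inverse models' commands (Equation 5), and gates inverse-model learning via the feedback command (Equation 6).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 3, 2                                  # modules; state/command dimension
W = rng.normal(size=(n, d, 2 * d)) * 0.1     # forward-model weights (Eq. 1)
A = rng.normal(size=(n, d, 2 * d)) * 0.1     # inverse-model weights (Eq. 4)

def mpfim_step(x_t, u_t, x_next, x_star, u_fb, sigma=1.0, eps=0.1):
    """One gated MPFIM update; returns responsibilities and blended command."""
    z_f = np.concatenate([x_t, u_t])
    x_hat = W @ z_f                          # forward predictions (Eq. 1)
    err = np.sum((x_hat - x_next) ** 2, axis=1)
    lam = np.exp(-err / (2.0 * sigma ** 2))
    lam /= lam.sum()                         # responsibilities (Eq. 2)
    for i in range(n):                       # gated forward learning (Eq. 3)
        W[i] += eps * lam[i] * np.outer(x_next - x_hat[i], z_f)
    z_i = np.concatenate([x_star, x_t])
    u_i = A @ z_i                            # inverse-model commands (Eq. 4)
    u = lam @ u_i                            # blended motor command (Eq. 5)
    for i in range(n):                       # gated inverse learning (Eq. 6)
        A[i] += eps * lam[i] * np.outer(u_fb, z_i)
    return lam, u

lam, u = mpfim_step(np.zeros(d), np.zeros(d), np.ones(d),
                    np.ones(d), np.zeros(d))
```

Because every update is scaled by λ_t^i, a module whose forward model poorly predicts the current object barely learns and barely contributes to the command, which is the competitive specialization the text describes.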
\n\n2.2 Multiple responsibility predictors: Feedforward selection \n\nWhile the system described so far can learn multiple controllers and switch between them based on prediction errors, it cannot provide switching before a motor command has been generated and the consequences of this action evaluated. To allow the system to switch controllers based on contextual information, we introduce a new component, the responsibility predictor (RP). The input to this module, y_t, contains contextual sensory information (Figure 1), and each RP produces a prediction λ̂_t^i of its own module's responsibility: \n\nλ̂_t^i = η(γ_t^i, y_t) (7) \n\nwhere γ_t^i are the parameters of a function approximator η. These estimated responsibilities can then be compared to the actual responsibilities λ_t^i generated from the responsibility estimator. These error signals are used to update the weights of the RP by supervised learning. \n\nFinally, a mechanism is required to combine the responsibility estimates derived from the feedforward RP and from the forward models' prediction errors derived from feedback. We determine the final value of responsibility by using Bayes rule: multiplying the transformed feedback errors e^{-|x_t - x̂_t^i|^2 / 2σ^2} by the feedforward responsibility λ̂_t^i and then normalizing across the modules within the responsibility estimator: \n\nλ_t^i = λ̂_t^i e^{-|x_t - x̂_t^i|^2 / 2σ^2} / Σ_{j=1}^n λ̂_t^j e^{-|x_t - x̂_t^j|^2 / 2σ^2} \n\nThe estimates of the responsibilities produced by the RP can be considered as prior probabilities because they are computed before the movement execution based only on extrinsic signals and do not rely on knowing the consequences of the action. Once an action takes place, the forward models' errors can be calculated, and this can be thought of as the likelihood after the movement execution based on knowledge of the result of the movement. The final responsibility, which is the product of the prior and likelihood, normalized across the modules, represents the posterior probability. 
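The prior-times-likelihood combination can be made concrete with a short sketch (again a NumPy illustration with our own names; the σ value is arbitrary). A confident but wrong feedforward prior is overturned once the forward-model likelihoods arrive, which is the kind of feedback correction seen in the size-weight simulation of Section 3.3.

```python
import numpy as np

def posterior_responsibility(prior, x_true, x_pred, sigma=1.0):
    """Combine RP priors with forward-model likelihoods by Bayes rule.

    prior  : feedforward responsibility predictions, shape (n,), sums to 1
    x_true : actual next state, shape (d,)
    x_pred : forward-model predictions, shape (n, d)
    """
    # Likelihood of each module, from its prediction error
    lik = np.exp(-np.sum((x_pred - x_true) ** 2, axis=1) / (2.0 * sigma ** 2))
    post = prior * lik                 # prior x likelihood, per module
    return post / post.sum()          # normalize to a posterior

# A strong prior on module 0 loses to module 1, whose prediction
# matches the observed state far better.
prior = np.array([0.8, 0.1, 0.1])
post = posterior_responsibility(prior, np.array([1.0]),
                                np.array([[4.0], [1.1], [3.0]]))
```

Before movement onset the posterior equals the prior (no likelihood information yet); after onset the likelihood term dominates whenever the prior disagrees strongly with the observed dynamics.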
Adaptation of the RP ensures that the prior probability becomes closer to the posterior probability. \n\n3 Simulation of arm tracking while manipulating objects \n\n3.1 Learning and control of different objects \n\nFigure 2: Schematic illustration of the simulation experiment in which the arm makes reaching movements while grasping different objects with mass M, damping B and spring K. The object properties are shown in the table: \n\n    α    β    γ \nM   5.0  8.0  2.0 \nB   7.0  3.0  10.0 \nK   4.0  1.0  1.0 \n\nTo examine motor learning and control, we simulated a task in which the hand had to track a given trajectory (30 s, shown in Fig. 3(b)) while holding different objects (Figure 2). The manipulated object was periodically switched every 5 s between three different objects α, β and γ, in this order. The physical characteristics of these objects are shown in Figure 2. The task was exactly the same as that of Gomi and Kawato [4], and simulates recent grip force-load force coupling experiments by Flanagan and Wing [2]. \n\nIn the first simulation, three forward-inverse model pairs (modules) were used: the same number of modules as the number of objects. We assumed the existence of a perfect inverse dynamic model of the arm for the control of reaching movements. In each module, both forward (φ in (1)) and inverse (ψ in (4)) models were implemented as a linear neural network^2. The use of linear networks allowed M, B and K to be estimated from the forward and inverse model weights. Let M_j^f, B_j^f, K_j^f be the estimates from the jth forward model and M_j^i, B_j^i, K_j^i be the estimates from the jth inverse model. Figure 3(a) shows the evolution of the forward model estimates M_j^f, B_j^f, K_j^f for the three modules during learning. During learning the desired trajectory (Fig. 3(b)) was repeated 200 times. 
The three modules started from randomly selected initial conditions (open arrows) and converged to very good approximations of the three objects (filled arrows), as shown in Table 1. Each of the three modules converged to the α, β and γ objects, respectively. It is interesting to note that all the estimates of the forward models are superior to those of the inverse models. This is because the inverse model learning depends on how modules are switched by the forward models. \n\nFigure 3: (a) Learning acquisition of three pairs of forward and inverse models corresponding to three objects. (b) Responsibility signals from the three modules (top 3) and tracking performance (bottom) at the beginning (left) and at the end (right) of learning. \n\nTable 1: Learned object characteristics \n\nModule | Forward model M, B, K      | Inverse model M, B, K \n2      | 5.0071, 7.0040, 4.0000     | 5.0102, 6.9554, 4.0089 \n3      | 8.0029, 3.0010, 0.9999     | 7.8675, 3.0467, 0.9527 \n\nFigure 3(b) shows the performance of the model at the beginning (left) and end (right) of learning. The top 3 panels show the responsibility signals of the α, β and γ modules, in this order, and the bottom panel shows the hand's actual and desired trajectories. At the start of learning, the three modules were equally poor and thus generated almost equal responsibilities (1/3) and were involved in control almost equally. As a result, the overall control performance was poor, with large trajectory errors. However, at the end of learning, the three modules switched almost perfectly (only three noisy spikes were observed in the top 3 panels on the right), and no trajectory error was visible at this resolution in the bottom panel. \n\n^2 Any kind of architecture can be adopted instead of linear networks. 
If we compare these results with Figure 7 of Gomi and Kawato [4] for the same task, the superiority of MPFIM over the gating-expert architecture is apparent. Note that the number of free parameters (synaptic weights) is smaller in the current architecture than in the other. The difference in performance comes from two features of the basic architecture. First, in the gating architecture a single gating network tries to divide the space, while in MPFIM many forward models split the space. Second, in the gating architecture only a single control error is used to divide the space, but multiple prediction errors are simultaneously utilized in MPFIM. \n\n3.2 Generalization to a novel object \n\nA natural question regarding the MPFIM architecture is how many modules need to be used. In other words, what happens if the number of objects exceeds the number of modules, or an already trained MPFIM is presented with an unfamiliar object? To examine this, an MPFIM trained on 4 objects α, β, γ and δ was presented with a novel object η (its (M, B, K) is (2.02, 3.23, 4.47)). Because the object dynamics can be represented in a 3-dimensional parameter space, and the 4 modules already acquired define 4 vertices of a tetrahedron within the 3-D space, arbitrary object dynamics contained within the tetrahedron can be decomposed into a weighted average of the existing 4 forward modules (an internal division point of the 4 vertices). The theoretically calculated weights of η were (0.15, 0.20, 0.35, 0.30). Interestingly, each module's responsibility signal averaged over the trajectory was (0.14, 0.24, 0.37, 0.26). Although the responsibility was computed by soft-max in the space of acceleration predictions and had no direct relation to the space of (M, B, K), the two vectors had very similar values. This demonstrates the flexibility of the MPFIM architecture, which originates from its probabilistic soft-switching mechanism. 
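The decomposition argument above can be checked numerically. The sketch below solves for weights expressing a novel (M, B, K) as an affine combination (weights summing to 1) of 4 module vertices. The vertex values are illustrative only: the text does not list the trained modules' parameters for this simulation, so these are hypothetical.

```python
import numpy as np

# Hypothetical (M, B, K) vertices for the 4 learned modules.
V = np.array([[5.0, 7.0, 4.0],
              [8.0, 3.0, 1.0],
              [2.0, 10.0, 1.0],
              [1.0, 1.0, 8.0]])

def decompose(novel, vertices):
    """Weights w with sum(w) = 1 such that w @ vertices == novel,
    found by appending the sum-to-1 constraint and solving least squares."""
    n = len(vertices)
    A = np.vstack([vertices.T, np.ones(n)])   # 3 dynamics rows + constraint row
    b = np.append(novel, 1.0)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

w = decompose(np.array([2.02, 3.23, 4.47]), V)
# w @ V reconstructs the novel object's dynamics exactly when the
# vertices are affinely independent (a 4x4 nonsingular system).
```

With 4 affinely independent vertices in a 3-D parameter space the system is square and the decomposition is unique, which is why a fixed theoretical weight vector exists for the novel object in the text.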
This is in sharp contrast to the hard switching of Narendra [8], for which only one controller can be selected at a time. \n\n3.3 Feedforward selection and the size-weight illusion \n\nFigure 4: Responsibility predictions based on contextual information of 2-D object shapes (top 3 traces) and the corresponding acceleration error of control induced by the illusion (bottom trace). \n\nIn this section, we simulated prior selection of inverse models by responsibility predictors based on contextual information, and reproduced the size-weight illusion. Each object was associated with a 2-D shape represented as a 3x3 binary matrix, which was randomly placed at one of four possible locations on a 4x4 retinal matrix (see Gomi and Kawato for more details). The retinal matrix was used as the contextual input to the RP (a 3-layer sigmoidal feedforward network). During the course of learning, the combinations of manipulated objects and visual cues were fixed as A-α, B-β and C-γ. After 200 iterations of the trajectory, the combination A-γ was presented for the first time. Figure 4 plots the responsibility signals of the three modules (top 3 traces) and the corresponding acceleration error of the control induced by the illusion (bottom trace). The result replicates the size-weight illusion [1, 5], seen in the erroneous responsibility prediction of the α responsibility predictor based on the contextual signal A, and its correction by the responsibility signal calculated by the forward models. Until the onset of movement (time 0), A was always associated with the light α, and C was always associated with the heavy γ. 
Prior to movement, when A was associated with γ, the α module was switched on by the visual contextual information, but soon after the movement was initiated, the responsibility signal from the forward models' prediction dominated, and the γ module was properly selected. Furthermore, after a while, the responsibility predictors of the modules were re-learned to capture this new association between the object's visual shape and its dynamics. \n\nIn conclusion, the MPFIM model of human motor learning and control, like the human motor system, can learn multiple tasks, shows generalization to new tasks and an ability to switch between tasks appropriately. \n\nAcknowledgments \n\nWe thank Zoubin Ghahramani for helpful discussions on the Bayesian formulation of this model. Partially supported by Special Coordination Funds for Promoting Science and Technology at the Science and Technology Agency of the Japanese government, and by an HFSP grant. \n\nReferences \n\n[1] E. Brenner and J.B.J. Smeets. Size illusion influences how we lift but not how we grasp an object. Exp Brain Res, 111:473-476, 1996. \n\n[2] J.R. Flanagan and A. Wing. The role of internal models in motion planning and control: Evidence from grip force adjustments during movements of hand-held loads. J Neurosci, 17(4):1519-1528, 1997. \n\n[3] A.M. Fraser and A. Dimitriadis. Forecasting probability densities by using hidden Markov models with mixed states. In A.S. Weigend and N.A. Gershenfeld, editors, Time Series Prediction: Forecasting the Future and Understanding the Past, pages 265-282. Addison-Wesley, 1993. \n\n[4] H. Gomi and M. Kawato. Recognition of manipulated objects by motor learning with modular architecture networks. Neural Networks, 6:485-497, 1993. \n\n[5] A. Gordon, H. Forssberg, R. Johansson, and G. Westling. Visual size cues in the programming of manipulative forces during precision grip. Exp Brain Res, 83:477-482, 1991. \n\n[6] R. Jacobs, M. 
Jordan, S. Nowlan, and G. Hinton. Adaptive mixtures of local experts. Neural Computation, 3:79-87, 1991. \n\n[7] M. Kawato. Feedback-error-learning neural network for supervised learning. In R. Eckmiller, editor, Advanced Neural Computers, pages 365-372. North-Holland, 1990. \n\n[8] K. Narendra and J. Balakrishnan. Adaptive control using multiple models. IEEE Transactions on Automatic Control, 42(2):171-187, 1997. \n\n[9] K. Pawelzik, J. Kohlmorgen, and K. Muller. Annealed competition of experts f