{"title": "Computational Elements of the Adaptive Controller of the Human Arm", "book": "Advances in Neural Information Processing Systems", "page_first": 1077, "page_last": 1084, "abstract": null, "full_text": "Computational Elements of the Adaptive \n\nController of the Human Arm \n\nReza Shadmehr and Ferdinando A. Mussa-Ivaldi \n\nDept . of Brain and Cognitive Sciences \n\nM. I. T ., Cambridge , MA 02139 \n\nEmail: reza@ai.mit.edu , sandro@ai .mit.edu \n\nAbstract \n\nWe consider the problem of how the CNS learns to control dynam(cid:173)\nics of a mechanical system. By using a paradigm where a subject's \nhand interacts with a virtual mechanical environment, we show \nthat learning control is via composition of a model of the imposed \ndynamics. Some properties of the computational elements with \nwhich the CNS composes this model are inferred through the gen(cid:173)\neralization capabilities of the subject outside the training data. \n\n1 \n\nIntroduction \n\nAt about the age of three months, children become interested in tactile exploration \nof objects around them. They attempt to reach for an object , but often fail to \nproperly control their arm and end up missing their target. In the ensuing weeks, \nthey rapidly improve and soon they can not only reach accurately, they can also \npick up the object and place it. Intriguingly, during this period of learning they \ntend to perform rapid, flailing-like movements of their arm, as if trying to \"excite\" \nthe plant that they wish to control in order to build a model of its dynamics. \n\nFrom a control perspective , having a model of the arm's skeletal dynamics seems \nnecessary because of the relatively low gain of the fast acting feedback system \nin the spinal neuro-muscular controllers (Crago et al. 1976), and the long delays in \ntransmission of sensory information to the supra-spinal centers. 
Such a model could be used by the CNS to predict the muscular forces that must be produced to move the arm along a desired trajectory. Yet this model by itself is not sufficient for performing a contact task, because most objects with which our hand interacts change the arm's dynamics significantly. We are left with a situation in which we need to quickly acquire a model of an object's dynamics so that we can incorporate it into the control system for the arm. How we learn to construct a model of a dynamical system, and how our brains represent the composed model, are the subjects of this research. \n\n2 Learning Dynamics of a Mechanical System \n\nTo make the idea behind learning dynamics evident, consider the example of controlling a robotic arm. The arm may be seen as an inertially dominated mechanical admittance, accepting force as input and producing a change in state as its output: \n\n\\ddot{q} = H(q)^{-1} (F - C(q, \\dot{q}))   (1) \n\nwhere q is the configuration of the robot, H is the inertia tensor, F is the input force from some controllable source (e.g., motors), and C is the vector of Coriolis/centripetal forces. In learning to control the arm, i.e., having it follow a certain state trajectory or reach a final state, we form a model which takes as input the desired change in the state of the arm and produces as output the force that should be generated by the actuators. Therefore, what needs to be learned is a map from state and desired change in state to force: \n\n\\hat{D}(q, \\dot{q}, \\ddot{q}_d) = \\hat{H}(q) \\ddot{q}_d + \\hat{C}(q, \\dot{q})   (2) \n\nCombining this model with a simple PD feedback system, \n\nF = \\hat{D} + \\hat{H} K (q_d - q) + \\hat{H} B (\\dot{q}_d - \\dot{q}) \n\nthe dynamics of the system in Eq. (1) can be written in terms of a new variable s = (\\dot{q} - \\dot{q}_d) + (q - q_d), i.e., the error in the trajectory. 
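A minimal one-degree-of-freedom sketch of this controller illustrates the role of the model term (the mass, viscosity, gains, and trajectory below are illustrative assumptions, not values from the paper):

```python
# Toy 1-DOF version of Eq. (1) with the model-plus-feedback controller:
# plant m*qdd = F - c*qd; controller F = D_hat + K*(q_des - q) + B*(qd_des - qd).
# Mass, viscosity, gains, and the desired trajectory are illustrative choices.
def simulate(m_hat, c_hat, m=2.0, c=0.5, K=400.0, B=40.0, dt=0.001, T=1.0):
    q, qd, err = 0.0, 0.0, 0.0
    for i in range(int(T / dt)):
        s = (i * dt) / T
        # minimum-jerk reach from 0 to 1 over T seconds
        q_des = 10 * s**3 - 15 * s**4 + 6 * s**5
        qd_des = (30 * s**2 - 60 * s**3 + 30 * s**4) / T
        qdd_des = (60 * s - 180 * s**2 + 120 * s**3) / T**2
        D_hat = m_hat * qdd_des + c_hat * qd_des   # learned model, as in Eq. (2)
        F = D_hat + K * (q_des - q) + B * (qd_des - qd)
        qdd = (F - c * qd) / m                     # plant, as in Eq. (1)
        qd += qdd * dt
        q += qd * dt
        err = max(err, abs(q - q_des))
    return err

err_model = simulate(m_hat=2.0, c_hat=0.5)   # accurate internal model
err_pd = simulate(m_hat=0.0, c_hat=0.0)      # feedback alone
```

With an accurate model the feedforward term cancels the plant dynamics and the tracking error stays near zero; with feedback alone, the error scales with the force the model would have supplied.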
It is easy to see that if \\hat{H} \\approx H and \\hat{C} \\approx C, and if K and B are positive definite, then s will be a decreasing function of time, i.e., the system will be globally stable. \n\nLearning dynamics means forming the map in Eq. (2). The computational elements which we might use to do this range from simple memory cells that each have an address in the state space (e.g., Albus 1975, Raibert & Wimberly 1984, Miller et al. 1987), to locally linear functions restricted to regions where we have data (Moore & Atkeson 1994), to sigmoids (Gomi & Kawato 1990) and radial basis functions which can broadly encode the state space (Botros & Atkeson 1991). Clearly, the choice of computational elements will affect how the learned map generalizes to regions of the state space outside the training data. Furthermore, since the task is to learn the dynamics of a mechanical system (as opposed to, for example, the dynamics of a financial market), certain properties of mechanical systems can guide our choice of computational elements. For example, the map from states to forces for any mechanical system can be linearly parameterized in terms of its mass properties (Slotine and Li 1991). In an inertially dominated system (like a multi-joint arm) these masses may be unknown, but the fact that the dynamics are linear in the unknowns makes the task of learning control much simpler and orders of magnitude faster than using, for example, an unstructured memory-based approach. \n\nFigure 1: Dynamics of a real 2 DOF robot was learned so as to produce a desired trajectory. A: Schematic of the robot. The desired trajectory is the quarter circle. 
Performance of a PD controller is shown by the gray line, as well as in B, where joint trajectories are drawn: the upper trace is the shoulder joint and the lower trace is the elbow joint. The desired joint trajectory is the solid line, the actual trajectory the gray line. C: Performance when the PD controller is coupled with an adaptive model. D: Error in trajectory. Solid line is PD, gray line is PD+adaptation. \n\nTo illustrate this point, consider the task of learning to control a real robot arm. Starting with the assumption that the plant has 2 degrees of freedom with rotational joints, the inertial dynamics of Eq. (2) can be written as the product of a known matrix-valued function Y of state-dependent geometric transformations and an unknown (but constant) vector a representing the masses, centers of mass, and link lengths: \n\nD(q, \\dot{q}, \\ddot{q}_d) = Y(q, \\dot{q}, \\ddot{q}_d) a \n\nThe matrix Y serves to refer the unknown masses to their centers of rotation; it is a geometric transformation which can be derived from our assumption regarding the structure of the robot. It is these geometric transformations that can guide us in choosing the computational elements for encoding the sensory data (q and \\dot{q}). \n\nWe used this approach to learn to control a real robot. The adaptation law was derived from a Lyapunov criterion, as shown by Slotine and Li (1991): \n\n\\dot{\\hat{a}} = -Y^T(q, \\dot{q}, \\ddot{q}_d) \\left( (\\dot{q} - \\dot{q}_d(t)) + (q - q_d(t)) \\right) \n\nThe system converged to a very low trajectory-tracking error within only three periods of the movement (Fig. 1). This performance was achieved despite the fact that our model of the dynamics ignores frictional forces, noise and delay in the sensors, and the dynamics of the actuators. 
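The structure of this adaptation law can be sketched for a one-degree-of-freedom plant whose only unknown parameter is its mass, so that Y reduces to a scalar regressor driven by the combined tracking error (the true mass, gains, and trajectory below are illustrative assumptions):

```python
import math

# 1-DOF sketch of the linear-in-parameters adaptation law: the plant is
# m * qdd = F with m unknown, so D = Y * a with scalar regressor Y and a = m.
# s is the combined velocity + position tracking error used in the text.
def run_adaptation(m=3.0, gamma=5.0, Kd=20.0, dt=0.001, periods=3):
    q, qd, a_hat = 0.0, 0.0, 0.0
    peaks = []
    for _ in range(periods):
        peak = 0.0
        for i in range(int(2 * math.pi / dt)):
            t = i * dt
            q_des, qd_des, qdd_des = math.sin(t), math.cos(t), -math.sin(t)
            e, ed = q - q_des, qd - qd_des
            s = ed + e                       # combined tracking error
            Y = qdd_des - ed                 # reference-acceleration regressor
            F = Y * a_hat - Kd * s           # control from the current estimate
            a_hat -= gamma * Y * s * dt      # adaptation law: a_hat' = -gamma * Y^T s
            qd += (F / m) * dt               # plant integration (Euler)
            q += qd * dt
            peak = max(peak, abs(e))
        peaks.append(peak)
    return peaks, a_hat

peaks, a_hat = run_adaptation()
```

Over a few periods of the movement the peak tracking error shrinks while the mass estimate moves toward its true value, mirroring the rapid convergence seen with the robot.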
In contrast, using a sigmoid function as the basic computational element of the map and training via back-propagation reached comparable levels of performance only after more than 4000 repetitions of the training data (Shadmehr 1990). The difference in performance between these two approaches was strictly due to the choice of the computational elements with which the map of Eq. (2) was formed. \n\nNow consider the task of a child learning the dynamics of his arm, or that of an adult picking up a hammer and pounding a nail. We can scarcely afford thousands of practice trials before we have built an adequate model of the dynamics. Our proposal is that because the dynamics of mechanical systems are distinctly structured, perhaps our brains also use computational elements that are particularly suited for learning the dynamics of a motor task (as we did in learning to control the robot in Fig. 1). How to determine the structure of these elements is the subject of the following sections. \n\n3 A Virtual Mechanical Environment \n\nTo understand how humans represent the learned dynamics of a motor task, we designed a paradigm in which subjects reached to a target while their hand interacted with a virtual mechanical environment. This environment was a force field produced by a manipulandum whose end-effector was grasped by the subject. The field of forces depended only on the velocity of the hand, e.g., F = B \\dot{x}, as shown in Fig. 2A, and significantly changed the dynamics of the limb: when the robot's motors were turned off (null field condition), movements were smooth, straight-line trajectories to the target (Fig. 2B). When coupled with the field, however, the hand's trajectory was significantly skewed from the straight-line path (Fig. 2C). \n\nIt has been suggested that in making a reaching movement, the brain formulates a kinematic plan describing a straight hand path along a smooth trajectory to the target (Morasso 1981). 
Initially we asked whether this plan was independent of the dynamics of the moving limb. If so, as the subject practiced in the environment, the hand path should converge to the straight-line, smooth trajectory observed in the null field. Indeed, with practice, trajectories in the force field did converge to those in the null field. This was quantified by a measure of correlation which, for all eight subjects, increased monotonically with practice time. \n\nIf the CNS adapted to the force field by composing a model of its dynamics, then removal of the field at the onset of movement (unbeknownst to the subject) should lead to discrepancies between the actual field and the one predicted by the subject's model, resulting in distorted trajectories which we call after-effects. The expected dynamics of these after-effects can be predicted by a simple model of the upper arm (Shadmehr and Mussa-Ivaldi 1994). Since the after-effects are a by-product of the learning process, we expected that as subjects adapted to the field, their performance in the null field would gradually degrade. We observed this gradual growth of the after-effects, leading to grossly distorted trajectories in the null field after subjects had adapted to the force field (Fig. 2D). This evidence suggested that the CNS composed a model of the field and used this model to compensate for the forces which it predicted the hand would encounter during a movement. \n\nThe information contained in the learned model is a map whose input is the state and the desired change in state of the limb, and whose output is force (Eq. 2). How is this map implemented by the CNS? Let us assume that the approximation is via 
a distributed set of computational elements (Poggio 1990). What are the properties of these elements? An important property may be the spatial bandwidth, i.e., the size of the receptive field in the input space (the portion of the input space where the element generates a significant output). This property greatly influences how the CNS might interpolate between states which it has visited during training, and whether it can generalize to regions beyond the boundary of the training data. \n\nFigure 2: A: The virtual mechanical environment as a force field. B: Trajectories of reaching movements (center-out) to 8 targets in a null field. C: Average and standard deviation of reaches to the same targets when the field was on, before adaptation. D: After-effects of adaptation, i.e., movements in a null field while expecting the field. \n\nFor example, in eye movements it has been suggested that a model of the dynamics of the eye is stored in the cerebellum (Shidara et al. 1993). Cells which encode this model (Purkinje cells) vary their firing rate as a linear function of the state of the eye, and the sum of their outputs (firing rates) correlates well with the force that the muscles need to produce to move the eye. The model of the eye's dynamics is therefore encoded via cells with very large receptive fields. On the other hand, cells which take part in learning a visual hyperacuity task may have very small receptive fields (Poggio et al. 1992), resulting in a situation where training in a localized region does not lead to generalization. \n\nIn learning control of our limbs, one possibility for the computational elements is the neural control circuits in the spinal cord (Mussa-Ivaldi 1992). 
Upon activation of one such circuit, muscles produce a time-varying force field, i.e., forces which depend on the state of the limb (position and velocity) and on time (Mussa-Ivaldi et al. 1990). Let us call the force field produced by one such motor element f_i(q, \\dot{q}, t). It turns out that as one changes the amount of activation to a motor element, the output forces essentially scale. When two such motor elements are activated, the resulting force field is a linear combination of the two individual fields (Bizzi et al. 1991): \n\nf = \\sum_{i=1}^{N} c_i f_i(q, \\dot{q}, t) \n\nFigure 3: A: Schematic of the subject's arm, with the trained region of the workspace where the force field was presented and the test region where the transferred effects were measured. B: After-effects at the test region. C: A joint-based translation of the force field shown in Fig. 2A to the novel workspace. This is the field that the subject expected at the test region. \n\nNow consider the task of learning to move in the field shown in Fig. 2A. The model that the CNS builds is a map from the state of the limb to the forces imposed by the environment. Following the above scenario, the task is to find coefficients c_i for each element such that the output field is a good approximation of the environmental field. Unlike the computational elements of a visual task, however, we may postulate that the motor elements are characterized by broad receptive fields. This is because muscular force changes gradually as a function of the state of the limb, and therefore each element's output force is nonzero over a wide region of the state space. 
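The consequence of this postulate can be sketched numerically: fit coefficients c_i by least squares so that a sum of basis elements reproduces a viscous field f(v) = -b v on a limited training region, then measure the prediction error on a novel region. The basis shapes, widths, and ridge term below are illustrative assumptions, not a model of spinal circuits.

```python
import math

def fit_and_extrapolate(width, b=0.8, ridge=1e-3):
    # basis elements with receptive-field size set by `width`
    centers = [-1.0, 0.0, 1.0]
    def phi(v, c):
        return math.exp(-(v - c) ** 2 / (2.0 * width ** 2))
    train = [-0.5 + 0.05 * k for k in range(21)]   # limited training region
    test = [0.6 + 0.02 * k for k in range(21)]     # novel region

    # build ridge-regularized normal equations (A + ridge*I) coef = r
    n = len(centers)
    A = [[ridge if i == j else 0.0 for j in range(n)] for i in range(n)]
    r = [0.0] * n
    for v in train:
        feats = [phi(v, c) for c in centers]
        for i in range(n):
            r[i] += feats[i] * (-b * v)
            for j in range(n):
                A[i][j] += feats[i] * feats[j]

    # Gaussian elimination with back-substitution (A is positive definite)
    for i in range(n):
        for j in range(i + 1, n):
            f = A[j][i] / A[i][i]
            for k in range(n):
                A[j][k] -= f * A[i][k]
            r[j] -= f * r[i]
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):
        coef[i] = (r[i] - sum(A[i][j] * coef[j] for j in range(i + 1, n))) / A[i][i]

    # rms error of the fitted field on the novel region
    se = sum((sum(coef[i] * phi(v, centers[i]) for i in range(n)) + b * v) ** 2
             for v in test)
    return math.sqrt(se / len(test))

rms_broad = fit_and_extrapolate(width=1.0)    # broad receptive fields
rms_narrow = fit_and_extrapolate(width=0.15)  # narrow receptive fields
```

Broad elements, being non-negligible over the whole range, constrain the field well beyond the training region; narrow elements leave the novel region essentially unpredicted.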
It follows that if learning dynamics is accomplished through formation of a map whose computational elements are these motor functions, then because of the elements' large spatial bandwidth, the composed model should generalize well beyond the region of the training data. \n\nTo test this, we limited the region of the input space for which training data was provided and quantified the subject's ability to generalize to a region outside the training set. Specifically, we limited the workspace where practice movements in the force field took place and asked whether local exposure to the field led to after-effects in other regions (Fig. 3A). We found that local training resulted in after-effects in parts of the workspace where no exposure to the field had taken place (Fig. 3B). This indicated that the model composed by the CNS predicted specific forces well outside the region in which it had been trained. The existence of this generalization showed that the computational elements with which the internal model was implemented had broad receptive fields. \n\nThe transferred after-effects (Fig. 3B) show that at the novel region of the workspace, the subject's model of the environment predicted very different forces than the one on which the subject had been trained (compare with Fig. 2D). This rejected the hypothesis that the composed model was a simple mapping (i.e., translation-invariant) in a hand-based coordinate system, i.e., from states of the arm to forces on the hand. The alternative hypothesis was that the composed model related observed states of the arm to forces that needed to be produced by the muscles, and was translation-invariant in a coordinate system based on the joints and muscles. 
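The geometric content of the joint-based hypothesis can be checked with a toy calculation: a viscous field stored in joint coordinates as tau = J^T(q) B J(q) qdot predicts, at a new posture, a hand-space field quite different from the trained one. The link lengths, postures, and matrix B below are illustrative values, not the experimental parameters.

```python
import math

# Two-link geometry sketch of the joint-based transfer hypothesis.
def jacobian(q1, q2, l1=0.33, l2=0.34):
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

def inv2(X):
    d = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / d, -X[0][1] / d], [-X[1][0] / d, X[0][0] / d]]

B = [[-10.0, -11.0], [-11.0, 11.0]]      # hand-space viscous field, F = B v

J_train = jacobian(math.radians(45), math.radians(90))   # trained posture
J_test = jacobian(math.radians(100), math.radians(60))   # novel posture

# joint-based model: tau = J^T B J qdot, translation-invariant in joint space
B_joint = matmul(transpose(J_train), matmul(B, J_train))

# hand-space field this joint-based model predicts at the novel posture
Jt_inv = inv2(J_test)
B_pred = matmul(transpose(Jt_inv), matmul(B_joint, Jt_inv))

diff = sum((B_pred[i][j] - B[i][j]) ** 2
           for i in range(2) for j in range(2)) ** 0.5
```

The large difference between B_pred and the trained field B mirrors the observation that the after-effects at the novel workspace did not match the hand-space field on which the subject had practiced.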
This would be the case, for example, if the computational elements encoded the state of the arm linearly (analogous to Purkinje cells in the case of eye movements) in joint space. \n\nTo test this idea, we translated the field in which the subject had practiced to the novel region in a coordinate system defined by the joint space of the subject's arm, resulting in the field shown in Fig. 3C. We recorded the performance of the subjects in this new field at the novel region of the workspace (after they had been trained on the field of Fig. 2A) and found that performance was near optimal at the first exposure. This indicated that the geometric structure of the composed model supported transfer of information in an intrinsic, e.g., joint-based, coordinate system. This result is consistent with the hypothesis that the computational elements involved in this learning task broadly encode the state space and represent their input in a joint-based coordinate system, not a hand-based one. \n\n4 Conclusions \n\nIn learning control of an inertially dominated mechanical system, knowledge of the system's geometric constraints can direct us to choose computational elements such that learning is significantly facilitated. This was illustrated by the example of a real robot arm: starting with no knowledge of its dynamics, a reasonable model was learned within three periods of a movement (as opposed to thousands of movements when the computational elements were chosen without regard to the geometric properties). We argued that in learning to control the human arm, the CNS might also make assumptions regarding the geometric properties of its links and use specialized computational elements which facilitate learning of dynamics. \n\nOne possibility for these elements are the discrete neuronal circuits found in the spinal cord. 
The function of these circuits can be mathematically formulated such that a map representing the inverse dynamics of the arm is formed via a combination of the elements. Because these computational elements encode their input space broadly, i.e., each has significant output over a wide region of the input space, we expected that if subjects learned a dynamical process from localized training data, the formed model should generalize to novel regions of the state space. Indeed, we found that subjects transferred the training information to novel regions of the state space, and that this transfer took place in a coordinate system similar to that of the joints and muscles. We therefore suggest that the CNS learns control of the arm through formation of a model whose computational elements broadly encode the state space, and that these elements may be neuronal circuits of the spinal cord. \n\nAcknowledgments: Financial support was provided in part by the NIH (AR26710) and the ONR (N00014/90/J/1946). R.S. was supported by the McDonnell-Pew Center for Cognitive Neurosciences and the Center for Biological and Computational Learning. \n\nReferences \n\nAlbus JS (1975) A new approach to manipulator control: The cerebellar model articulation controller (CMAC). Trans ASME J Dyn Syst Meas Contr 97:220-227. \n\nBizzi E, Mussa-Ivaldi FA, Giszter SF (1991) Computations underlying the execution of movement: a novel biological perspective. Science 253:287-291. \n\nBotros SM, Atkeson CG (1991) Generalization properties of radial basis functions. In: Lippmann et al., Adv. in Neural Information Processing Systems 3:707-713. \n\nCrago PE, Houk JC, Hasan Z (1976) Regulatory actions of human stretch reflex. J Neurophysiol 39:5-19. \n\nGomi H, Kawato M (1990) Learning control for a closed loop system using feedback error learning. Proc IEEE Conf Decision Contr. 
\n\nMiller WT, Glanz FH, Kraft LG (1987) Application of a general learning algorithm to the control of robotic manipulators. Int J Robotics Res 6(2):84-98. \n\nMoore AW, Atkeson CG (1994) An investigation of memory-based function approximators for learning control. Machine Learning, submitted. \n\nMussa-Ivaldi FA, Giszter SF (1992) Vector field approximation: a computational paradigm for motor control and learning. Biol Cybern 67:491-500. \n\nMussa-Ivaldi FA, Giszter SF, Bizzi E (1990) Motor-space coding in the central nervous system. Cold Spring Harbor Symp Quant Biol 55:827-835. \n\nPoggio T (1990) A theory of how the brain might work. Cold Spring Harbor Symp Quant Biol 55:899-910. \n\nPoggio T, Fahle M, Edelman S (1992) Fast perceptual learning in visual hyperacuity. Science 256:1018-1021. \n\nRaibert MH, Wimberly FC (1984) Tabular control of balance in a dynamic legged system. IEEE Trans Systems, Man, Cybernetics SMC-14(2):334-339. \n\nShadmehr R (1990) Learning virtual equilibrium trajectories for control of a robot arm. Neural Computation 2:436-446. \n\nShadmehr R, Mussa-Ivaldi FA (1994) Adaptive representation of dynamics during learning of a motor task. J Neuroscience, in press. \n\nShidara M, Kawano K, Gomi H, Kawato M (1993) Inverse-dynamics model eye movement control by Purkinje cells in the cerebellum. Nature 365:50-52. \n\nSlotine JJE, Li W (1991) Applied Nonlinear Control. Prentice Hall, Englewood Cliffs, New Jersey.", "award": [], "sourceid": 787, "authors": [{"given_name": "Reza", "family_name": "Shadmehr", "institution": null}, {"given_name": "Ferdinando", "family_name": "Mussa-Ivaldi", "institution": null}]}