{"title": "Coordinate Transformation Learning of Hand Position Feedback Controller by Using Change of Position Error Norm", "book": "Advances in Neural Information Processing Systems", "page_first": 1038, "page_last": 1044, "abstract": null, "full_text": "Coordinate Transformation Learning of \nHand Position Feedback Controller by \nU sing Change of Position Error Norm \n\nEimei Oyama* \n\nMechanical Eng. Lab. \n\nNamiki 1-2, Tsukuba Science City \n\nIbaraki 305-8564 Japan \n\nSusumu Tachi \n\nThe University of Tokyo \nHongo 7-3-1, Bunkyo-ku \nTokyo 113-0033 Japan \n\nAbstract \n\nIn order to grasp an object, we need to solve the inverse kine(cid:173)\nmatics problem, i.e., the coordinate transformation from the visual \ncoordinates to the joint angle vector coordinates of the arm. Al(cid:173)\nthough several models of coordinate transformation learning have \nbeen proposed, they suffer from a number of drawbacks. In human \nmotion control, the learning of the hand position error feedback \ncontroller in the inverse kinematics solver is important. This paper \nproposes a novel model of the coordinate transformation learning \nof the human visual feedback controller that uses the change of \nthe joint angle vector and the corresponding change of the square \nof the hand position error norm. The feasibility of the proposed \nmodel is illustrated using numerical simulations. \n\n1 \n\nINTRODUCTION \n\nThe task of calculating every joint angle that would result in a specific hand position \nis called the inverse kinematics problem. An important topic in neuroscience is the \nstudy of the learning mechanisms involved in the human inverse kinematics solver. \nWe questioned five pediatricians about the motor function of infants suffering from \nserious upper limb disabilities. The doctors stated that the infants still were able \nto touch and stroke an object without hindrance. 
In one case, an infant without a thumb had a major kinematically influential surgical operation, transplanting an index finger as a thumb. After the operation, the child was able to learn how to use the index finger like a thumb [1]. In order to explain this human motor learning capability, we believe that the coordinate transformation learning of the feedback controller is a necessary component. \n\n* Phone: +81-298-58-7298, Fax: +81-298-58-7201, e-mail: eimei@mel.go.jp \n\nCoordinate Transformation Learning of Feedback Controller \n\n1039 \n\nAlthough a number of learning models of the inverse kinematics solver have been proposed, a definitive learning model has not yet been obtained, both from the point of view of the structural complexity of the learning model and from that of the biological plausibility of the employed hypotheses. The Direct Inverse Modeling employed by many researchers [2] requires complex switching of the input signal of the inverse model. When hand position control is performed, the input of the inverse model is the desired hand position, velocity, or acceleration. When inverse model learning is performed, the input is the observed hand position, velocity, or acceleration. Although the desired signal and the observed signal could coincide, the characteristics of the two signals are very different. Currently, no research has successfully modeled this switching system. Furthermore, that learning model is not \"goal-directed\"; i.e., there is no direct way to find an action that corresponds to a particular desired result. The Forward and Inverse Modeling proposed by Jordan [3] requires the back-propagation signal, a technique that does not have a biological basis. That model also requires complex switching of the desired output signal for the forward model. When forward model learning is performed, the desired output is the observed hand position. 
When the inverse kinematics solver learning is performed, the desired output is the desired hand position. The Feedback Error Learning proposed by Kawato [4] requires a pre-existing accurate feedback controller. \n\nIt is necessary to obtain a learning model that possesses a number of characteristics: (1) it can explain the human learning function; (2) it has a simple structure; and (3) it is biologically plausible. This paper presents a learning model of the coordinate transformation function of the hand position feedback controller. This model uses the joint angle vector change and the corresponding change of the square of the hand position error norm. \n\n2 BACKGROUND \n\n2.1 Discrete Time First Order Model of Hand Position Controller \n\nLet θ ∈ R^m be the joint angle vector and x ∈ R^n be the hand position/orientation vector given by the vision system. The relationship between x and θ is expressed as x = f(θ), where f is a C^1 class function. The Jacobian of the hand position vector is expressed as J(θ) = ∂f(θ)/∂θ. Let x_d be the desired hand position and e = x_d - x = x_d - f(θ) be the hand position error vector. In this paper, an inverse kinematics problem is assumed to be a least squares minimization problem that calculates θ in order to minimize the square of the hand position error norm S(x_d, θ) = |e|^2/2 = |x_d - f(θ)|^2/2. \n\nFirst, the feed-forward controller in the human inverse kinematics solver is disregarded and the following first order control system, consisting of a learning feedback controller, is considered: \n\nθ(k + 1) = θ(k) + Δθ(k) (1) \nΔθ(k) = Φ_fb(θ(k), e(k)) + d(k) (2) \ne(k) = x_d - f(θ(k)) (3) \n\nwhere d(k) is assumed to be a disturbance noise from all components except the hand position control system. Figure 1 shows the configuration of the control system. In this figure, z^{-1} is the operator that indicates a delay in the discrete time signal by a sampling interval of Δt. Although the human hand position control system includes higher order complex dynamics terms which are ignored in Equation (2), McRuer's experimental model of human compensation control suggests that the term that converts the hand position error to the hand velocity is a major term in the human control system [5]. We consider Equation (2) to be a good approximate model for the analysis of human coordinate transformation learning. \n\nFigure 1: Configuration of 1-st Order Model of Hand Position Controller \n\nThe learner Φ_fb(θ, e) ∈ R^m, which provides the hand position error feedback, is modeled using an artificial neural network. In this paper, learning of the hand position error feedback controller by observing the output x(k) is considered, without any prior knowledge of the function f(θ). \n\n2.2 Learning Model of the Neural Network \n\nLet Φ_fb^d(θ, e) be the desired output of the learner Φ_fb(θ, e); Φ_fb^d(θ, e) functions as a teacher for Φ_fb(θ, e). Let Φ_fb^+(θ, e) be the output of Φ_fb(θ, e) after it is updated by the learning. Let E[t(θ, e)|θ, e] be the expected value of a scalar, vector, or matrix function t(θ, e) when the input vector (θ, e) is given. We assume that Φ_fb(θ, e) is an ideal learner which is capable of realizing the mean of the desired output signal completely. 
Φ_fb^+(θ, e) can be expressed as follows: \n\nΦ_fb^+(θ, e) ≈ E[Φ_fb^d(θ, e)|θ, e] = Φ_fb(θ, e) + E[ΔΦ_fb(θ, e)|θ, e] (4) \nΔΦ_fb(θ, e) = Φ_fb^d(θ, e) - Φ_fb(θ, e) (5) \n\nWhen the expected value of ΔΦ_fb(θ, e) is expressed as \n\nE[ΔΦ_fb(θ, e)|θ, e] = G_fb e - R_fb Φ_fb(θ, e), (6) \n\nwhere R_fb ∈ R^{m×m} is a positive definite matrix, and the inequality \n\n||∂Φ_fb^+(θ, e)/∂Φ_fb(θ, e)|| = ||∂(G_fb e - (R_fb - I) Φ_fb(θ, e))/∂Φ_fb(θ, e)|| < 1 (7) \n\nis satisfied, the final learning result can be expressed as \n\nΦ_fb(θ, e) ≈ R_fb^{-1} G_fb e (8) \n\nby the iteration of the update of Φ_fb(θ, e) expressed in Equation (4). \n\n3 USE OF CHANGE OF POSITION ERROR NORM \n\n3.1 A Novel Learning Model of Feedback Controller \n\nThe change of the square of the hand position error norm ΔS = S(x_d, θ + Δθ) - S(x_d, θ) reflects whether or not the change of the joint angle vector Δθ is in the proper direction. The proposed novel learning model can be expressed as follows: \n\nΦ_fb^d(θ, e) = -α ΔS Δθ (9) \n\nwhere α is a small positive real number. We now consider a large number of trials of Equation (2) with a large variety of initial states θ(0), with learning conducted at the point of the input space of the feedback controller (θ, e) = (θ(k-1), e(k-1)) at time k. ΔS and Δθ can be calculated as follows: \n\nΔS = S(k) - S(k-1) = (|e(k)|^2 - |e(k-1)|^2)/2 (10) \nΔθ = Δθ(k-1) (11) \n\nFigure 2: Configuration of Learning Model of Feedback Controller \n\nFigure 2 shows the conceptual diagram of the proposed learning model. 
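As a minimal sketch of the update in Equations (9)-(11), one control trial and the resulting teacher signal can be written in a few lines. The plant f, the noise level, and the linear feedback law used in the check below are illustrative assumptions, not quantities taken from the paper.

```python
import numpy as np

def teacher_signal(dS, dtheta, alpha=0.5):
    """Eq. (9): desired feedback-controller output Phi_fb^d = -alpha * dS * dtheta."""
    return -alpha * dS * dtheta

def control_trial(f, theta, x_d, phi_fb, noise_std=0.01, rng=None):
    """One step of the first-order control system, Eqs. (1)-(3), returning the
    joint angle change and error-norm change needed by Eqs. (10)-(11)."""
    rng = np.random.default_rng() if rng is None else rng
    e = x_d - f(theta)                               # Eq. (3): hand position error
    d = noise_std * rng.standard_normal(theta.size)  # disturbance noise d(k)
    dtheta = phi_fb(theta, e) + d                    # Eq. (2)
    theta_new = theta + dtheta                       # Eq. (1)
    e_new = x_d - f(theta_new)
    dS = 0.5 * (e_new @ e_new - e @ e)               # Eq. (10)
    return theta_new, dtheta, dS
```

For a feedback law that already points along the steepest-descent direction J^T(θ)e, ΔS is negative, so the teacher signal of Equation (9) reinforces the executed Δθ; a step that increases the error is penalized with the opposite sign.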
\nLet p(q|θ, e) be the probability density function of a vector q at the point (θ, e) in the input space of Φ_fb(θ, e). In order to simplify the analysis of the proposed learning model, d(k) is assumed to satisfy the following equation: \n\np(d|θ, e) = p(-d|θ, e) (12) \n\nWhen Δθ is small enough, the result of the learning using Equation (9) can be expressed as: \n\nΦ_fb(θ, e) ≈ α ((α/2) R_θ J^T(θ) J(θ) + I)^{-1} R_θ J^T(θ) e (13) \nR_θ = E[Δθ Δθ^T | θ, e] (14) \n\nwhere J^T(θ)e is a vector in the steepest descent direction of S(x_d, θ). When d(k) is a non-zero vector, R_θ is a positive definite symmetric matrix and ((α/2) R_θ J^T J + I)^{-1} is a positive definite matrix. When α is appropriate, Φ_fb(θ, e) as expressed in Equation (13) can provide appropriate output error feedback control. The derivation of the above result is given in Section 3.2. A partially modified steepest descent direction can thus be obtained without using the forward model or the back-propagation signal required by Jordan's forward modeling [3]. \n\nLet R_d be the covariance matrix of the disturbance noise d(k). When α is infinitesimal, R_θ ≈ R_d holds and the approximate solution Φ_fb(θ, e) ≈ α R_d J^T(θ) e is obtained. \n\n3.2 Derivation of Learning Result \n\nThe change of the square of the hand position error norm ΔS(x_d, θ) caused by Δθ can be determined as: \n\nΔS(x_d, θ) = (∂S(x_d, θ)/∂θ) Δθ + (1/2) Δθ^T H(x_d, θ) Δθ + O(Δθ^3) \n= -e^T (J(θ) + (1/2) (∂J(θ)/∂θ) ⊗ Δθ) Δθ + (1/2) Δθ^T J^T(θ) J(θ) Δθ + O(Δθ^3) (15) \n\nwhere ⊗ is the two-operand operator that indicates the Kronecker product, H(x_d, θ) ∈ R^{m×m} is the Hessian of S(x_d, θ), and O(Δθ^3) is the sum of the third and higher order terms of Δθ in each equation. When Δθ is small enough, the following approximate equations are obtained: \n\nΔx ≈ J(θ) Δθ ≈ J(θ + (1/2)Δθ) Δθ ≈ (J(θ) + (1/2) (∂J(θ)/∂θ) ⊗ Δθ) Δθ (16) \n\nTherefore, ΔS can be approximated as follows: \n\nΔS ≈ -e^T J(θ) Δθ + (1/2) |Δx|^2 (17) \n\nSince e^T J(θ) Δθ Δθ = Δθ Δθ^T J^T(θ) e and |Δx|^2 Δθ ≈ Δθ Δθ^T J^T(θ) J(θ) Δθ hold, ΔS Δθ can be approximated as: \n\nΔS Δθ ≈ -Δθ Δθ^T J^T(θ) e + (1/2) Δθ Δθ^T J^T(θ) J(θ) Δθ (18) \n\nDefining Δθ_nfb as Δθ_nfb = Δθ - Φ_fb(θ, e), the expected value of the product of Δθ and ΔS at the point (θ, e) in the input space of Φ_fb(θ, e) can be approximated as follows: \n\nE[ΔS Δθ | θ, e] ≈ -R_θ J^T e + (1/2) R_θ J^T J Φ_fb(θ, e) + (1/2) E[Δθ Δθ^T J^T J Δθ_nfb | θ, e] (19) \n\nWhen the arm is controlled according to Equation (2), Δθ_nfb is the disturbance noise d(k). Since d(k) satisfies Equation (12), the following equation is established: \n\nE[Δθ Δθ^T J^T J Δθ_nfb | θ, e] = 0 (20) \n\nTherefore, the expected value of ΔΦ_fb(θ, e) can be expressed as: \n\nE[ΔΦ_fb(θ, e) | θ, e] ≈ α R_θ J^T e - ((α/2) R_θ J^T J + I) Φ_fb(θ, e) (21) \n\nWhen α is small enough, the condition described in Equation (7) is established. The learning result expressed in Equation (13) is then obtained as described in Section 2.2. \n\nIt should be noted that the learning algorithm expressed in Equation (9) is applicable not only to S(x_d, θ), but also to general penalty functions of the hand position error norm |e|. The proposed learning model synthesizes a direction that decreases S(x_d, θ) by summing Δθ after weighting it according to the increase or decrease of S(x_d, θ). \n\nThe feedback controller defined in Equation (13) requires a number of iterations to find a correct inverse kinematics solution, as the coordinate transformation function of the controller is incomplete. However, by using Kawato's feedback error learning [4], the second feedback controller, the feed-forward controller, or the inverse kinematics model that has a complete coordinate transformation function can be obtained, as shown in Section 4. \n\n4 TRACKING CONTROL SYSTEM LEARNING \n\nIn this section, we consider the case where x_d changes over time as x_d(k) (k = 1, 2, ...). 
The hybrid controller that includes the learning feed-forward controller Φ_ff(θ(k), Δx_d(k)) ∈ R^m, which transforms the change of the desired hand position Δx_d(k) = x_d(k+1) - x_d(k) into the joint angle vector space, is considered: \n\nΔθ(k) = Φ_ff(θ(k), Δx_d(k)) + Φ_fb(θ(k), e(k)) + d(k) (22) \ne(k) = x_d(k) - x(k) (23) \n\nThe configuration of the hybrid controller is illustrated in Figure 3. \n\nFigure 3: Configuration of Hybrid Controller \n\nBy using the modified change of the square of the error norm expressed as \n\nΔS = (|x_d(k-1) - x(k)|^2 - |e(k-1)|^2)/2 (24) \n\nand Δθ(k) as defined in Equation (22), the feedback controller learning rule defined in Equation (9) remains useful for the tracking control system. A sample holder for memorizing x_d(k-1) is necessary for the calculation of ΔS. When the distribution of Δx_d(k) satisfies Equation (20), Equation (13) still holds. When Δx_d(k) has no correlation with d(k) and Δx_d(k) satisfies p(Δx_d|θ, e) = p(-Δx_d|θ, e), Equation (20) is approximately established after the feed-forward controller learning. \n\nUsing Δθ(k) defined in Equation (2) and e(k) defined in Equation (23), ΔS defined in Equation (10) can also be used for the calculation of Φ_fb^d(θ, e). Although the learning calculation becomes simpler, the learning speed becomes much lower. \n\nLet Φ_ff^d(θ(k), Δx_d(k)) be the desired output of Φ_ff(θ(k), Δx_d(k)). 
According to Kawato's feedback error learning [4], we use Φ_ff^d(θ(k), Δx_d(k)) expressed as: \n\nΦ_ff^d(θ(k), Δx_d(k)) = (1 - λ) Φ_ff(θ(k), Δx_d(k)) + Φ_fb(θ(k+1), e(k+1)) (25) \n\nwhere λ is a small, positive, real number for stabilizing the learning process and ensuring that the equation Φ_ff(θ, 0) ≈ 0 holds. If λ is small enough, the learning feed-forward controller will fulfill the equation: \n\nJ(θ) Φ_ff(θ, Δx_d) ≈ Δx_d (26) \n\n5 NUMERICAL SIMULATION \n\nNumerical simulation experiments were performed in order to evaluate the performance of the proposed model. The inverse kinematics of a 3 DOF arm moving on a 2 DOF plane were considered. The relationship between the joint angle vector θ = (θ1, θ2, θ3)^T and the hand position vector x = (x, y)^T was defined as: \n\nx = x0 + L1 cos(θ1) + L2 cos(θ1 + θ2) + L3 cos(θ1 + θ2 + θ3) (27) \ny = y0 + L1 sin(θ1) + L2 sin(θ1 + θ2) + L3 sin(θ1 + θ2 + θ3) (28) \n\nThe range of θ1 was (-30°, 120°); the range of θ2 was (0°, 120°); and the range of θ3 was (-75°, 75°). L1 was 0.30 m, L2 was 0.25 m, and L3 was 0.15 m. Random straight lines were generated as desired trajectories for the hand. The tracking control trials expressed by Equation (22), with learning of the feedback controller and the feed-forward controller, were performed. The standard deviation of each component of d was 0.01. Learning updates based on Equations (9), (22), (24), and (25) were conducted 20 times in one tracking trial. 1,000 tracking trials were conducted to estimate the RMS (Root Mean Square) of e(k). In order to accelerate the learning, α in Equation (9) was modified as α = 0.5/(|Δx|^2 + 0.1|Δθ|^2). λ in Equation (25) was set to 0.001. \n\nTwo neural networks with 4 layers were used for the simulation. The first layer had 5 neurons and the fourth layer had 3 neurons; the other layers had 15 neurons each. The first layer and the fourth layer consisted of linear neurons. 
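For reference, the planar arm kinematics of Equations (27)-(28) can be written out directly. This is a sketch using the link lengths given above; the shoulder offset (x0, y0) is left as a free parameter because its value is not stated in the text.

```python
import numpy as np

# Link lengths from Section 5 (in meters): L1 = 0.30, L2 = 0.25, L3 = 0.15.
LINKS = (0.30, 0.25, 0.15)

def hand_position(theta, x0=0.0, y0=0.0):
    """Forward kinematics of the planar 3-DOF arm, Eqs. (27)-(28).
    theta = (theta1, theta2, theta3) in radians."""
    l1, l2, l3 = LINKS
    t1 = theta[0]
    t12 = theta[0] + theta[1]
    t123 = theta[0] + theta[1] + theta[2]
    x = x0 + l1 * np.cos(t1) + l2 * np.cos(t12) + l3 * np.cos(t123)
    y = y0 + l1 * np.sin(t1) + l2 * np.sin(t12) + l3 * np.sin(t123)
    return np.array([x, y])
```

With all joints at zero the arm is stretched along the x axis, so the hand sits at (L1 + L2 + L3, 0) = (0.70, 0) relative to the shoulder.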
The initial values of the weights of the neural networks were generated by using uniform random numbers. The back-propagation method without optimized learning coefficients was utilized for the learning. \n\nFigure 4: Learning Process of Controller \n\nFigure 5: One Example of Tracking Control \n\nFigure 4 shows the progress of the proposed learning model. It can be seen that the RMS error decreases and the precision of the solver becomes higher as the number of trials increases. The RMS error became 9.31 × 10^-3 m after 2 × 10^7 learning trials. Figure 5 illustrates the hand position control by the inverse kinematics solver after 2 × 10^7 learning trials. The number near the end point of the arm indicates the value of k. The center of the small circle in Figure 5 indicates the desired hand position. The center of the large circle indicates the final desired hand position. Through learning, a precise inverse kinematics solver can be obtained. However, for the RMS error to fall below 0.02, trials must be repeated more than 10^6 times. In such cases, a more efficient learner or learning rule is necessary. \n\n6 CONCLUSION \n\nA learning model of coordinate transformation of the hand position feedback controller was proposed in this paper. Although the proposed learning model may take a long time to learn, it is capable of learning a correct inverse kinematics solver without using a forward model, a back-propagation signal, or a pre-existing feedback controller. We believe that the slow learning speed can be improved by using neural networks that have a structure suitable for the coordinate transformation. 
A major limitation of the proposed model is the structure of the learning rule, since the learning rule requires the calculation of the product of the change of the error penalty function and the change of the joint angle vector. However, the existence of such a structure in the nervous system is unknown. An advanced learning model which can be directly compared with physiological and psychological experimental results is necessary. \n\nReferences \n\n[1] T. Ogino and S. Ishii, \"Long-term Results after Pollicization for Congenital Hand Deformities,\" Hand Surgery, 2, 2, pp. 79-85, 1997. \n[2] F. H. Guenther and D. M. Barreca, \"Neural models for flexible control of redundant systems,\" in P. Morasso and V. Sanguineti (Eds.), Self-organization, Computational Maps, and Motor Control. Amsterdam: Elsevier, pp. 383-421, 1997. \n[3] M. I. Jordan, \"Supervised Learning and Systems with Excess Degrees of Freedom,\" COINS Technical Report, 88-27, pp. 1-41, 1988. \n[4] M. Kawato, K. Furukawa and R. Suzuki, \"A Hierarchical Neural-network Model for Control and Learning of Voluntary Movement,\" Biological Cybernetics, 57, pp. 169-185, 1987. \n[5] D. T. McRuer and H. R. Jex, \"A Review of Quasi-Linear Pilot Models,\" IEEE Trans. on Human Factors in Electronics, HFE-8, 3, pp. 38-51, 1963. \n", "award": [], "sourceid": 1496, "authors": [{"given_name": "Eimei", "family_name": "Oyama", "institution": null}, {"given_name": "Susumu", "family_name": "Tachi", "institution": null}]}