{"title": "Recursive Estimation of Dynamic Modular RBF Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 239, "page_last": 245, "abstract": null, "full_text": "Recursive Estimation of Dynamic Modular RBF Networks\n\nVisakan Kadirkamanathan\nAutomatic Control & Systems Eng. Dept.\nUniversity of Sheffield, Sheffield S1 4DU, UK\nvisakan@acse.sheffield.ac.uk\n\nMaha Kadirkamanathan\nDragon Systems UK\nCheltenham GL52 4RW, UK\nmaha@dragon.co.uk\n\nAbstract\n\nIn this paper, recursive estimation algorithms for dynamic modular networks are developed. The models are based on Gaussian RBF networks and the gating network is considered in two stages: in the first, it is simply a time-varying scalar and in the second, it is based on the state, as in the mixture of local experts scheme. The resulting algorithm uses Kalman filter estimation for the model estimation and the gating probability estimation. Both 'hard' and 'soft' competition based estimation schemes are developed: in the former, the most probable network is adapted and in the latter, all networks are adapted by appropriate weighting of the data.\n\n1 INTRODUCTION\n\nThe problem of learning multiple modes in a complex nonlinear system is increasingly being studied by various researchers [2, 3, 4, 5, 6]. A mixture of local experts [5, 6] and a conditional mixture density network [3] have been developed to model the various modes of a system. The development has mainly been on model estimation from a given set of block data, with the model likelihood dependent on the input to the networks. A recursive algorithm for this static case is the approximate iterative procedure based on the block estimation schemes [6]. 
\n\nIn this paper, we consider dynamic systems. Developing a recursive algorithm is difficult since mode transitions have to be detected on-line, whereas in the block scheme, search procedures allow optimal detection. Block estimation schemes for general architectures have been described in [2, 4]. However, unlike in those schemes, the algorithm developed here uses relationships based on Bayes law and Kalman filters and attempts to describe the dynamic system explicitly. The modelling is carried out by radial basis function (RBF) networks for their property that by pre-selecting the centres and widths, the problem can be reduced to a linear estimation.\n\n2 DYNAMIC MODULAR RBF NETWORK\n\nThe dynamic modular RBF network consists of a number of models (or experts) to represent each nonlinear mode in a dynamical system. The models are based on RBF networks with Gaussian functions, where the RBF centre and width parameters are chosen a priori and the only unknown parameters are the linear coefficients $w$. The functional form of the RBF network can be expressed as,\n\n$$f(x; p) = \\sum_{k=1}^{K} w_k g_k(x) = w^T g \\tag{1}$$\n\nwhere $w = [\\ldots, w_k, \\ldots]^T \\in \\mathbb{R}^K$ is the linear weight vector and $g = [\\ldots, g_k(x), \\ldots]^T \\in \\mathbb{R}^K$ are the radial basis functions, where,\n\n$$g_k(x) = \\exp\\{-0.5 r^{-2} \\|x - m_k\\|^2\\} \\tag{2}$$\n\n$m_k \\in \\mathbb{R}^M$ are the RBF centres or means and $r$ the width. The RBF networks are used for their property that having chosen appropriate RBF centre and width parameters $m_k$, $r$, only the linear weights $w$ need to be estimated, for which fast, efficient and optimal algorithms exist.\n\nEach model has an associated probability score of being the current underlying model for the given observation. 
In the first stage of the development, this probability is not determined from a parametrised gating network as in the mixture of local experts [5] and the mixture density network [3], but is determined on-line as it varies with time. In dynamic systems, time information must be taken into account, whereas the mixture of local experts uses only the state information, which is not sufficient in general unless the states contain the necessary information. In the second stage, the probability is extended to represent both the time and state information explicitly using the expressions from the mixture of local experts. Recently, time and state information have been combined in developing models for dynamic systems such as the mixture of controllers [4] and the Input-Output HMM [2]. However, the scheme developed here is more explicit, is not as general as the above schemes, and is recursive as opposed to block estimation.\n\n3 RECURSIVE ESTIMATION\n\nThe problem of recursive estimation with RBF networks has been studied previously [7, 8] and the algorithms developed here are a continuation of that process. Let the set of input-output observations from which the model is to be estimated be,\n\n$$Z_N = \\{z_n \\mid n = 1, \\ldots, N\\} \\tag{3}$$\n\nwhere $Z_N$ includes all observations up to the $N$th data and $z_n$ is the $n$th data,\n\n$$z_n = \\{(x_n, y_n) \\mid x_n \\in \\mathbb{R}^M, y_n \\in \\mathbb{R}\\} \\tag{4}$$\n\nThe underlying system generating the observations is assumed to be multi-modal (with a known number of modes $H$), with each observation satisfying the nonlinear relation,\n\n$$y = f_h(x) + \\eta \\tag{5}$$\n\nwhere $\\eta$ is the noise with unknown distribution and $f_h(\\cdot) : \\mathbb{R}^M \\mapsto \\mathbb{R}$ is the unknown underlying nonlinear function for the $h$th mode which generated the observation. 
\nUnder assumptions of zero mean Gaussian noise and that the model can approximate the underlying function arbitrarily closely, the probability distribution,\n\n$$p(z_n \\mid w^h, M_n = M_h, Z_{n-1}) = (2\\pi)^{-\\frac{1}{2}} R_0^{-\\frac{1}{2}} \\exp\\left\\{-\\frac{1}{2} R_0^{-1} |y_n - f_h(x_n; w^h)|^2\\right\\} \\tag{6}$$\n\nis Gaussian. This is the likelihood of the observation $z_n$ for the model $M_h$, which in our case is the Gaussian RBF (GRBF) network, given model parameters $w$ and that the $n$th observation was generated by $M_h$. $R_0$ is the variance of the noise $\\eta$. In general however, the model generating the $n$th observation is unknown and the likelihood of the $n$th observation is expanded to include $\\gamma_n^h$, the indicator variable, as in [6],\n\n$$p(z_n, \\gamma_n \\mid W, M, Z_{n-1}) = \\prod_{h=1}^{H} \\left[ p(z_n \\mid w^h, M_n = M_h, Z_{n-1}) \\, p(M_n = M_h \\mid x_n, Z_{n-1}) \\right]^{\\gamma_n^h} \\tag{7}$$\n\nBayes law can be applied to the on-line or recursive parameter estimation,\n\n$$p(W \\mid Z_n, M) = \\frac{p(z_n \\mid W, M, Z_{n-1}) \\, p(W \\mid Z_{n-1}, M)}{p(z_n \\mid Z_{n-1}, M)} \\tag{8}$$\n\nand the above equation is applied recursively for $n = 1, \\ldots, N$. The term $p(z_n \\mid Z_{n-1}, M)$ is the evidence. 
If the underlying system is unimodal and we assign the prior probability distribution for the model parameters $p(w^h \\mid M_h)$ to be Gaussian with mean $w_0$ and (positive definite) covariance matrix $P_0 \\in \\mathbb{R}^{K \\times K}$, this results in the optimal Kalman estimator, which combines the likelihood and the prior to give the posterior probability distribution. At time $n$ this posterior is given by $p(w^h \\mid Z_n, M_h)$, which is also Gaussian,\n\n$$p(w^h \\mid Z_n, M_h) = (2\\pi)^{-\\frac{K}{2}} |P_n^h|^{-\\frac{1}{2}} \\exp\\left\\{-\\frac{1}{2} (w^h - w_n^h)^T (P_n^h)^{-1} (w^h - w_n^h)\\right\\} \\tag{9}$$\n\nIn the multimodal case also, the estimation for the individual model parameters decouples naturally, with the only modification being that the likelihood used for the parameter estimation is now based on weighted data and given by,\n\n$$p(z_n \\mid w^h, M_h, Z_{n-1}) = (2\\pi)^{-\\frac{1}{2}} \\left(R_0 (\\gamma_n^h)^{-1}\\right)^{-\\frac{1}{2}} \\exp\\left\\{-\\frac{1}{2} R_0^{-1} \\gamma_n^h |y_n - f_h(x_n; w^h)|^2\\right\\} \\tag{10}$$\n\nThe Bayes law relation (8) applies to each model. Hence, the only modification in the Kalman filter algorithm is that the noise variance for each model is set to $R_0 (\\gamma_n^h)^{-1}$ and the resulting equations can be found in [7]. This increases the apparent uncertainty in the measurement output for models that are less likely to be the true underlying mode, by increasing the noise variance term of the Kalman filter algorithm. Note that the term $p(M_n = M_h \\mid x_n, Z_{n-1})$ is a time-varying scalar and does not influence the parameter estimation process.\n\nThe evidence term can also be determined directly from the Kalman filter,\n\n$$p(z_n \\mid M_h, Z_{n-1}) = (2\\pi)^{-\\frac{1}{2}} (R_n^h)^{-\\frac{1}{2}} \\exp\\left\\{-\\frac{1}{2} (R_n^h)^{-1} (e_n^h)^2\\right\\} \\tag{11}$$\n\nwhere $e_n^h$ is the prediction error and $R_n^h$ is the innovation variance with,\n\n$$e_n^h = y_n - (w_{n-1}^h)^T g_n \\tag{12}$$\n\n$$R_n^h = R_0 (\\gamma_n^h)^{-1} + g_n^T P_{n-1}^h g_n \\tag{13}$$\n\nThe evidence (11) is also the likelihood of the $n$th observation given the model $M$ and the past observations $Z_{n-1}$. The above equation shows that the evidence term used in Bayesian model selection [9] is computed recursively, but for the specific priors $R_0$, $P_0$. 
On-line Bayesian model selection can be carried out by choosing many different priors, effectively sampling the prior space, to determine the best model to fit the given data, as discussed in [7].\n\n4 RECURSIVE MODEL SELECTION\n\nBayes law can be invoked to perform recursive or on-line model selection and this has been used in the derivation of the multiple model algorithm [1]. The multiple model algorithm has been used for the recursive identification of dynamical nonlinear systems [7]. Applying Bayes law gives the following relation:\n\n$$p(M_h \\mid Z_n) = \\frac{p(z_n \\mid M_h, Z_{n-1}) \\, p(M_h \\mid Z_{n-1})}{p(z_n \\mid Z_{n-1})} \\tag{14}$$\n\nwhich can be computed recursively for $n = 1, \\ldots, N$. $p(z_n \\mid M_h, Z_{n-1})$ is the likelihood given in (11) and $p(M_h \\mid Z_n)$ is the posterior probability of model $M_h$ being the underlying model for the $n$th data given the observations $Z_n$. The term $p(z_n \\mid Z_{n-1})$ is the normalising term given by,\n\n$$p(z_n \\mid Z_{n-1}) = \\sum_{h=1}^{H} p(z_n \\mid M_h, Z_{n-1}) \\, p(M_h \\mid Z_{n-1}) \\tag{15}$$\n\nThe initial prior probabilities for models are assigned to be equal to $1/H$. The equations (11), (14) combined with the Kalman filter estimation equations are known as the multiple model algorithm [1].\n\nAmongst all the networks that are attempting to identify the underlying system, the identified model is the one with the highest posterior probability $p(M_h \\mid Z_n)$ at each time $n$, i.e.,\n\n$$\\hat{M}_n = \\arg\\max_{M_h} \\, p(M_h \\mid Z_n) \\tag{16}$$\n\nand hence can vary from time to time. This is preferred over the averaging of all the $H$ models as the likelihood is multimodal and hence modal estimates are sought. Predictions are based on this most probable model.\n\nSince the system is dynamical, if the underlying model for the dynamics is known, it can be used to predict the estimates at the next time instant based on the current estimates, prior to observing the next data. Here, a first order Markov assumption is made for the mode transitions. 
Given that at the time instant $n - 1$ the mode is $j$, the probability of the mode at time instant $n$ being $h$ is predicted to be the transition probability $p_{hj}$. With $H$ modes, $\\sum_{h=1}^{H} p_{hj} = 1$. The predicted probability of the mode being $h$ at time $n$ therefore is given by,\n\n$$p_{n|n-1}(M_h \\mid Z_{n-1}) = \\sum_{j=1}^{H} p_{hj} \\, p(M_j \\mid Z_{n-1}) \\tag{17}$$\n\nThis can be viewed as the prediction stage of the model selection algorithm. The predicted output of the system is obtained from the output of the model that has the highest predicted probability.\n\nGiven the observation $z_n$, the correction is achieved through the multiple model algorithm of (14) with the following modification:\n\n$$p(M_h \\mid Z_n) = \\frac{p(z_n \\mid M_h, Z_{n-1}) \\, p_{n|n-1}(M_h \\mid Z_{n-1})}{p(z_n \\mid Z_{n-1})} \\tag{18}$$\n\nwhere the prior $p(M_h \\mid Z_{n-1})$ has been replaced by the predicted probability (17). Note that this probability is a time-varying scalar value and does not depend on the states.\n\n5 HARD AND SOFT COMPETITION\n\nThe development of the estimation and model selection algorithms has thus far assumed that the indicator variable $\\gamma_n^h$ is known. In practice $\\gamma_n^h$ is unknown and an expected value must be used in the algorithm, which is given by,\n\n$$\\beta_n^h = \\frac{p(z_n \\mid M_n = M_h, Z_{n-1}) \\, p_{n|n-1}(M_n = M_h \\mid Z_{n-1})}{p(z_n \\mid Z_{n-1})} \\tag{19}$$\n\nTwo possible methodologies can be used for choosing the values for $\\gamma_n^h$. In the first scheme,\n\n$$\\gamma_n^h = 1 \\text{ if } \\beta_n^h > \\beta_n^j \\text{ for all } j \\neq h, \\text{ and } 0 \\text{ otherwise} \\tag{20}$$\n\nThis results in 'hard' competition where only the model with the highest predicted probability undergoes adaptation using the Kalman filter algorithm while all other models are prevented from adapting. Alternatively, the expected value can be used in the algorithm,\n\n$$\\gamma_n^h = \\beta_n^h \\tag{21}$$\n\nwhich results in 'soft' competition, and all models are allowed to undergo adaptation with appropriate data weighting as outlined in section 3. This scheme is slightly different from that presented in [7]. 
Since the posterior probabilities of each mode effectively indicate which mode is dominant at each time $n$, changes in them can then be used as a means of detecting mode transitions.\n\n6 EXPERIMENTAL RESULTS\n\nThe problem chosen for the experiment is learning the inverse robot kinematics used in [3]. This is a two link rigid arm manipulator for which, given joint arm angles $(\\theta_1, \\theta_2)$, the end effector position in cartesian co-ordinates is given by,\n\n$$x_1 = L_1 \\cos(\\theta_1) - L_2 \\cos(\\theta_1 + \\theta_2), \\qquad x_2 = L_1 \\sin(\\theta_1) - L_2 \\sin(\\theta_1 + \\theta_2) \\tag{22}$$\n\n$L_1 = 0.8$, $L_2 = 0.2$ being the arm lengths. The inverse kinematics learning problem requires the identification of the underlying mapping $(x_1, x_2) \\to (\\theta_1, \\theta_2)$, which is bi-modal. Since the algorithm is developed for the identification of dynamical systems, the data are generated with the joint angles being excited sinusoidally with differing frequencies within the intervals $[0.3, 1.2] \\times [\\pi/2, 3\\pi/2]$. The first 1000 observations are used for training and the next 1000 observations are used for testing with the adaptation turned off. The models use 28 RBFs chosen with fixed parameters, the centres being uniformly placed on a $7 \\times 4$ grid.\n\n
Figure 1: Learning inverse kinematics ('hard' competition): Model probabilities.\n\nFigure 1 shows the model probabilities during training and shows the switching taking place between the two modes.\n\nFigure 2: End effector position errors (test data) ('hard' competition): (a) Model 1 prediction (b) Model 2 prediction.\n\nFigure 2 shows the end effector position errors on the test data by both models 1 and 2 separately under the 'hard' competition scheme. The figure indicates the errors achieved by the best model used in the prediction, with both models predicting in the centre of the input space where the function is multi-modal. This demonstrates the successful operation of the algorithm in the two RBF networks capturing some elements of the two underlying modes of the relationship. 
The best results on this learning task are as follows: the RMSE on test data for this problem by the Mixture Density Network is 0.0053 and by a single network is 0.0578 [3]. Note however that the algorithm here did not use state information and used only the time dependency.\n\nTable 1: Learning Inverse Kinematics: Results\n\n              Hard Competition   Soft Competition\nRMSE (Train)  0.0213             0.0442\nRMSE (Test)   0.0084             0.0212\n\n7 PARAMETRISED GATING NETWORKS\n\nThe model parameters were determined explicitly based on the time information in the dynamical system. The gating model probabilities can instead be expressed as a function of the states, similar to [6],\n\n$$p(M_h \\mid x_n, Z_{n-1}) = \\frac{\\exp\\{(a^h)^T g\\}}{\\sum_{j=1}^{H} \\exp\\{(a^j)^T g\\}} = \\alpha_n^h \\tag{23}$$\n\nwhere $a^h$ are the gating network parameters. Note that the gating network shares the same basis functions as the expert models.\n\nThis extension to the gating networks does not affect the model parameter estimation procedure. The likelihood in (7) decomposes into a part for model parameter estimation involving the output prediction error and a part for gating parameter estimation involving the indicator variable $\\gamma_n$. The second part can be approximated to a Gaussian of the form,\n\n$$p(\\gamma_n^h \\mid x_n, a^h, Z_{n-1}) \\approx (2\\pi)^{-\\frac{1}{2}} R_{g0}^{-\\frac{1}{2}} \\exp\\left\\{-\\frac{1}{2} R_{g0}^{-1} |\\gamma_n^h - \\alpha_n^h|^2\\right\\} \\tag{24}$$\n\nThis approximation allows the extended Kalman filter algorithm to be used for gating network parameter estimation. The model selection equations of section 4 can be applied without any modification with the new gating probabilities. The choice of the indicator variable $\\gamma_n^h$ can be made as before, resulting in either hard or soft competition. The necessary expressions in (21) are obtained through the Kalman filter estimates and the evidence values, for both the model and gating parameters. 
Note that this is different from the estimates used in [6] in the sense that marginalisation over the model and gating parameters has been done here.\n\n8 CONCLUSIONS\n\nRecursive estimation algorithms for dynamic modular RBF networks have been developed. The models are based on Gaussian RBF networks and the gating is simply a time-varying scalar. The resulting algorithm uses Kalman filter estimation for the model parameters and the multiple model algorithm for the gating probability. Both 'hard' and 'soft' competition based estimation schemes are developed: in the former, the most probable network is adapted and in the latter, all networks are adapted by appropriate weighting of the data. Experimental results are given that demonstrate the capture of the switching in the dynamical system by the modular RBF networks. An extension of the method in which the gating probability is a function of the state is then outlined briefly. Work is currently in progress to experimentally demonstrate the operation of this extension.\n\nReferences\n\n[1] Bar-Shalom, Y. and Fortmann, T. E. Tracking and data association, Academic Press, New York, 1988.\n\n[2] Bengio, Y. and Frasconi, P. \"An input output HMM architecture\", In G. Tesauro, D. S. Touretzky and T. K. Leen (eds.) Advances in Neural Information Processing Systems 7, Morgan Kaufmann, CA: San Mateo, 1995.\n\n[3] Bishop, C. M. \"Mixture density networks\", Report NCRG/4288, Computer Science Dept., Aston University, UK, 1994.\n\n[4] Cacciatore, C. W. and Nowlan, S. J. \"Mixtures of controllers for jump linear and nonlinear plants\", In J. Cowan, G. Tesauro, and J. Alspector (eds.) Advances in Neural Information Processing Systems 6, Morgan Kaufmann, CA: San Mateo, 1994.\n\n[5] Jacobs, R. A., Jordan, M. I., Nowlan, S. J. and Hinton, G. E. \"Adaptive mixtures of local experts\", Neural Computation, 3: 79-87, 1991. 
\n\n[6] Jordan, M. I. and Jacobs, R. A. \"Hierarchical mixtures of experts and the EM algorithm\", Neural Computation, 6: 181-214, 1994.\n\n[7] Kadirkamanathan, V. \"Recursive nonlinear identification using multiple model algorithm\", In Proceedings of the IEEE Workshop on Neural Networks for Signal Processing V, 171-180, 1995.\n\n[8] Kadirkamanathan, V. \"A statistical inference based growth criterion for the RBF network\", In Proceedings of the IEEE Workshop on Neural Networks for Signal Processing IV, 12-21, 1994.\n\n[9] MacKay, D. J. C. \"Bayesian interpolation\", Neural Computation, 4: 415-447, 1992.", "award": [], "sourceid": 1122, "authors": [{"given_name": "Visakan", "family_name": "Kadirkamanathan", "institution": null}, {"given_name": "Maha", "family_name": "Kadirkamanathan", "institution": null}]}