{"title": "Mixtures of Controllers for Jump Linear and Non-Linear Plants", "book": "Advances in Neural Information Processing Systems", "page_first": 719, "page_last": 726, "abstract": null, "full_text": "Mixtures of Controllers for \n\nJump Linear and Non-linear Plants \n\nTimothy W. Cacciatore \nDepartment of Neurosciences \n\nUniversity of California at San Diego \n\nLa Jolla, CA 92093 \n\nSteven J. Nowlan \n\nSynaptics, Inc. \n\n2698 Orchard Parkway \n\nSan Jose, CA 95134 \n\nAbstract \n\nWe describe an extension to the Mixture of Experts architecture for \nmodelling and controlling dynamical systems which exhibit multiple \nmodes of behavior. This extension is based on a Markov process \nmodel, and suggests a recurrent network for gating a set of linear \nor non-linear controllers. The new architecture is demonstrated to \nbe capable of learning effective control strategies for jump linear \nand non-linear plants with multiple modes of behavior. \n\n1 \n\nIntroduction \n\nMany stationary dynamic systems exhibit significantly different behaviors under \ndifferent operating conditions. To control such complex systems it is computationally \nmore efficient to decompose the problem into smaller subtasks, with different \ncontrol strategies for different operating points. When detailed information about \nthe plant is available, gain scheduling has proven a successful method for designing a \nglobal control (Shamma and Athans, 1992). The system is partitioned by choosing \nseveral operating points and a linear model for each operating point. A controller \nis designed for each linear model and a method for interpolating or 'scheduling' the \ngains of the controllers is chosen. \n\nThe control problem becomes even more challenging when the system to be controlled \nis non-stationary, and the mode of the system is not explicitly observable. 
\nOne important, and well studied, class of non-stationary systems is jump linear \nsystems of the form: dx/dt = A(i)x + B(i)u, where x represents the system state, \nu the input, and i, the stochastic parameter that determines the mode of the system, \nis not explicitly observable. To control such a system, one must estimate the \nmode of the system from the input-output behavior of the plant and then choose \nan appropriate control strategy. \n\nFor many complex plants, an appropriate decomposition is not known a priori. One \napproach is to learn the decomposition and the piecewise solutions in parallel. The \nMixture of Experts architecture (Nowlan 1990, Jacobs et al. 1991) was proposed as \none approach to simultaneously learning a task decomposition and the piecewise \nsolutions in a neural network context. This architecture has been applied to control \nsimple stationary plants, when the operating mode of the plant was explicitly \navailable as an input to the gating network (Jacobs and Jordan 1991). \n\nThere is a problem with extending this architecture to deal with non-stationary \nsystems such as jump linear systems. The original formulation of this architecture \nwas based on an assumption of statistical independence of training pairs appropriate \nfor classification tasks. However, this assumption is inappropriate for modelling the \ncausal dependencies in control tasks. We derive an extension to the original Mixture \nof Experts architecture which we call the Mixture of Controllers. This extension \nis based on an nth order Markov model and can be implemented to control \nnon-stationary plants. The new derivation suggests the importance of using recurrence \nin the gating network, which then learns to estimate the conditional state occupancy \nfor sequences of outputs. 
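The jump linear systems described above are straightforward to reproduce in simulation. The sketch below is a discrete-time analogue; the dynamics matrices, the switching probability, and the feedback gain are invented for illustration and are not taken from the paper:

```python
import numpy as np

# Discrete-time analogue of a jump linear plant,
#   x_{t+1} = A(i) x_t + B(i) u_t,
# where the mode i is a hidden stochastic parameter.
# All matrices, the switching probability, and the feedback
# gain below are invented for illustration.

rng = np.random.default_rng(0)

A = [np.array([[0.9, 0.1], [0.0, 0.8]]),   # mode 0 dynamics (hypothetical)
     np.array([[0.5, -0.3], [0.2, 0.7]])]  # mode 1 dynamics (hypothetical)
B = [np.array([[1.0], [0.0]]),
     np.array([[0.0], [1.0]])]

P_STAY = 0.95  # probability the hidden mode persists at each step

def simulate(T, controller):
    """Run the plant for T steps under controller(x) -> u.
    Returns the state trajectory and the hidden mode sequence."""
    x = np.array([1.0, -0.5])   # arbitrary initial state
    mode = 0
    xs, modes = [], []
    for _ in range(T):
        u = controller(x)
        x = A[mode] @ x + B[mode] @ u
        xs.append(x.copy())
        modes.append(mode)
        if rng.random() > P_STAY:  # stochastic jump to the other mode
            mode = 1 - mode
    return np.array(xs), np.array(modes)

# A single fixed gain cannot suit both modes; the paper's architecture
# instead learns one controller per mode plus a switching rule.
xs, modes = simulate(200, lambda x: np.array([-0.5 * x[0]]))
```

Note that `modes` is hidden at run time: a controller for such a plant must infer the active mode purely from the plant's input-output behavior.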
The power of the architecture is illustrated by learning \ncontrol and switching strategies for simple jump linear and non-stationary \nnon-linear plants. The modified recurrent architecture is capable of learning both the \ncontrol and switching for these plants, while a non-recurrent architecture fails to \nlearn an adequate control. \n\n2 Mixtures of Controllers \n\nThe architecture of the system is shown in figure 1. x_t denotes the vector of inputs \nto the controller at time t and y_t is the corresponding overall control output. The \narchitecture is identical to the Mixture of Experts architecture, except that the \ngating network has become recurrent, receiving its outputs from the previous time \nstep as part of its input. The underlying statistical model, and corresponding training \nprocedure for the Mixture of Controllers, is quite different from that originally \nproposed for the Mixture of Experts. \n\nWe assume that the system we are interested in controlling has N different modes \nor states^1 and we will have a distinct controller M_k for each mode. In general we are \ninterested in the likelihood of producing a sequence of control outputs y_1, ..., y_T \ngiven a sequence of inputs x_1, ..., x_T. This likelihood can be computed as: \n\nP(y_1, ..., y_T | x_1, ..., x_T) = prod_t sum_k P(y_t | S_t = k, x_t) P(S_t = k | y_1 ... y_{t-1}, x_1 ... x_t) \n\n(1) \n\n^1 This is an idealization, and if N is unknown it is safest to overestimate it. \n\nFigure 1: The Mixture of Controllers architecture. M1, M2 and M3 \nare feedforward networks implementing controls appropriate for different \nmodes of the system to be controlled. The gating network (Sel.) is \nrecurrent and uses a softmax non-linearity to compute the weight to \nbe assigned to each of the control outputs. The weighted sum of the \ncontrols is then used as the overall control for the plant. 
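The forward computation in figure 1, a recurrent softmax gate blending the outputs of several controllers, can be sketched as follows. The linear experts, random weights, and dimensions here are placeholders chosen for illustration, not the trained networks of the paper:

```python
import numpy as np

# One step of the Mixture of Controllers forward pass: each expert
# proposes a control, and a gating network that re-reads its own
# previous output (the recurrence) blends them with softmax weights.
# All weights are random placeholders, not trained parameters.

rng = np.random.default_rng(1)
N_MODES, X_DIM = 3, 2

K = rng.normal(size=(N_MODES, X_DIM))                 # linear expert gains
W_GATE = rng.normal(size=(N_MODES, X_DIM + N_MODES))  # gating weights

def softmax(g):
    e = np.exp(g - g.max())
    return e / e.sum()

def step(x, gamma_prev):
    """Return the overall control y_t and the gating output gamma_t."""
    y_experts = K @ x                             # y_t^k, one control per mode
    g = W_GATE @ np.concatenate([x, gamma_prev])  # gating pre-activations g_t^k
    gamma = softmax(g)                            # gamma_t^k, sums to 1
    y = gamma @ y_experts                         # weighted sum of controls
    return y, gamma

# Roll the gate forward over a short input sequence.
gamma = np.full(N_MODES, 1.0 / N_MODES)  # uniform initial state occupancy
for _ in range(5):
    x = rng.normal(size=X_DIM)
    y, gamma = step(x, gamma)
```

Because `gamma` is fed back into the gate, the blending weights at time t depend on the whole input-output history, which is what lets the gate track the hidden mode.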
\n\nwhere b_t^k represents the probability of producing the desired control y_t given the \ninput x_t and that the system is in state k. gamma_t^k represents the conditional probability \nof being in state k given the sequence of inputs and outputs seen so far. In order \nto make the problem tractable, we assume that this conditional probability is completely \ndetermined by the current input to the system and the previous state of the \nsystem: \n\ngamma_t^k = f_W(x_t, {gamma_{t-1}^j}). \n\nThus we are assuming that our control can be approximated by a Markov process, \nand since we are assuming that the mode of the system is not explicitly available, this \nbecomes a hidden Markov model. This Markov assumption leads to the particular \nrecurrent gating architecture used in the Mixture of Controllers. \nIf we make the same Gaussian assumptions used in the original Mixture of Experts \nmodel, we can define a gradient descent procedure for maximizing the log of the \nlikelihood given in Equation 1. Assume \n\nb_t^k = (1 / (sqrt(2 pi) sigma)) exp(-(y_t - y_t^k)^2 / 2 sigma^2) \n\nand define beta_t^k = P(y_T, ..., y_t | S_k, x_T, ..., x_t), L_t = sum_k beta_t^k gamma_t^k, and \n\nR_t^k = beta_t^k gamma_t^k / L_t. \n\nThen the derivative of the likelihood with respect to the output of one of the controllers \nbecomes: \n\nd log L / d y_t^k = R_t^k (y_t - y_t^k) / sigma^2 \n\n(2) \n\nThe derivative of the likelihood with respect to a weight w in one of the control networks \nis computed by accumulating partial derivatives over the sequence of control \noutputs: \n\nd log L / d w = sum_t (d log L / d y_t^k) (d y_t^k / d w). \n\nFor the gating network, we once again use a softmax non-linearity so: \n\ngamma_t^k = exp(g_t^k) / sum_j exp(g_t^j) \n\n(3) \n\nThen \n\nd log L / d g_t^k = (R_t^k - gamma_t^k) gamma_{t-1}^k \n\n(4) \n\nThe derivatives for the weights v in the gating network are again computed by accumulating \npartial derivatives over output sequences: \n\nd log L / d v = sum_t sum_k (d log L / d g_t^k) (d g_t^k / d v). \n\n(5) \n\nEquations (2) and (4) turn out to be quite similar to those derived for the original \nMixture of Experts architecture. The primary difference is the appearance of beta_t^k \nrather than b_t^k in the expression for R_t^k. The appearance of beta is a direct result of \nthe recurrence introduced into the gating network. beta can be computed as part of \na modified back propagation through time algorithm for the gating network using \nthe recurrence: \n\nbeta_t^k = b_t^k + sum_j W_kj beta_{t+1}^j \n\n(6) \n\nwhere \n\nW_kj = d gamma_{t+1}^j / d gamma_t^k. \n\nEquation (6) is the analog of the backward pass in the forward-backward algorithm \nfor standard hidden Markov models. \n\nIn the simulations reported in the next section, we used an online gradient descent \nprocedure which employs an approximation for beta_t^k which uses only one step of back \npropagation through time. This approximation did not appear to significantly affect \nthe final performance of the recurrent architecture. \n\n3 Results \n\nThe performances of the recurrent Mixture of Controllers and non-recurrent Mixture \nof Experts were compared on three control tasks: a first order jump linear system, \na second order jump linear system, and a tracking task that required two \nnon-linear controllers. The object of the first two jump-linear tasks was to control a \nplant which switched randomly between two linear systems. The resulting overall \nsystems were highly non-linear. In both the first 
and second order cases it was \n\n[Figure panels: First Order Model Training Error (recurrent and non-recurrent) and First Order Model Trajectory (actual, desired, target in x position); gating unit activities (correct, incorrect) as a function of time. Caption not recovered.] \n\nFigure 5: (a) Actual and desired trajectories of ship under control of \nMixture of Controllers while attempting to intercept target. (b) Gating \nunit activities as a function of time for trajectory in (a). Note that these \nare much less noisy than the activities seen in figure 4(b). \n\nmay require the development of faster converging update algorithms, perhaps based \non the generalized EM (GEM) family of algorithms, or a variant of the iterative \nreweighted least squares procedure proposed by Jordan and Jacobs (1993) for hierarchies \nof expert networks. Additional work is also required to establish the stability \nand convergence rate of the algorithm for use in adaptive control applications. \n\nReferences \n\nJacobs, R.A. and Jordan, M.I. A competitive modular connectionist architecture. \nNeural Information Processing Systems 3 (1991). \n\nJacobs, R.A., Jordan, M.I., Nowlan, S.J. and Hinton, G.E. Adaptive Mixtures of \nLocal Experts. Neural Computation, 3, 79-87, (1991). \n\nJordan, M.I. and Jacobs, R.A. Hierarchical Mixtures of Experts and the EM algorithm. \nNeural Computation, (1994). \n\nMiller, W.T., Sutton, R.S. and Werbos, P.J. Neural Networks for Control, MIT \nPress (1993). \n\nNowlan, S.J. 
Competing Experts: An Experimental Investigation of Associative \nMixture Models. Technical Report CRG-TR-90-5, Department of Computer Science, \nUniversity of Toronto (1990). \n\nShamma, J.S., and Athans, M. Gain scheduling: potential hazards and possible \nremedies. IEEE Control Systems Magazine, 12(3), 101-107 (1992). \n", "award": [], "sourceid": 750, "authors": [{"given_name": "Timothy", "family_name": "Cacciatore", "institution": null}, {"given_name": "Steven", "family_name": "Nowlan", "institution": null}]}