{"title": "Mixtures of Controllers for Jump Linear and Non-Linear Plants", "book": "Advances in Neural Information Processing Systems", "page_first": 719, "page_last": 726, "abstract": null, "full_text": "Mixtures of Controllers for \n\nJump  Linear and Non-linear Plants \n\nTimothy W.  Cacciatore \nDepartment of Neurosciences \n\nUniversity of California at San Diego \n\nLa Jolla, CA  92093 \n\nSteven J.  Nowlan \n\nSynaptics,  Inc. \n\n2698  Orchard  Parkway \n\nSan  Jose,  CA  95134 \n\nAbstract \n\nWe describe an extension to the Mixture of Experts architecture for \nmodelling and  controlling dynamical systems which  exhibit  multi(cid:173)\nple modes of behavior.  This extension is based on a Markov process \nmodel,  and suggests  a  recurrent  network for  gating a  set  of linear \nor  non-linear controllers.  The new  architecture  is  demonstrated  to \nbe  capable  of learning  effective  control  strategies  for  jump  linear \nand  non-linear plants with multiple modes of behavior. \n\n1 \n\nIntroduction \n\nMany  stationary  dynamic  systems  exhibit  significantly  different  behaviors  under \ndifferent  operating conditions.  To  control such  complex systems it is  computation(cid:173)\nally  more  efficient  to  decompose  the  problem  into smaller subtasks,  with different \ncontrol  strategies  for  different  operating points.  When  detailed  information about \nthe plant is available, gain scheduling has proven a successful method for  designing a \nglobal control (Shamma and Athans,  1992).  The system  is  partitioned by  choosing \nseveral  operating  points  and  a  linear model for  each  operating point.  A  controller \nis  designed for  each linear model and a  method for  interpolating or  'scheduling' the \ngains of the controllers is  chosen. 
\n\nThe  control  problem  becomes  even  more  challenging  when  the  system  to  be  con(cid:173)\ntrolled  is  non-stationary,  and  the  mode  of the  system  is  not  explicitly  observable. \nOne  important,  and  well  studied,  class  of non-stationary  systems  are  jump linear \nsystems  of the  form:  ~~  =  A(i)x + B(i)u.  where  x  represents  the  system  state, \n\n719 \n\n\f720 \n\nCacciatore and Nowlan \n\nu  the input,  and  i,  the stochastic parameter that  determines  the  mode of the sys(cid:173)\ntem,  is  not explicitly  observable.  To control  such  a system, one  must estimate  the \nmode of the  system  from  the  input-output  behavior  of the  plant  and  then  choose \nan  appropriate control strategy. \n\nFor many complex plants, an appropriate decomposition is  not known  a priori.  One \napproach is  to learn the decomposition and  the  piecewise solutions in  parallel.  The \nMixture of Experts  architecture  (Nowlan  1990,  Jacobs  et  a11991)  was  proposed  as \none  approach  to  simultaneously  learning  a  task  decomposition  and  the  piecewise \nsolutions in  a  neural  network  context.  This  architecture  has  been  applied  to  con(cid:173)\ntrol simple stationary plants,  when  the  operating mode of the  plant  was  explicitly \navailable as  an input to the gating network  (Jacobs  and Jordan  1991). \n\nThere  is  a  problem  with  extending  this  architecture  to  deal  with  non-stationary \nsystems such  as  jump linear systems.  The original formulation of this  architecture \nwas based on an assumption of statistical independence oftraining pairs appropriate \nfor  classification  tasks.  However,  this assumption is  inappropriate for  modelling the \ncausal dependencies in control tasks.  We derive an extension to the original Mixture \nof Experts  architecture  which  we  call  the  Mixture  of Controllers.  
This extension is based on an nth order Markov model and can be implemented to control non-stationary plants. The new derivation suggests the importance of using recurrence in the gating network, which then learns to estimate the conditional state occupancy for sequences of outputs. The power of the architecture is illustrated by learning control and switching strategies for simple jump linear and non-stationary non-linear plants. The modified recurrent architecture is capable of learning both the control and switching for these plants, while a non-recurrent architecture fails to learn an adequate control.\n\n2 Mixtures of Controllers\n\nThe architecture of the system is shown in figure 1. x_t denotes the vector of inputs to the controller at time t and y_t is the corresponding overall control output. The architecture is identical to the Mixture of Experts architecture, except that the gating network has become recurrent, receiving its outputs from the previous time step as part of its input. The underlying statistical model, and corresponding training procedure for the Mixture of Controllers, is quite different from that originally proposed for the Mixture of Experts.\n\nWe assume that the system we are interested in controlling has N different modes or states(1) and we will have a distinct control M_k for each mode. In general we are interested in the likelihood of producing a sequence of control outputs y_1, ..., y_T given a sequence of inputs x_1, ..., x_T. This likelihood can be computed as:\n\nL = Π_t Σ_k P(y_t | s_t = k, x_t) P(s_t = k | y_1 ... y_{t-1}, x_1 ... x_t)    (1)\n\n(1) This is an idealization and if N is unknown it is safest to overestimate it.
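As a concrete illustration, Equation 1 can be evaluated directly once the per-expert output densities and the gating probabilities are in hand. The following sketch is ours, not the authors' code; the array shapes and variable names are assumptions made for illustration:

```python
import numpy as np

def sequence_log_likelihood(b, gamma):
    """Log of the sequence likelihood in Equation 1.

    b     : (T, N) array, b[t, k]     = P(y_t | s_t = k, x_t)
    gamma : (T, N) array, gamma[t, k] = P(s_t = k | y_1..y_{t-1}, x_1..x_t)
    """
    # Equation 1: L = prod_t sum_k b[t, k] * gamma[t, k].
    # Working in log space avoids underflow on long sequences.
    per_step = np.sum(b * gamma, axis=1)
    return np.sum(np.log(per_step))
```

Each row of `gamma` is the gating network's output at one time step, so maximizing this quantity couples the experts and the gate through the per-step mixture likelihoods.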
\n\nFigure 1: The Mixture of Controllers architecture. M1, M2 and M3 are feedforward networks implementing controls appropriate for different modes of the system to be controlled. The gating network (Sel.) is recurrent and uses a softmax non-linearity to compute the weight to be assigned to each of the control outputs. The weighted sum of the controls is then used as the overall control for the plant.\n\nIn Equation 1, b_t^k represents the probability of producing the desired control y_t given the input x_t and that the system is in state k. γ_t^k represents the conditional probability of being in state k given the sequence of inputs and outputs seen so far. In order to make the problem tractable, we assume that this conditional probability is completely determined by the current input to the system and the previous state of the system:\n\nγ_t^k = f_k(x_t, {γ_{t-1}^j}).\n\nThus we are assuming that our control can be approximated by a Markov process, and since we are assuming that the mode of the system is not explicitly available, this becomes a hidden Markov model. This Markov assumption leads to the particular recurrent gating architecture used in the Mixture of Controllers.\n\nIf we make the same gaussian assumptions used in the original Mixture of Experts model, we can define a gradient descent procedure for maximizing the log of the likelihood given in Equation 1. Assume\n\nb_t^k = (1 / (√(2π) σ)) exp(−(y_t − y_t^k)² / 2σ²)\n\nand define β_t^k = P(y_T, ..., y_t | s_t = k, x_T, ..., x_t), L_t = Σ_k β_t^k γ_t^k, and\n\nR_t^k = β_t^k γ_t^k / L_t.\n\nThen the derivative of the likelihood with respect to the output of one of the controllers becomes:\n\n∂ log L / ∂ y_t^k = R_t^k (y_t − y_t^k) / σ²    (2)\n\nThe derivative of the likelihood with respect to a weight w_M in one of the control networks is computed by accumulating partial derivatives over the sequence of control outputs:\n\n∂ log L / ∂ w_M = Σ_t (∂ log L / ∂ y_t^k)(∂ y_t^k / ∂ w_M)    (3)\n\nFor the gating network, we once again use a softmax non-linearity, so:\n\nγ_t^k = exp(g_t^k) / Σ_j exp(g_t^j)\n\nThen\n\n∂ log L / ∂ g_t^k = R_t^k − γ_t^k    (4)\n\nThe derivatives for the weights w_g in the gating network are again computed by accumulating partial derivatives over output sequences:\n\n∂ log L / ∂ w_g = Σ_t Σ_k (∂ log L / ∂ g_t^k)(∂ g_t^k / ∂ w_g)    (5)\n\nEquations (2) and (4) turn out to be quite similar to those derived for the original Mixture of Experts architecture. The primary difference is the appearance of β_t^k rather than b_t^k in the expression for R_t^k. The appearance of β is a direct result of the recurrence introduced into the gating network. β can be computed as part of a modified back propagation through time algorithm for the gating network using the recurrence:\n\nβ_t^k = b_t^k + Σ_j W_kj β_{t+1}^j,  where  W_kj = ∂ γ_{t+1}^j / ∂ γ_t^k    (6)\n\nEquation (6) is the analog of the backward pass in the forward-backward algorithm for standard hidden Markov models.\n\nIn the simulations reported in the next section, we used an online gradient descent procedure which employs an approximation for β_t^k which uses only one step of back propagation through time. This approximation did not appear to significantly affect the final performance of the recurrent architecture.
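To make the update concrete, the quantities above can be assembled into a single gradient evaluation. This sketch is ours, not the authors' code, and it makes a stronger simplification than the paper: the backward recursion of Equation 6 is truncated entirely (β_t^k ≈ b_t^k), whereas the simulations use one step of back propagation through time. Scalar controls and the variable names are also assumptions:

```python
import numpy as np

def moc_gradients(y, y_hat, gamma, sigma=1.0):
    """Gradients of log L for a Mixture of Controllers (truncated beta).

    y     : (T,)   desired control output at each step
    y_hat : (T, N) y_hat[t, k] = output of controller k at step t
    gamma : (T, N) gating probabilities gamma[t, k] (rows sum to 1)
    """
    # Gaussian output model b_t^k from the text.
    b = np.exp(-((y[:, None] - y_hat) ** 2) / (2 * sigma ** 2)) \
        / (np.sqrt(2 * np.pi) * sigma)
    beta = b                                   # truncated stand-in for Eq. (6)
    L_t = np.sum(beta * gamma, axis=1)         # L_t = sum_k beta_t^k gamma_t^k
    R = beta * gamma / L_t[:, None]            # R_t^k = beta_t^k gamma_t^k / L_t
    d_y_hat = R * (y[:, None] - y_hat) / sigma ** 2   # Equation (2)
    d_g = R - gamma                                   # Equation (4)
    return d_y_hat, d_g
```

With β truncated to b, each row of R sums to one, so the gating gradient in Equation (4) sums to zero across experts at every time step, as a softmax gradient should.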
\n\n3 Results\n\nThe performances of the recurrent Mixture of Controllers and the non-recurrent Mixture of Experts were compared on three control tasks: a first order jump linear system, a second order jump linear system, and a tracking task that required two non-linear controllers. The object of the first two jump-linear tasks was to control a plant which switched randomly between two linear systems. The resulting overall systems were highly non-linear. In both the first and second order cases it was desired to drive all plant outputs to zero (zero-forcing control). Neither the first nor the second order system could be successfully controlled by a single linear controller.\n\nFigure 2: Left: Training convergence of Mixtures of Experts and Mixtures of Controllers on the first order jump linear system. The vertical axis is average squared error over training sequences and the horizontal axis is the number of training sequences seen. Right: Sample test trajectory of the first order jump linear system under control of the Mixture of Controllers. The system switches states at times 50 and 100.\n\nFor both jump-linear tasks, the architecture of the Mixture of Controllers and the Mixture of Experts consisted of two linear experts and a one layer gating network.
 The input to the experts was the plant output at the previous time step, while the input to the gating network was the ratio of the plant outputs at the two preceding time steps. An ideal linear controller was designed for each mode of the system. Training targets were derived from outputs of the appropriate ideal controller, using the known mode of the system for the training trajectories. The parameters of the gating and control networks were updated after each pass through sample trajectories which contained several state transitions.\n\nThe recurrent Mixture of Controllers could be trained to successfully control the first order jump linear system (figure 2), and once trained generalized successfully to novel test trajectories. The non-recurrent Mixture of Experts failed to learn even the training data for the first order jump linear system (note the high asymptote for the training error without recurrence in figure 2). The recurrent Mixture of Controllers was also able to learn to control the second order jump linear system (figure 3); however, it was necessary to teacher force the system during the first 5000 epochs of training by providing the true mode of the system as an extra input to the gating network. This extra input was removed at epoch 5000; the error initially increases dramatically, but the system is eventually able to learn to control the second order jump linear system autonomously. Note that the Mixture of Experts system is actually able to learn a successful control even more rapidly than the Mixture of Controllers when the additional teacher input is provided; however, learning again completely fails once this input is removed at epoch 5000 (figure 3).
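The teacher-signal setup described above can be sketched as follows. This is our own illustration, not the paper's code: the switching probability, plant coefficients, and initial state are made-up values. The ideal zero-forcing controller consults the true (hidden) mode, which is exactly the privileged information a trained gating network must learn to infer from input-output history:

```python
import numpy as np

def make_training_trajectory(T=150, p_switch=0.02, seed=0):
    """First-order jump linear plant with ideal-controller training targets."""
    rng = np.random.default_rng(seed)
    a = [0.9, 1.1]   # mode-dependent plant gains (illustrative values)
    b = [1.0, 0.5]   # mode-dependent input gains (illustrative values)
    x, mode = 5.0, 0
    xs, us, modes = [], [], []
    for _ in range(T):
        if rng.random() < p_switch:
            mode = 1 - mode                 # hidden, random mode switch
        u = -a[mode] * x / b[mode]          # ideal zero-forcing control target
        x = a[mode] * x + b[mode] * u       # plant update: drives x to zero
        xs.append(x); us.append(u); modes.append(mode)
    return np.array(xs), np.array(us), np.array(modes)
```

During training, the network is shown only the plant outputs and the targets `us`; the `modes` array plays the role of the teacher-forcing input that was withdrawn at epoch 5000 in the second order experiment.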
\n\nFigure 3: Left: Training convergence of Mixtures of Experts and Mixtures of Controllers on the second order jump linear system. Right: Sample test trajectory of the second order jump linear system under control of the Mixture of Controllers. The system again switches states at times 50 and 100.\n\nIn both first and second order cases, the trained Mixture of Controllers is able to control the system in both modes of system behavior, and to detect mode changes automatically. The difficulty in designing a control for a jump linear system usually lies in identifying the state of the system. No explicit law describing how to identify and switch between control modes is necessary to train the Mixture of Controllers, as this is learned automatically as a byproduct of learning to successfully control the system.\n\nPerformance of the Mixture of Controllers and the Mixture of Experts was also compared on a more complex task requiring a non-linear control law in each mode. The task was to control the trajectory of a ship to track an object traveling in a straight line, or flee from an object having a random walk trajectory (figure 4). There is a high degree of task interference between the controls appropriate during the two modes of object behavior. The ship dynamics were taken from Miller and Sutton (1990).\n\nFor both the Mixture of Controllers and the Mixture of Experts two experts were used. The experts received past and present measurements of the object bearing, distance, velocity, and the ship heading and turn rate. 
The controllers specified the desired turn rate of the ship. A one layer gating network was used which received the velocity of the object as input.\n\nTraining targets were produced from ideal controllers designed for each object behavior. The ideal controller for the random walk behavior produced a turn rate that headed directly away from the object. The ideal controller for intercepting the object used future information about object position to determine the turn rate which would lead to the closest possible intercept point. Both ideal controllers made use of information not available to the Mixture of Experts or Mixture of Controllers.\n\nThe Mixture of Controllers and the Mixture of Experts were trained on sequences of trajectories where the object changed behaviors multiple times.\n\nFigure 4: (a) Actual and desired trajectories of the ship under control of the Mixture of Experts while attempting to intercept the target. (b) Gating unit activities as a function of time for the trajectory in (a).\n\nThe weights of the networks were updated after each pass through the trajectories. The input to the gating net in this task provided more instantaneous information about the mode of object behavior than was provided in the jump linear tasks. As a result, the non-recurrent Mixture of Experts was able to achieve a minimum level of performance on the overall task. 
The recurrent Mixture of Controllers performed much better.\n\nThe differences between the two architectures are revealed by examining the gating network outputs. Without recurrence, the Mixture of Experts gating network could not determine the state of the object with certainty, and compromised by selecting a combination of the correct and incorrect control (figure 4b). Since the two controls are incompatible, this uncertainty degrades the performance of the overall controller. With recurrence in the gating network, the Mixture of Controllers is able to determine the target state with greater certainty by integrating information from many observations of object behavior. The sharper decisions about which control to use greatly improve tracking performance (figure 5).\n\nWe explored the ability of the Mixture of Controllers to learn the dynamics of switching by training on trajectories where the object switched behavior with varying frequency. The gating network trained on an object that switched behaviors infrequently was sluggish to respond to transitions, but more noise tolerant than the gating network trained on a frequently switching object. Thus, the gating network is able to incorporate the frequency of transition into its state model.\n\nFigure 5: (a) Actual and desired trajectories of the ship under control of the Mixture of Controllers while attempting to intercept the target. (b) Gating unit activities as a function of time for the trajectory in (a). Note that these are much less noisy than the activities seen in figure 4(b).\n\n4 Discussion\n\nWe have described an extension to the Mixture of Experts architecture for modelling and controlling dynamical systems which exhibit multiple modes of behavior. The algorithm we have presented for updating the parameters of the model is a simple gradient descent procedure. Application of the technique to large scale problems may require the development of faster converging update algorithms, perhaps based on the generalized EM (GEM) family of algorithms, or a variant of the iteratively reweighted least squares procedure proposed by Jordan and Jacobs (1994) for hierarchies of expert networks. Additional work is also required to establish the stability and convergence rate of the algorithm for use in adaptive control applications.\n\nReferences\n\nJacobs, R.A. and Jordan, M.I. A competitive modular connectionist architecture. Neural Information Processing Systems 3 (1991).\n\nJacobs, R.A., Jordan, M.I., Nowlan, S.J. and Hinton, G.E. Adaptive Mixtures of Local Experts. Neural Computation, 3, 79-87 (1991).\n\nJordan, M.I. and Jacobs, R.A. Hierarchical Mixtures of Experts and the EM algorithm. Neural Computation (1994).\n\nMiller, W.T., Sutton, R.S. and Werbos, P.J. Neural Networks for Control, MIT Press (1990).\n\nNowlan, S.J. Competing Experts: An Experimental Investigation of Associative Mixture Models. Technical Report CRG-TR-90-5, Department of Computer Science, University of Toronto (1990).\n\nShamma, J.S. and Athans, M. Gain scheduling: potential hazards and possible remedies. IEEE Control Systems Magazine, 12(3), 101-107 (1992).
\n\n\f", "award": [], "sourceid": 750, "authors": [{"given_name": "Timothy", "family_name": "Cacciatore", "institution": null}, {"given_name": "Steven", "family_name": "Nowlan", "institution": null}]}