{"title": "Neural Network On-Line Learning Control of Spacecraft Smart Structures", "book": "Advances in Neural Information Processing Systems", "page_first": 303, "page_last": 310, "abstract": null, "full_text": "Neural Network On-Line Learning Control \n\nof Spacecraft Smart Structures \n\nDr.  Christopher  Bowman \nBall Aerospace Systems Group \n\nP.O.  Box  1062 \n\nBoulder. CO  80306 \n\nAbstract \n\nThe overall goal is to reduce spacecraft weight. volume, and cost by on(cid:173)\nline adaptive non-linear control of flexible structural components. The \nobjective of this effort is to develop an  adaptive Neural Network (NN) \ncontroller for the Ball C-Side 1m x 3m antenna with embedded actuators \nand the RAMS  sensor system.  A traditional  optimal controller for  the \nmajor modes  is provided perturbations  by  the  NN  to compensate  for \nunknown residual modes. On-line training of recurrent and feed-forward \nNN  architectures  have  achieved  adaptive  vibration  control  with \nunknown  modal  variations and  noisy  measurements.  On-line training \nfeedback to each actuator NN output is computed via Newton's method \nto reduce the difference between desired and achieved antenna positions. \n\n1  ADAPTIVE  CONTROL  BACKGROUND \nThe two traditional approaches to adaptive control are 1) direct control (such as perfonned \nin direct model reference adaptive controllers) and 2) indirect control (such as performed by \nexplicit self-tuning regulators).  Direct control techniques (e.g. model-reference adaptive \ncootrul) provide good stability however are susceptible to noise. Whereas indirect control \ntechn;'q~es (e.g.  explicit self-tuning  regulators)  have low  noise susceptibility  and good \nconvergence rate.  However they require more control effort and have worse stability and \nare less roblistto mismodeling. 
NNs synergistically augment traditional adaptive control techniques by providing improved mismodeling robustness, both adaptively on-line for time-varying dynamics and in a learned control mode at a slower rate. \n\nThe NN control approaches which correspond to direct and indirect adaptive control are commonly known as inverse and forward modeling, respectively. More specifically, a NN which maps the plant state and its desired performance to the control command is called an inverse model, while a NN mapping both the current plant state and control to the next state and its performance is called the forward model. \n\nWhen given a desired performance and the current state, the inverse model generates the control, see Figure 1. The actual performance is observed and is used to train/update the inverse model. A significant problem occurs when the desired and achieved performance differ greatly, since the model near the desired state is not changed. This condition is corrected by adding random noise to the control outputs so as to extend the state space being explored. However, this correction has the effect of slowing the learning and reducing broadband stability. \n\n303 \n\nBowman \n\nFigure 1: Direct Adaptive Control Using Inverse Modeling Neural Network Controller \n\nFigure 2: Dual (Indirect and Direct) Adaptive Control Using Forward Modeling Neural Network State Predictor To Aid Inverse Model Convergence \n\nFor forward modeling, the map from the current control and state to the resulting state and performance is learned, see Figure 2. For cases where the performance is evaluated at a future time (i.e. distal in time), a predictive critic [Barto and Sutton, 1989] NN model is learned. In both cases the Jacobian of this performance can be computed to iteratively generate the next control action. However, this differentiating of the critic NN for back-propagation training of the controller network is very slow and in some cases steers the search in the wrong direction due to initially erroneous forward model estimates. As the NN adapts itself, the performance flattens, which results in the slow halting of learning at an unacceptable solution. Adding noise to the controller's output [Jordan and Jacobs, 1990] breaks the redundancy but forces the critic to predict the effects of future noise. 
This problem has been solved by using a separately trained intermediate plant model to predict the next state from the prior state and control, while having an independent predictor model generate the performance evaluation from the plant-model-predicted state [Werbos, 1990] and [Brody, 1991]. The result is a 50-100 fold learning speed improvement over reinforcement training of the forward model controller NN. \nHowever, this method still relies on a \"good\" forward model to incrementally train the inverse model. These incremental changes can still lead to undesirable solutions. For control systems which follow the stage 1, 2, or 3 models given in [Narendra, 1991], the control can be analytically computed from a forward-only model. For the most general, non-linear (stage 4) systems, an alternative is the memory-based forward model [Moore, 1992]. Using only a forward NN model, a direct hill-climbing or Newton's method search of candidate actions can be applied until a control decision is reached. The resulting state and its performance are used for on-line training of the forward model. Judicious random control actions are applied to improve behavior only where the forward model error is predicted to be large (e.g. via cross-validation). Also, using robust regression, experiences can be deweighted according to their quality and their age. The high computational burden of these cross-validation techniques can be reduced by parallel on-line processing providing the \"policy\" parameters for fast on-line NN control. \nFor control problems which are distal in time and space, a hybrid of these two forward-modeling approaches can be used. Namely, a NN plant model is added which is trained off-line in real-time and updated as necessary at a slower rate than the on-line forward model, which predicts performance based upon the current plant model. 
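The memory-based forward-model control described above amounts to a direct search over candidate actions, scoring each by the forward model's predicted next state. A minimal sketch follows; the toy linear plant, the cost function, and all names here are hypothetical stand-ins for the learned forward NN and its performance measure:

```python
def forward_model(state, control):
    """Stand-in for the learned forward NN: a toy linear plant that
    predicts the next (x1, x2) state from the current state and a
    scalar control input."""
    x1, x2 = state
    return (0.9 * x1 + 0.1 * x2, -0.1 * x1 + 0.9 * x2 + control)

def performance(state):
    """Scalar cost of a predicted state (squared distance from rest)."""
    return state[0] ** 2 + state[1] ** 2

def search_control(state, candidates):
    """Direct search over candidate actions using only the forward model:
    return the control whose predicted next state has the lowest cost,
    together with that cost."""
    best = min(candidates, key=lambda u: performance(forward_model(state, u)))
    return best, performance(forward_model(state, best))
```

In practice the candidate set would be refined by hill-climbing or Newton steps rather than evaluated exhaustively, and the observed outcome of the chosen action would be fed back to train the forward model on-line.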
This slower-rate trained forward-model NN supports learned control (e.g. via numerical inversion), whereas the on-line forward model provides the faster-response adaptive control. Other NN control techniques, such as using a Hopfield net to solve the optimal-control quadratic-programming problem or the supervised training of ART II off-line with adaptive vigilance for on-line pole placement, have been proposed. However, their on-line robustness appears limited due to their sensitivity to a priori parameter assumptions. \nA forward model NN which augments a traditional controller for unmodeled modes and unforeseen situations is presented in the following section. Performance results for both feed-forward and recurrent versions are compared in Section 3. \n\n2 RESIDUAL FORWARD MODEL NEURAL NETWORK (RFM-NN) CONTROLLER \n\nA type of forward model NN which acts as a residual mode filter to support a reduced-order model (ROM) traditional optimal state controller has been evaluated, see Figure 3. The ROM determines the control based upon its modal-coordinate approximate representation of the structure. Modal coordinates are obtained by a transformation using known primary vibration modes [Young, 1990]. The transformation operator is a set of eigenvectors (mode shapes) generated by finite element modeling. The ROM controller is traditionally augmented by a residual-mode filter (RMF). Ball's RFM-NN replaces the RMF in order to better capture the mismodeled, unmodeled, and changing modes. \nThe objective of the RFM-NN is to provide the ROM controller with ROM derivative state perturbations, so that the ROM controls the structure as desired by the user. The RFM-NN is trained on-line using scored supervised feedback to generate these desired ROM state perturbations. 
The scored supervised training provides a score for each perturbation output based upon the measured position of the structure. The measured deviations, Y*(t), from the desired structure position are converted to errors in the estimated ROM state using the ROM transformation. Specifically, the training score, S(t), for each ROM derivative state x_N(t) is expressed in the following discrete equation: \n\nS(t) = B_N Y*(t) - x_N(t), \n\nwhere x_N(t) = [A_N + B_N G_N - K_N C_N] x_N(t-1) + K_N Y(t-1). \n\nFigure 3: Residual Forward Model Neural Network Adaptive Controller Replaces Traditional Residual Mode Filter \n\nNewton's method is then applied to find the δx*_N(t) ROM state perturbations which zero the score. First, the score is smoothed, S_s(t) = σ S_s(t-1) + (1-σ) S(t), and the neural network output is smoothed similarly to give δx_s(t). Second, Newton's method computes the adjustments needed to zero the scores, \n\nΔ(δx*_N(t)) = -S_s(t) [δx_s(t) - δx_s(t-1)] / [S_s(t) - S_s(t-1)] \n\n= -ε x_N(t) (if either difference = 0). \n\nThird, the NN is trained toward the target δx*_N(t+1) = Δ(δx*_N(t)) + δx_s(t), with the appropriate learning rate, α (e.g. an approximation to the inverse of the largest eigenvalue of the Hessian weight matrix). \n\n3 RFM-NN ON-LINE LEARNING RESULTS \nBoth feed-forward and recurrent RFM-NNs have been incorporated into an interactive simulation of Ball's Control-Structure Interaction Demonstration Experiment (C-SIDE), see Figure 4. This 1m x 3m lightweight antenna facesheet has 8 embedded actuators plus three auxiliary input actuators and uses 8 remote angular measurement sensors (RAMS) plus 4 displacement and 3 velocity auxiliary sensors. In order to evaluate the on-line performance of the RFM-NNs, the ROM controller was given insufficient and partially incorrect modes. 
The ROM without the RFM-NN grew unstable (i.e. greater than 10 millimeter C-SIDE displacements) in 13 seconds. The initial feed-forward RFM-NN used 8 sensor and 6 ROM state feedback estimate inputs, as well as 5 hidden units and 3 ROM velocity state perturbation outputs. This RFM-NN had random initial weights, logistic activation functions, and back-propagation training using one sixth the learning rate for the output layer (e.g. 0.06 and 0.01). The Newton training search for the RFM-NN used a step size of one with a smoothing factor of one tenth. \n\nFigure 4: 1m x 3m C-SIDE Antenna Facesheet With Embedded Actuators. \n\nThis RFM-NN learned on-line to stabilize and reduce vibration to less than ±1 mm within 20 seconds, see Figure 5. A five-newton force applied a few seconds later is compensated for within nine seconds, see Figure 6. This is accomplished with learning off as well as with learning on. To test the necessity of the RFM-NN, the ROM was given the scored supervised training (i.e. Newton's search estimates) directly instead of the RFM-NN outputs. This caused immediate unstable behavior. To test the RFM-NN sensitivity to measurement accuracy, a uniform error of ±5% was added. Starting from the same random weight start, the RFM-NN required 25 seconds to learn to stabilize the antenna, see Figure 7. The best stability was achieved when the product of the Newton and BPN steps was approximately 0.01. This feed-forward NN was compared to an Elman-type recurrent NN (i.e. hidden layer feedback to itself with one-step BP training). The recurrent RFM-NN on-line learning stability was much less sensitive to initial weights. The recurrent RFM-NN stabilized C-SIDE with up to 10%-20% measurement noise versus a 5%-10% limit for the feed-forward RFM-NN. 
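The Newton's-method training feedback of Section 2 reduces, per output, to a small secant step on the smoothed score plus a fallback when either difference vanishes. A minimal sketch follows; the variable names, the fallback gain eps, and the smoothing helper are assumptions based on the Section 2 equations:

```python
def newton_training_target(S, S_prev, dx, dx_prev, x_est, eps=0.1):
    """One Newton (secant) step on the smoothed score S toward the
    output adjustment that zeroes the score.  Falls back to
    -eps * x_est when either difference is zero (flat secant)."""
    dS = S - S_prev
    ddx = dx - dx_prev
    if dS == 0.0 or ddx == 0.0:
        adjustment = -eps * x_est      # fallback step
    else:
        adjustment = -S * ddx / dS     # secant step toward zero score
    return dx + adjustment             # next NN training target

def smooth(prev, new, sigma=0.1):
    """Exponential smoothing applied to both the score and the NN
    output; sigma = 0.1 matches the one-tenth smoothing factor."""
    return sigma * prev + (1.0 - sigma) * new
```

The returned target is then used to train the RFM-NN with an appropriate learning rate, as in the back-propagation setup described above.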
\n\n4 SUMMARY AND RECOMMENDATIONS \nAdaptive smart structures promise to reduce spacecraft weight and dependence on extensive ground monitoring. A recurrent forward model NN is used as a residual mode filter to augment a traditional reduced-order model (ROM) controller. It was more robust than the feed-forward NN and the traditional-only controller in the presence of unmodeled modes and noisy measurements. Further analyses and hardware implementations will be performed to better quantify this robustness, including the sensitivity to the ROM controller mode fidelity, number of output modes, learning rates, measurement-to-state errors, and time quantization effects. \nTo improve robustness to ROM mode changes, a comparison to the dual forward/inverse NN control approach is recommended. The forward model will adjust the search used to train an inverse model which provides control augmentations to the ROM controller. This will enable control searches to occur both off-line faster than real-time using the forward model (i.e. imagination) and on-line using direct search trials with varying noise levels. The forward model will adapt using quality experiences (e.g. via cross-validation), which improves inverse-model searches. The inverse model's reliance on the forward model will reduce until forward model prediction errors increase. Future challenges include solving the temporal credit assignment problem, partitioning to restricted chip sizes, combining with incomplete a priori knowledge, and balancing adaptivity of response with long-term learning. The goal is to extend stability-dominated, fixed-goal traditional control with adaptive robotic-type neural control to enable better autonomous control where fully-justified fixed models and complete system knowledge are not required. The resultant robust autonomous control will capitalize on the speed of massively parallel analog neural-like computations (e.g. with NN pulse stream chips). \n\nFigure 5: RFM-NN On-Line Learning To Achieve Stable Control (panels show ROM state estimates, ROM state estimate adjustments, and displacement measurements) \n\nFigure 6: 5 Newton Force Vibration Removed Using RFM-NN Learned Forward Model \n\nFigure 7: RFM-NN Learning to Remove Vibrations in C-SIDE With ±15% Noisy Displacement Measurements \n\n5 REFERENCES \nBarto, A.G., Sutton, R.S., and Watkins, C.J.C.H., Learning and Sequential Decision Making, Univ. of Mass. 
at Amherst, COINS Technical Report 89-95, September 1989. \n\nBowman, C.L., Adaptive Neural Networks Applied to Signal Recognition, 3rd Tri-Service Data Fusion Symposium, May 1989. \n\nBrody, C., Fast Learning With Predictive Forward Models, Neural Information Processing Systems 4 (NIPS 4), 1992. \n\nJordan, M.I., and Jacobs, R.A., Learning to Control an Unstable System with Forward Modeling, in D.S. Touretzky, ed., Advances in NIPS 2, Morgan Kaufmann, 1990. \n\nMoore, A.W., Fast, Robust Adaptive Control by Learning Only Forward Models, NIPS 4, 1992. \n\nMukhopadhyay, S., and Narendra, K.S., Disturbance Rejection in Nonlinear Systems Using Neural Networks, Yale University Report No. 9114, December 1991. \n\nWerbos, P., Architectures For Reinforcement Learning, in Miller, Sutton, and Werbos, eds., Neural Networks for Control, MIT Press, 1990. \n\nYoung, D.O., Distributed Finite-Element Modeling and Control Approach for Large Flexible Structures, J. of Guidance, Control and Dynamics, Vol. 13 (4), 703-713, 1990. \n", "award": [], "sourceid": 615, "authors": [{"given_name": "Christopher", "family_name": "Bowman", "institution": null}]}