{"title": "Gradient and Hamiltonian Dynamics Applied to Learning in Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 274, "page_last": 280, "abstract": null, "full_text": "Gradient and Hamiltonian Dynamics Applied to Learning in Neural Networks\n\nJames W. Howse\n\nChaouki T. Abdallah\n\nGregory L. Heileman\n\nDepartment of Electrical and Computer Engineering\nUniversity of New Mexico\nAlbuquerque, NM 87131\n\nAbstract\n\nThe process of machine learning can be considered in two stages: model selection and parameter estimation. In this paper a technique is presented for constructing dynamical systems with desired qualitative properties. The approach is based on the fact that an n-dimensional nonlinear dynamical system can be decomposed into one gradient and (n - 1) Hamiltonian systems. Thus, the model selection stage consists of choosing the gradient and Hamiltonian portions appropriately so that a certain behavior is obtainable. To estimate the parameters, a stably convergent learning rule is presented. This algorithm has been proven to converge to the desired system trajectory for all initial conditions and system inputs. This technique can be used to design neural network models which are guaranteed to solve the trajectory learning problem.\n\n1  Introduction\n\nA fundamental problem in mathematical systems theory is the identification of dynamical systems. System identification is a dynamic analogue of the functional approximation problem. A set of input-output pairs $\\{u(t), y(t)\\}$ is given over some time interval $t \\in [T_i, T_f]$. The problem is to find a model which for the given input sequence returns an approximation of the given output sequence. Broadly speaking, solving an identification problem involves two steps. 
The first is choosing a class of identification models which are capable of emulating the behavior of the actual system. The second is selecting a method to determine which member of this class of models best emulates the actual system. In this paper we present a class of nonlinear models and a learning algorithm for these models which are guaranteed to learn the trajectories of an example system. Algorithms to learn given trajectories of a continuous time system have been proposed in [6], [8], and [7], to name only a few. To our knowledge, no one has ever proven that the error between the learned and desired trajectories vanishes for any of these algorithms. In our trajectory learning system this error is guaranteed to vanish. Our models extend the work in [1] by showing that Cohen's systems are one instance of the class of models generated by decomposing the dynamics into a component normal to some surface and a set of components tangent to the same surface. Conceptually this formalism can be used to design dynamical systems with a variety of desired qualitative properties. Furthermore, we propose a provably convergent learning algorithm which allows the parameters of Cohen's models to be learned from examples rather than being programmed in advance. The algorithm is convergent in the sense that the error between the model trajectories and the desired trajectories is guaranteed to vanish. This learning procedure is related to one discussed in [5] for use in linear system identification.\n\n2  Constructing the Model\n\nFirst some terminology will be defined. For a system of n first order ordinary differential equations, the phase space of the system is the n-dimensional space of all state components. 
A solution trajectory is a curve in phase space described by the differential equations for one specific starting point. At every point on a trajectory there exists a tangent vector. The space of all such tangent vectors for all possible solution trajectories constitutes the vector field for this system of differential equations.\n\nThe trajectory learning models in this paper are systems of first order ordinary differential equations. The form of these equations will be obtained by considering the system dynamics as motion relative to some surface. At each point in the state space an arbitrary system trajectory will be decomposed into a component normal to this surface and a set of components tangent to this surface. This approach was suggested to us by the results in [4], where it is shown that an arbitrary n-dimensional vector field can be decomposed locally into the sum of one gradient vector field and (n - 1) Hamiltonian vector fields. The concept of a potential function will be used to define these surfaces. A potential function $V(x)$ is any scalar valued function of the system states $x = [x_1, x_2, \\ldots, x_n]^t$ which is at least twice continuously differentiable (i.e. $V(x) \\in C^r : r \\geq 2$). The operation $[\\cdot]^t$ denotes the transpose of the vector. If there are n components in the system state, the function $V(x)$, when plotted with respect to all of the state components, defines a surface in an (n + 1)-dimensional space. There are two curves passing through every point on this potential surface which are of interest in this discussion; they are illustrated in Figure 1(a). The dashed curve is\n\n[Figure 1, panels (a) and (b). Panel labels: $V(x) = K$ and $(x - x_0)^t \\nabla_x V(x)|_{x_0} = 0$.]\n\nFigure 1: (a) The potential function $V(x) = x_1^2 (x_1 - 1)^2 + x_2^2$ plotted versus its two dependent variables $x_1$ and $x_2$. 
The dashed curve is called a level surface and is given by $V(x) = 0.5$. The solid curve follows the path of steepest descent through $x_0$. (b) The partitioning of a 3-dimensional vector field at the point $x_0$ into a 1-dimensional portion which is normal to the surface $V(x) = K$ and a 2-dimensional portion which is tangent to $V(x) = K$. The vector $-\\nabla_x V(x)|_{x_0}$ is the normal vector to the surface $V(x) = K$ at the point $x_0$. The plane $(x - x_0)^t \\nabla_x V(x)|_{x_0} = 0$ contains all of the vectors which are tangent to $V(x) = K$ at $x_0$. Two linearly independent vectors are needed to form a basis for this tangent space; the pair $Q_2(x) \\nabla_x V(x)|_{x_0}$ and $Q_3(x) \\nabla_x V(x)|_{x_0}$ that are shown are just one possibility.\n\nreferred to as a level surface; it is a surface along which $V(x) = K$ for some constant $K$. Note that in general this level surface is an n-dimensional object. The solid curve moves downhill along $V(x)$ following the path of steepest descent through the point $x_0$. The vector which is tangent to this curve at $x_0$ is normal to the level surface at $x_0$. The system dynamics will be designed as motion relative to the level surfaces of $V(x)$. The results in [4] require n different local potential functions to achieve arbitrary dynamics. However, the results in [1] suggest that a considerable number of dynamical systems can be achieved using only a single global potential function. 
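As a small numerical illustration of the normal and tangent directions described above, the following sketch evaluates the gradient of an example potential at one point and builds a tangent vector by applying a skew-symmetric matrix to that gradient. The potential is the one read from the Figure 1 caption (itself a reconstruction), and the function names and the sample point are assumptions made only for this illustration.

```python
# Illustrative sketch only: the names, the sample point and the potential
# (taken from the Figure 1 caption as reconstructed) are assumptions.

def potential(x1, x2):
    # V(x) = x1^2 (x1 - 1)^2 + x2^2
    return x1 ** 2 * (x1 - 1.0) ** 2 + x2 ** 2

def gradient(x1, x2):
    # Analytic partial derivatives of V with respect to x1 and x2.
    dv_dx1 = 2.0 * x1 * (x1 - 1.0) * (2.0 * x1 - 1.0)
    dv_dx2 = 2.0 * x2
    return (dv_dx1, dv_dx2)

def tangent(grad):
    # Apply the skew-symmetric matrix [[0, -1], [1, 0]] to the gradient.
    g1, g2 = grad
    return (-g2, g1)

x0 = (1.5, 0.5)
normal = gradient(*x0)   # normal to the level curve through x0
tang = tangent(normal)   # tangent to the level curve through x0
dot = normal[0] * tang[0] + normal[1] * tang[1]
print('normal direction :', normal)
print('tangent direction:', tang)
print('inner product    :', dot)
```

The inner product of the two vectors is zero, which is the two-dimensional version of the statement that the tangent direction lies in the plane orthogonal to the gradient.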
\n\nA system which is capable of traversing any downhill path along a given potential surface $V(x)$ can be constructed by decomposing each element of the vector field into a vector normal to the level surface of $V(x)$ which passes through each point and a set of vectors tangent to the level surface of $V(x)$ which passes through the same point. So the potential function $V(x)$ is used to partition the n-dimensional phase space into two subspaces. The first contains a vector field normal to some level surface $V(x) = K$ for $K \\in \\mathbb{R}$, while the second subspace holds a vector field tangent to $V(x) = K$. The subspace containing all possible normal vectors to the n-dimensional level surface at a given point has dimension one. This is equivalent to the statement that every point on a smooth surface has a unique normal vector. Similarly, the subspace containing all possible tangent vectors to the level surface at a given point has dimension (n - 1). An example of this partition in the case of a 3-dimensional system is shown in Figure 1(b). Since the space of all tangent vectors at each point on a level surface is (n - 1)-dimensional, (n - 1) linearly independent vectors are required to form a basis for this space.\n\nMathematically, there is a straightforward way to construct dynamical systems which either move downhill along $V(x)$ or remain at a constant height on $V(x)$. In this paper, dynamical systems which always move downhill along some potential surface are called gradient-like systems. These systems are defined by differential equations of the form\n\n\\dot{x} = -P(x) \\nabla_x V(x),   (1)\n\nwhere $P(x)$ is a matrix function which is symmetric (i.e. $P^t = P$) and positive definite at every point $x$, and where $\\nabla_x V(x) = [\\partial V / \\partial x_1, \\partial V / \\partial x_2, \\ldots, \\partial V / \\partial x_n]^t$. 
These systems are similar to the gradient flows discussed in [2]. The trajectories of the system formed by Equation (1) always move downhill along the potential surface defined by $V(x)$. This can be shown by taking the time derivative of $V(x)$, which is $\\dot{V}(x) = -[\\nabla_x V(x)]^t P(x) [\\nabla_x V(x)] \\leq 0$. Because $P(x)$ is positive definite, $\\dot{V}(x)$ can only be zero where $\\nabla_x V(x) = 0$; elsewhere $\\dot{V}(x)$ is negative. This means that the trajectories of Equation (1) always move toward a level surface of $V(x)$ formed by \"slicing\" $V(x)$ at a lower height, as pointed out in [2]. It is also easy to design systems which remain at a constant height on $V(x)$. Such systems will be denoted Hamiltonian-like systems. They are specified by the equation\n\n\\dot{x} = Q(x) \\nabla_x V(x),   (2)\n\nwhere $Q(x)$ is a matrix function which is skew-symmetric (i.e. $Q^t = -Q$) at every point $x$. These systems are similar to the Hamiltonian systems defined in [2]. The elements of the vector field defined by Equation (2) are always tangent to some level surface of $V(x)$. Hence the trajectories of this system remain at a constant height on the potential surface given by $V(x)$. Again this is indicated by the time derivative of $V(x)$, which in this case is $\\dot{V}(x) = [\\nabla_x V(x)]^t Q(x) [\\nabla_x V(x)] = 0$. This indicates that the trajectories of Equation (2) always remain on the level surface on which the system starts. So a model which can follow an arbitrary downhill path along the potential surface $V(x)$ can be designed by combining the dynamics of Equations (1) and (2). The dynamics in the subspace normal to the level surfaces of $V(x)$ can be defined using one equation of the form in Equation (1). 
Similarly the dynamics in the subspace tangent to the level surfaces of $V(x)$ can be defined using (n - 1) equations of the form in Equation (2). Hence the total dynamics for the model are\n\n\\dot{x} = -P(x) \\nabla_x V(x) + \\sum_{i=2}^{n} Q_i(x) \\nabla_x V(x).   (3)\n\nFor this model the number and location of equilibria is determined by the function $V(x)$, while the manner in which the equilibria are approached is determined by the matrices $P(x)$ and $Q_i(x)$.\n\nIf the potential function $V(x)$ is bounded below (i.e. $V(x) > B_l$ for all $x \\in \\mathbb{R}^n$, where $B_l$ is a constant), eventually increasing (i.e. $\\lim_{\\|x\\| \\to \\infty} V(x) \\to \\infty$), and has only a finite number of isolated local maxima and minima (i.e. in some neighborhood of every point where $\\nabla_x V(x) = 0$ there are no other points where the gradient vanishes), then the system in Equation (3) satisfies the conditions of Theorem 10 in [1]. Therefore the system will converge to one of the points where $\\nabla_x V(x) = 0$, called the critical points of $V(x)$, for all initial conditions. Note that this system is capable of all downhill trajectories along the potential surface only if the (n - 1) vectors $Q_i(x) \\nabla_x V(x)$, $i = 2, \\ldots, n$, are linearly independent at every point $x$. It is shown in [1] that the potential function\n\nV(z) = C ( 1:., (-y) d-y + t, [ ~ (XI - I:.,(xd)' + ~ J:' 1:., h )II:.: (-y)]' d-y 1   (4)\n\nsatisfies these three criteria. In this equation $\\mathcal{L}_i(x_1)$, $i = 1, \\ldots, n$, are interpolation polynomials, $C$ is a real positive constant, $\\bar{x}_i$, $i = 1, \\ldots, n$, are real constants chosen so that the integrals are positive valued, and $\\mathcal{L}_i'(x_1) \\equiv d\\mathcal{L}_i / dx_1$. 
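The combined dynamics of Equation (3) can be illustrated numerically. The following is a minimal sketch for n = 2, assuming a simple quadratic potential, constant matrices P and Q, and forward Euler integration; none of these choices come from the paper, and all names are hypothetical. It only checks that the potential decreases along the simulated trajectory, as the time-derivative argument for Equations (1) and (2) suggests.

```python
# Illustrative sketch only: the quadratic potential, the constant matrices
# P and Q, the step size and the helper names are assumptions.

def potential(x):
    # V(x) = (x1^2 + x2^2) / 2, with a single critical point at the origin.
    return 0.5 * (x[0] ** 2 + x[1] ** 2)

def gradient(x):
    return [x[0], x[1]]

def mat_vec(m, v):
    # Product of a 2 x 2 matrix with a 2-element vector.
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

P = [[1.0, 0.0], [0.0, 2.0]]    # symmetric, positive definite
Q = [[0.0, -3.0], [3.0, 0.0]]   # skew-symmetric

def vector_field(x):
    # Equation (3) with n = 2: x_dot = -P grad V + Q grad V
    g = gradient(x)
    pg = mat_vec(P, g)
    qg = mat_vec(Q, g)
    return [-pg[0] + qg[0], -pg[1] + qg[1]]

def simulate(x0, dt=0.001, steps=5000):
    # Forward Euler integration; records V along the trajectory.
    x = list(x0)
    values = [potential(x)]
    for _ in range(steps):
        f = vector_field(x)
        x = [x[0] + dt * f[0], x[1] + dt * f[1]]
        values.append(potential(x))
    return x, values

x_final, v_values = simulate([2.0, -1.0])
print('V at start :', v_values[0])
print('V at end   :', v_values[-1])
print('final state:', x_final)
```

In this sketch the skew-symmetric term only rotates the trajectory around the level curves, while the positive definite term moves it to lower values of V, so the state spirals toward the critical point.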
\n\n3  The Learning Rule\n\nIn Equation (3) the number and location of equilibria can be controlled using the potential function $V(x)$, while the manner in which the equilibria are approached can be controlled with the matrices $P(x)$ and $Q_i(x)$. If it is assumed that the locations of the equilibria are known, then a potential function which has local minima and maxima at these points can be constructed using Equation (4). The problem of trajectory learning is thereby reduced to the problem of parameterizing the matrices $P(x)$ and $Q_i(x)$ and finding the parameter values which cause this model to best emulate the actual system. If the elements of $P(x)$ and $Q_i(x)$ are correctly chosen, then a learning rule can be designed which makes the model dynamics converge to that of the actual system. Assume that the dynamics given by Equation (3) are a parameterized model of the actual dynamics. Using this model and samples of the actual system states, an estimator for states of the actual system can be designed. The behavior of the model is altered by changing its parameters, so a parameter estimator must also be constructed. The following theorem provides a form for both the state and parameter estimators which guarantees convergence to a set of parameters for which the error between the estimated and target trajectories vanishes.\n\nTheorem 3.1. Given the model system\n\n\\dot{x} = \\sum_{i=1}^{k} A_i f_i(x) + B g(u)   (5)\n\nwhere $A_i \\in \\mathbb{R}^{n \\times n}$ and $B \\in \\mathbb{R}^{n \\times m}$ are unknown, and $f_i(\\cdot)$ and $g(\\cdot)$ are known smooth functions such that the system has bounded solutions for bounded inputs $u(t)$. Choose a state estimator of the form\n\n\\dot{\\hat{x}} = \\mathcal{R}_s 
(\\hat{x} - x) + \\sum_{i=1}^{k} \\hat{A}_i f_i(x) + \\hat{B} g(u)   (6)\n\nwhere $\\mathcal{R}_s$ is an (n x n) matrix of real constants whose eigenvalues must all be in the left half plane, and $\\hat{A}_i$ and $\\hat{B}$ are the estimates of the actual parameters. Choose parameter estimators of the form\n\n\\dot{\\hat{A}}_i = -\\mathcal{R}_p (\\hat{x} - x) [f_i(x)]^t,  i = 1, \\ldots, k\n\\dot{\\hat{B}} = -\\mathcal{R}_p (\\hat{x} - x) [g(u)]^t   (7)\n\nwhere $\\mathcal{R}_p$ is an (n x n) matrix of real constants which is symmetric and positive definite, and $(\\hat{x} - x) [\\cdot]^t$ denotes an outer product. For these choices of state and parameter estimators $\\lim_{t \\to \\infty} (\\hat{x}(t) - x(t)) = 0$ for all initial conditions. Furthermore, this remains true if any of the elements of $\\hat{A}_i$ or $\\hat{B}$ are set to 0, or if any of these matrices are restricted to being symmetric or skew-symmetric.\n\nThe proof of this theorem appears in [3]. Note that convergence of the parameter estimates to the actual parameter values is not guaranteed by this theorem. The model dynamics in Equation (3) can be cast in the form of Equation (5) by choosing each element of $P(x)$ and $Q_i(x)$ to have the form\n\nP_{rs} = \\sum_{j=1}^{n} \\sum_{k=0}^{l-1} \\xi_{rsjk} \\vartheta_k(x_j)   and   Q_{rs} = \\sum_{j=1}^{n} \\sum_{k=0}^{l-1} \\lambda_{rsjk} \\varrho_k(x_j),   (8)\n\nwhere $\\{\\vartheta_0(x_j), \\vartheta_1(x_j), \\ldots, \\vartheta_{l-1}(x_j)\\}$ and $\\{\\varrho_0(x_j), \\varrho_1(x_j), \\ldots, \\varrho_{l-1}(x_j)\\}$ are a set of $l$ orthogonal polynomials which depend on the state $x_j$. There is a set of such polynomials for every state $x_j$, $j = 1, 2, \\ldots, n$. The constants $\\xi_{rsjk}$ and $\\lambda_{rsjk}$ determine the contribution of the kth polynomial which depends on the jth state to the value of $P_{rs}$ and $Q_{rs}$ respectively. In this case the dynamics in Equation (3) become\n\n\\dot{x} = \\sum_{j=1}^{n} \\sum_{k=0}^{l-1} \\left\\{ \\Xi_{jk} [\\vartheta_k(x_j) \\nabla_x V(x)] + \\sum_{i=2}^{n} \\Lambda_{ijk} [\\varrho_k(x_j) \\nabla_x V(x)] \\right\\} + \\Upsilon g(u(t))   (9)\n\nwhere $\\Xi_{jk}$ is the (n x n) matrix of all values $\\xi_{rsjk}$ which have the same value of j and k. 
Likewise $\\Lambda_{ijk}$ is the (n x n) matrix of all values $\\lambda_{rsjk}$, having the same value of j and k, which are associated with the ith matrix $Q_i(x)$. This system has m inputs, which may explicitly depend on time, that are represented by the m-element vector function $u(t)$. The m-element vector function $g(\\cdot)$ is a smooth, possibly nonlinear, transformation of the input function. The matrix $\\Upsilon$ is an (n x m) parameter matrix which determines how much input $s \\in \\{1, \\ldots, m\\}$ affects state $r \\in \\{1, \\ldots, n\\}$. Appropriate state and parameter estimators can be designed based on Equations (6) and (7) respectively.\n\n4  Simulation Results\n\nNow an example is presented in which the parameters of the model in Equation (9) are trained, using the learning rule in Equations (6) and (7), on one input signal and then are tested on a different input signal. The actual system has three equilibrium points, two stable points located at (1, 3) and (3, 5), and a saddle point located at (2 - ~, 4 + ~). In this example the dynamics of both the actual system and the model are given by\n\n\\begin{pmatrix} \\dot{x}_1 \\\\ \\dot{x}_2 \\end{pmatrix} = \\begin{pmatrix} \\varphi_1 + \\varphi_2 x_1 + \\varphi_3 x_2 & 0 \\\\ 0 & \\varphi_4 + \\varphi_5 x_1 + \\varphi_6 x_2 \\end{pmatrix} \\begin{pmatrix} \\partial V / \\partial x_1 \\\\ \\partial V / \\partial x_2 \\end{pmatrix} + \\begin{pmatrix} 0 & -\\{\\varphi_7 + \\varphi_8 x_1 + \\varphi_9 x_2\\} \\\\ \\varphi_7 + \\varphi_8 x_1 + \\varphi_9 x_2 & 0 \\end{pmatrix} \\begin{pmatrix} \\partial V / \\partial x_1 \\\\ \\partial V / \\partial x_2 \\end{pmatrix} + \\begin{pmatrix} \\varphi_{10} \\\\ 0 \\end{pmatrix} u(t)   (10)\n\nwhere $V(x)$ is defined in Equation (4) and $u(t)$ is a time varying input. For the actual system the parameter values were $\\varphi_1 = \\varphi_4 = -4$, $\\varphi_2 = \\varphi_5 = -2$, $\\varphi_3 = \\varphi_6 = -1$, $\\varphi_7 = 1$, $\\varphi_8 = 3$, $\\varphi_9 = 5$, and $\\varphi_{10} = 1$. In the model the 10 elements $\\varphi_i$ are treated as the unknown parameters which must be learned. Note that the first matrix function is positive definite if the parameters $\\varphi_1$-$\\varphi_6$ are all negative valued. 
The second matrix function is skew-symmetric for all values of $\\varphi_7$-$\\varphi_9$. The two input signals used for training and testing were $u_1$ = 10000 (sin! 1000t + sin ~ 1000t) and $u_2 = 5000 \\sin 1000 t$. The phase space responses of the actual system to the inputs $u_1$ and $u_2$ are shown by the solid curves in Figures 3(b) and 3(a) respectively. Notice that both of these inputs produce a periodic attractor in the phase space of Equation (10). In order to evaluate the effectiveness of the learning algorithm the Euclidean distance between the actual and learned state and parameter values was computed and plotted versus time. The results are shown in Figure 2.\n\n[Figure 2, panels (a) and (b): $\\{\\|\\Delta x\\|, \\|\\Delta \\varphi\\|\\}$ plotted versus $t$.]\n\nFigure 2: (a) The state and parameter errors for training using input signal $u_1$. The solid curve is the Euclidean distance between the state estimates and the actual states as a function of time. The dashed curve shows the distance between the estimated and actual parameter values versus time. (b) The state and parameter errors for training using input signal $u_2$.\n\nFigure 2(a) shows these statistics when training with input $u_1$, while Figure 2(b) shows the same statistics for input $u_2$. The solid curves are the Euclidean distance between the learned and actual system states, and the dashed curves are the distance between the learned and actual parameter values. These statistics have two noteworthy features. First, the error between the learned and desired states quickly converges to very small values, regardless of how well the actual parameters are learned. This result was guaranteed by Theorem 3.1. 
\nSecond, the final error between the learned and desired parameters is much lower when the system is trained with input $u_1$. Intuitively this is because input $u_1$ excites more frequency modes of the system than input $u_2$. Recall that in a nonlinear system the frequency modes excited by a given input do not depend solely on the input because the system can generate frequencies not present in the input. The quality of the learned parameters can be qualitatively judged by comparing the phase plots using the learned and actual parameters for each input, as shown in Figure 3. In Figure 3(a) the system was trained using input $u_1$ and tested with input $u_2$, while in Figure 3(b) the situation was reversed. The solid curves are the system response using the actual parameter values, and the dashed curves are the response for the learned parameters. The Euclidean distance between the target and test trajectories in Figure 3(a) is in the range (0, 0.64) with a mean distance of 0.21 and a standard deviation of 0.14. The distance between the target and test trajectories in Figure 3(b) is in the range (0, 4.53) with a mean distance of 0.98 and a standard deviation of 1.35. Qualitatively, both sets of learned parameters give an accurate response for non-training inputs.\n\n[Figure 3, panels (a) and (b): phase plots with horizontal axis $x_1$.]\n\nFigure 3: (a) A phase plot of the system response when trained with input $u_1$ and tested with input $u_2$. The solid line is the response to the test input using the actual parameters. The dotted line is the system response using the learned parameters. (b) A phase plot of the system response when trained with input $u_2$ and tested with input $u_1$. 
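The state and parameter estimators of Equations (6) and (7) used for the training above can be sketched in their simplest form. The following is a minimal illustration for a scalar plant with one basis function f(x) = x and g(u) = u, integrated with forward Euler. The plant parameters, the gains, the input signal and all names are assumptions chosen for this example and are not the system of Equation (10).

```python
import math

# Illustrative sketch only: the scalar plant, the gains, the input signal and
# all names below are assumptions chosen for this example.

A_TRUE, B_TRUE = -1.0, 1.0   # unknown plant parameters in x_dot = a x + b u
R_S = -5.0                   # state estimator gain (negative, so stable)
R_P = 20.0                   # parameter estimator gain (positive)

def run(t_end=200.0, dt=0.001):
    x, x_hat = 0.5, 0.5      # actual and estimated state
    a_hat, b_hat = 0.0, 0.0  # parameter estimates
    for step in range(int(t_end / dt)):
        u = math.sin(step * dt)
        err = x_hat - x
        # Plant, state estimator (6) and parameter estimators (7), scalar case.
        x_dot = A_TRUE * x + B_TRUE * u
        x_hat_dot = R_S * err + a_hat * x + b_hat * u
        a_hat_dot = -R_P * err * x
        b_hat_dot = -R_P * err * u
        # Forward Euler update of all four quantities.
        x += dt * x_dot
        x_hat += dt * x_hat_dot
        a_hat += dt * a_hat_dot
        b_hat += dt * b_hat_dot
    return x_hat - x, a_hat, b_hat

final_err, a_est, b_est = run()
print('final state error  :', final_err)
print('parameter estimates:', a_est, b_est)
```

For this scalar case the function W = e^2 / 2 + (a_err^2 + b_err^2) / (2 R_P), with e the state error and a_err, b_err the parameter errors, has time derivative R_S e^2, which is never positive. That is the same kind of argument that underlies the state-error convergence in Theorem 3.1, and like the theorem it says nothing by itself about whether the parameter estimates reach the true values.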
\n\nNote that even when the error between the learned and actual parameters is large, the periodic attractor resulting from the learned parameters appears to have the same \"shape\" as that for the actual parameters.\n\n5  Conclusion\n\nWe have presented a conceptual framework for designing dynamical systems with specific qualitative properties by decomposing the dynamics into a component normal to some surface and a set of components tangent to the same surface. We have presented a specific instance of this class of systems which converges to one of a finite number of equilibrium points. By parameterizing these systems, the manner in which these equilibrium points are approached can be fitted to an arbitrary data set. We present a learning algorithm to estimate these parameters which is guaranteed to converge to a set of parameter values for which the error between the learned and desired trajectories vanishes.\n\nAcknowledgments\n\nThis research was supported by a grant from Boeing Computer Services under Contract W-300445. The authors would like to thank Vangelis Coutsias, Tom Caudell, and Bill Home for stimulating discussions and insightful suggestions.\n\nReferences\n\n[1] M.A. Cohen. The construction of arbitrary stable dynamics in nonlinear neural networks. Neural Networks, 5(1):83-103, 1992.\n\n[2] M.W. Hirsch and S. Smale. Differential equations, dynamical systems, and linear algebra, volume 60 of Pure and Applied Mathematics. Academic Press, Inc., San Diego, CA, 1974.\n\n[3] J.W. Howse, C.T. Abdallah, and G.L. Heileman. A gradient-hamiltonian decomposition for designing and learning dynamical systems. Submitted to Neural Computation, 1995.\n\n[4] R.V. Mendes and J.T. Duarte. Decomposition of vector fields and mixed dynamics. Journal of Mathematical Physics, 22(7):1420-1422, 1981.\n\n[5] K.S. 
Narendra and A.M. Annaswamy. Stable adaptive systems. Prentice-Hall, Inc., Englewood Cliffs, NJ, 1989.\n\n[6] B.A. Pearlmutter. Learning state space trajectories in recurrent neural networks. Neural Computation, 1(2):263-269, 1989.\n\n[7] D. Saad. Training recurrent neural networks via trajectory modification. Complex Systems, 6(2):213-236, 1992.\n\n[8] M.-A. Sato. A real time learning algorithm for recurrent analog neural networks. Biological Cybernetics, 62(2):237-241, 1990.\n", "award": [], "sourceid": 1033, "authors": [{"given_name": "James", "family_name": "Howse", "institution": null}, {"given_name": "Chaouki", "family_name": "Abdallah", "institution": null}, {"given_name": "Gregory", "family_name": "Heileman", "institution": null}]}