{"title": "Modern Analytic Techniques to Solve the Dynamics of Recurrent Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 253, "page_last": 259, "abstract": "", "full_text": "Modern Analytic Techniques to Solve the Dynamics of Recurrent Neural Networks

A.C.C. Coolen
Dept. of Mathematics, King's College London
Strand, London WC2R 2LS, U.K.

S.N. Laughton
Dept. of Physics - Theoretical Physics, University of Oxford
1 Keble Road, Oxford OX1 3NP, U.K.

D. Sherrington*
Center for Non-linear Studies, Los Alamos National Laboratory
Los Alamos, New Mexico 87545

Abstract

We describe the use of modern analytical techniques in solving the dynamics of symmetric and nonsymmetric recurrent neural networks near saturation. These techniques explicitly take into account the correlations between the post-synaptic potentials. They thereby allow for a reliable prediction of transients.

1 INTRODUCTION

Recurrent neural networks have been rather popular in the physics community, because they lend themselves so naturally to analysis with tools from equilibrium statistical mechanics. This was the main theme of physicists between, say, 1985 and 1990. Less familiar to the neural network community is a subsequent wave of theoretical physical studies dealing with the dynamics of symmetric and nonsymmetric recurrent networks. The strategy here is to describe the processes at the reduced level of an appropriate small set of dynamic macroscopic observables. At first, progress was made in solving the dynamics of extremely diluted models (Derrida et al, 1987) and of fully connected models away from saturation (for a review see Coolen and Sherrington, 1993). This paper is concerned with more recent approaches, which take the form of dynamical replica theories. These approaches allow for a reliable prediction of transients, even near saturation.
Transients provide the link between initial states and final states (equilibrium calculations only provide information on the possible final states). In view of the technical nature of the subject, we describe only the basic ideas and results for simple models. Full details and applications to more complicated models can be found elsewhere.

*On leave from Department of Physics - Theoretical Physics, University of Oxford.

2 RECURRENT NETWORKS NEAR SATURATION

Let us consider networks of N binary neurons $\sigma_i \in \{-1, 1\}$. Neuron states are updated sequentially and stochastically, driven by the values of the post-synaptic potentials $h_i$. The probability of finding the system at time t in state $\sigma = (\sigma_1, \ldots, \sigma_N)$ is denoted by $p_t(\sigma)$. For the rates $w_i(\sigma)$ of the transitions $\sigma_i \to -\sigma_i$ and for the potentials $h_i(\sigma)$ we make the usual choice

$$w_i(\sigma) = \frac{1}{2}\left[1 - \sigma_i \tanh[\beta h_i(\sigma)]\right], \qquad h_i(\sigma) = \sum_{j \neq i} J_{ij}\sigma_j$$

The parameter $\beta$ controls the degree of stochasticity. The $\beta = 0$ dynamics is completely random, whereas for $\beta = \infty$ we find the deterministic rule $\sigma_i \to \mathrm{sgn}[h_i(\sigma)]$. The evolution in time of $p_t(\sigma)$ is given by the master equation

$$\frac{d}{dt} p_t(\sigma) = \sum_{k=1}^{N}\left[p_t(F_k\sigma)\, w_k(F_k\sigma) - p_t(\sigma)\, w_k(\sigma)\right] \qquad (1)$$

with $F_k\sigma = (\sigma_1, \ldots, -\sigma_k, \ldots, \sigma_N)$, the state obtained from $\sigma$ by flipping neuron k.
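The stochastic dynamics defined by the rates $w_i(\sigma)$ and the master equation (1) can be simulated directly. The Python sketch below illustrates this under stated assumptions and is not the authors' simulation code. It uses Hopfield couplings $J_{ij} = \frac{1}{N}\sum_\mu \xi_i^\mu\xi_j^\mu$ with $J_{ii} = 0$. It uses random-sequential updates, in which N single-neuron update attempts are taken to correspond to one unit of time. For $\beta = \infty$ it applies the convention $\tanh(\beta h) \to \mathrm{sgn}(h)$, with sgn(0) = 0. All function names are hypothetical.

```python
import numpy as np


def hopfield_couplings(xi):
    # xi has shape (p, N) with entries +1/-1.
    # J_ij = (1/N) sum_mu xi_i^mu xi_j^mu, with self-couplings J_ii set to zero.
    n = xi.shape[1]
    J = xi.T @ xi / n
    np.fill_diagonal(J, 0.0)
    return J


def gain(h, beta):
    # tanh(beta h); for beta = infinity use the limit sgn(h), with sgn(0) = 0.
    return np.sign(h) if np.isinf(beta) else np.tanh(beta * h)


def flip_rates(sigma, J, beta):
    # w_i(sigma) = 1/2 [1 - sigma_i tanh(beta h_i(sigma))], h_i = sum_{j != i} J_ij sigma_j
    h = J @ sigma
    return 0.5 * (1.0 - sigma * gain(h, beta))


def glauber_sweep(sigma, J, beta, rng):
    # N random-sequential update attempts (assumed to be one unit of time):
    # pick neuron i uniformly, then flip it with probability w_i(sigma).
    sigma = sigma.copy()
    n = sigma.size
    for _ in range(n):
        i = rng.integers(n)
        w_i = 0.5 * (1.0 - sigma[i] * gain(J[i] @ sigma, beta))
        if rng.random() < w_i:
            sigma[i] = -sigma[i]
    return sigma


if __name__ == '__main__':
    rng = np.random.default_rng(1)
    N, alpha = 1000, 0.1
    xi = rng.choice([-1.0, 1.0], size=(int(alpha * N), N))
    J = hopfield_couplings(xi)
    # initial overlap close to 0.6: flip each bit of pattern 1 with probability 0.2
    sigma = xi[0] * np.where(rng.random(N) < 0.2, -1.0, 1.0)
    for t in range(1, 11):
        sigma = glauber_sweep(sigma, J, np.inf, rng)
        print(t, np.mean(xi[0] * sigma))
```

The printed overlap follows a single noisy trajectory. The paper's simulations use much larger N, and that scale is not attempted here.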
For the Hopfield model, the network stores $p = \alpha N$ patterns $\xi^\mu = (\xi_1^\mu, \ldots, \xi_N^\mu) \in \{-1, 1\}^N$ through the synapses $J_{ij} = \frac{1}{N}\sum_{\mu=1}^{p}\xi_i^\mu\xi_j^\mu$. We call $\xi^1$ the nominated pattern (the pattern to be recalled). Each potential then separates into two parts. The first is a signal term proportional to the overlap $m(\sigma) = \frac{1}{N}\sum_j \xi_j^1\sigma_j$. The second is a noise term generated by the non-nominated patterns:

$$h_i(\sigma) = \xi_i^1 m(\sigma) + \frac{1}{N}\sum_{\mu>1}\xi_i^\mu\sum_{j\neq i}\xi_j^\mu\sigma_j + \mathcal{O}(N^{-1}) \qquad (2)$$

All complications arise from the noise terms. The 'Local Chaos Hypothesis' (LCH) consists of assuming the noise terms to be independently distributed Gaussian variables. The macroscopic description then consists of the overlap m and the width of the noise distribution (Amari and Maginu, 1988). This, however, works only for states near the nominated pattern; see also Nishimori and Ozeki (1993). In reality the noise components in the potentials have far more complicated statistics¹. Correlations build up between the system state and the non-nominated patterns. As a result, the noise components can be highly correlated and described by bimodal distributions.

Another approach involves a description in terms of correlation and response functions (with two time arguments). Here one builds a generating functional, which is a sum over all possible trajectories in state space, averaged over the distribution of the non-nominated patterns. One finds equations that are exact for $N \to \infty$ but, unfortunately, also rather complicated. For the typical neural network models, solutions are known only in equilibrium (Rieger et al, 1988). Information on transients has so far been obtained only through cumbersome approximation schemes (Horner et al, 1989). We now turn to a theory that takes into account the non-trivial statistics of the post-synaptic potentials, yet involves observables with one time argument only.

¹Correlations are negligible only in extremely diluted (asymmetric) networks (Derrida et al, 1987), and in networks with independently drawn (asymmetric) random synapses.

3 DYNAMICAL REPLICA THEORIES

The evolution of macroscopic observables $\Omega(\sigma) = (\Omega_1(\sigma), \ldots, \Omega_K(\sigma))$ can be described by the so-called Kramers-Moyal expansion for the corresponding probability distribution $p_t(\Omega)$, derived directly from (1). This requires certain conditions on the sensitivity of $\Omega$ to single-neuron transitions $\sigma_i \to -\sigma_i$. Under those conditions, one finds that on finite time-scales and for $N \to \infty$ the macroscopic state $\Omega$ evolves deterministically according to

$$\frac{d}{dt}\Omega = \frac{\sum_\sigma p_t(\sigma)\,\delta[\Omega - \Omega(\sigma)]\sum_i w_i(\sigma)\left[\Omega(F_i\sigma) - \Omega(\sigma)\right]}{\sum_\sigma p_t(\sigma)\,\delta[\Omega - \Omega(\sigma)]} \qquad (3)$$

This equation depends explicitly on time through $p_t(\sigma)$. However, there are two natural ways for (3) to become autonomous:

(i) the term $\sum_i w_i(\sigma)[\Omega(F_i\sigma) - \Omega(\sigma)]$ depends on $\sigma$ only through $\Omega(\sigma)$ (as for attractor networks away from saturation);
(ii) (1) allows for solutions of the form $p_t(\sigma) = f_t[\Omega(\sigma)]$ (as for extremely diluted networks).

In both cases $p_t(\sigma)$ drops out of (3). Simulations further indicate that for $N \to \infty$ the macroscopic evolution usually depends only on the statistical properties of the patterns $\{\xi^\mu\}$, not on their microscopic realisation ('self-averaging'). This leads us to the following closure assumptions:

1. Probability equipartitioning in the $\Omega$ subshells of the ensemble: $p_t(\sigma) \sim \delta[\Omega_t - \Omega(\sigma)]$. If $\Omega$ indeed obeys closed equations, this assumption is safe.
2. Self-averaging of the $\Omega$ flow with respect to the microscopic details of the non-nominated patterns: $\frac{d}{dt}\Omega \to \langle \frac{d}{dt}\Omega \rangle_{\rm patt}$.
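For one microscopic state $\sigma$, the bracket $\sum_i w_i(\sigma)[\Omega(F_i\sigma) - \Omega(\sigma)]$ in (3) can be evaluated by flipping each neuron in turn. The sketch below does this for $\Omega = m$ and compares the result with the closed form $\frac{1}{N}\sum_i \xi_i^1\tanh[\beta h_i(\sigma)] - m$. That closed form follows from $m(F_i\sigma) - m(\sigma) = -2\xi_i^1\sigma_i/N$. The function names are hypothetical, $\beta$ is assumed finite, and the couplings are Hopfield couplings. Because $h_i$ contains the noise terms, this drift depends on $\sigma$ through more than m alone. Route (i) therefore fails near saturation, and closure assumptions such as 1 and 2 become necessary.

```python
import numpy as np


def overlap(sigma, xi1):
    # m(sigma) = (1/N) sum_i xi_i^1 sigma_i
    return float(np.mean(xi1 * sigma))


def microscopic_drift(sigma, J, beta, observable):
    # sum_i w_i(sigma) [Omega(F_i sigma) - Omega(sigma)] for a scalar observable Omega,
    # i.e. the bracket inside eq. (3) for one state; finite beta assumed.
    h = J @ sigma
    w = 0.5 * (1.0 - sigma * np.tanh(beta * h))
    base = observable(sigma)
    total = 0.0
    flipped = sigma.copy()
    for i in range(sigma.size):
        flipped[i] = -flipped[i]  # F_i sigma
        total += w[i] * (observable(flipped) - base)
        flipped[i] = -flipped[i]  # undo the flip
    return total


def overlap_drift_closed_form(sigma, J, beta, xi1):
    # (1/N) sum_i xi_i^1 tanh(beta h_i) - m, from m(F_i sigma) - m(sigma) = -2 xi_i^1 sigma_i / N
    h = J @ sigma
    return float(np.mean(xi1 * np.tanh(beta * h))) - overlap(sigma, xi1)


if __name__ == '__main__':
    rng = np.random.default_rng(2)
    N, p, beta = 400, 40, 2.0
    xi = rng.choice([-1.0, 1.0], size=(p, N))
    J = xi.T @ xi / N
    np.fill_diagonal(J, 0.0)
    sigma = rng.choice([-1.0, 1.0], size=N)

    def m_of(s):
        return overlap(s, xi[0])

    print(microscopic_drift(sigma, J, beta, m_of))
    print(overlap_drift_closed_form(sigma, J, beta, xi[0]))
```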
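Our equations (3) are hereby transformed into the closed set

$$\frac{d}{dt}\Omega = \left\langle \frac{\sum_\sigma \delta[\Omega - \Omega(\sigma)]\sum_i w_i(\sigma)\left[\Omega(F_i\sigma) - \Omega(\sigma)\right]}{\sum_\sigma \delta[\Omega - \Omega(\sigma)]} \right\rangle_{\rm patt}$$

The final observation is that the tool for averaging fractions is replica theory. It is based on the identity $\langle A/Z \rangle = \lim_{n\to 0}\langle A Z^{n-1}\rangle$, where here $Z = \sum_\sigma \delta[\Omega - \Omega(\sigma)]$:

$$\frac{d}{dt}\Omega = \lim_{n\to 0}\lim_{N\to\infty}\left\langle \sum_{\sigma^1\cdots\sigma^n}\sum_i w_i(\sigma^1)\left[\Omega(F_i\sigma^1) - \Omega(\sigma^1)\right]\prod_{\gamma=1}^{n}\delta[\Omega - \Omega(\sigma^\gamma)]\right\rangle_{\rm patt} \qquad (4)$$

The choice of the observables $\Omega(\sigma)$ is crucial for the closure assumptions to make sense. It is constrained by requiring the theory to be exact in specific limits:

exactness for $\alpha \to 0$: $\Omega = (m, \ldots)$
exactness for $t \to \infty$: $\Omega = (E, \ldots)$ (for symmetric models only)

Here $E(\sigma) = -\frac{1}{2N}\sum_{i\neq j}\sigma_i J_{ij}\sigma_j$ is the energy per neuron. For symmetric synapses the equilibrium distribution $p_\infty(\sigma) \propto e^{-\beta N E(\sigma)}$ depends on $\sigma$ only through E. Equipartitioning within E subshells is then exact.

4 SIMPLE VERSION OF THE THEORY

For the Hopfield model (2), the simplest two-parameter theory that is exact for $\alpha \to 0$ and for $t \to \infty$ is consequently obtained by choosing $\Omega = (m, E)$. Equivalently, we can choose $\Omega = (m, r)$, where $r(\sigma)$ measures the 'interference energy':

$$m = \frac{1}{N}\sum_i \xi_i^1\sigma_i, \qquad r = \frac{1}{\alpha}\sum_{\mu>1} m_\mu^2, \qquad m_\mu = \frac{1}{N}\sum_i \xi_i^\mu\sigma_i$$

For the Hopfield synapses one has $E = -\frac{1}{2}m^2 - \frac{1}{2}\alpha r + \frac{1}{2}\alpha$, so the two choices carry the same information.

[Plot: flow in the (m, r) plane; horizontal axis m from 0 to 1, vertical axis r.]
Figure 1: Simulations (N = 32000, dots) versus simple RS theory (solid lines), for $\alpha = 0.1$ and $\beta = \infty$. Upper dashed line: upper boundary of the physical region. Lower dashed line: upper boundary of the RS region (the AT instability).

The result of working out (4) for $\Omega = (m, r)$ is

$$\frac{d}{dt}m = \int\! dz\, D_{m,r}[z]\tanh\beta(m+z) - m$$

$$\frac{1}{2}\frac{d}{dt}r = \frac{1}{\alpha}\int\! dz\, D_{m,r}[z]\,z\tanh\beta(m+z) + 1 - r$$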
in which $D_{m,r}[z]$ is the distribution of 'interference noise' terms in the post-synaptic potentials. In the so-called replica-symmetric (RS) ansatz, the replica calculation gives for it

$$D_{m,r}[z] = \frac{e^{-\frac{(z+\Delta)^2}{2\alpha r}}}{2\sqrt{2\pi\alpha r}}\left\{1 - \int\! Dy\,\tanh\left[\lambda y\left[\frac{\Delta}{\alpha\rho r}\right]^{\frac{1}{2}} + (\Delta + z)\frac{\lambda}{\alpha\rho r} + \mu\right]\right\} + \frac{e^{-\frac{(z-\Delta)^2}{2\alpha r}}}{2\sqrt{2\pi\alpha r}}\left\{1 - \int\! Dy\,\tanh\left[\lambda y\left[\frac{\Delta}{\alpha\rho r}\right]^{\frac{1}{2}} + (\Delta - z)\frac{\lambda}{\alpha\rho r} - \mu\right]\right\}$$

with $Dy = (2\pi)^{-\frac{1}{2}}e^{-\frac{1}{2}y^2}dy$, $\Delta = \alpha\rho r - \lambda^2/\rho$ and $\lambda = \rho\sqrt{\alpha q}\,[1 - \rho(1-q)]^{-1}$. The remaining parameters $\{q, \mu, \rho\}$ are to be solved from the coupled equations

$$m = \int\! Dy\,\tanh[\lambda y + \mu], \qquad q = \int\! Dy\,\tanh^2[\lambda y + \mu], \qquad r = \frac{1 - \rho(1-q)^2}{[1 - \rho(1-q)]^2}$$

Here we give only the (partly new) results of the calculation; details can be found in Coolen and Sherrington (1994). The noise distribution is not Gaussian. This agrees with simulations and contrasts with the LCH. Our simple two-parameter theory is found to be exact for t = 0, for $t \to \infty$ and for $\alpha \to 0$. Solving the dynamic equations numerically leads to the results shown in Figures 1 and 2. We find good agreement with numerical simulations in terms of the flow in the (m, r) plane. However, for trajectories leading away from the recall state $m \approx 1$, the theory fails to reproduce an overall slowing down. These deviations can be quantified by comparing cumulants of the noise distributions (Ozeki and Nishimori, 1994), or by applying the theory to exactly solvable models (Coolen and Franz, 1994). Other recent applications include spin-glass models (Coolen and Sherrington, 1994) and more general classes of attractor neural network models (Laughton and Coolen, 1995). The simple two-parameter theory always adequately predicts the location of the transients in the order parameter plane, but it overestimates the relaxation speed. In fact, Figure 2 shows a remarkable resemblance to the results obtained for this model with the functional integral formalism (Horner et al, 1989). The graphs of m(t) are almost identical, but here they are derived in a much simpler way.
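The sketch below shows how the (m, r) flow equations are used once a noise distribution is available. It integrates them at $\beta = \infty$ with a deliberately simpler choice: $D_{m,r}[z]$ is replaced by a zero-mean Gaussian of variance $\alpha r$, in the spirit of the LCH. This is an illustrative assumption only. It is not the RS distribution above, and it lacks the non-Gaussian structure emphasised in the text.

At $\beta = \infty$, $\tanh\beta(m+z) \to \mathrm{sgn}(m+z)$, so the Gaussian integrals are elementary:

- $\int dz\, D[z]\,\mathrm{sgn}(m+z) = \mathrm{erf}(m/\sqrt{2\alpha r})$
- $\frac{1}{\alpha}\int dz\, D[z]\,z\,\mathrm{sgn}(m+z) = \sqrt{2r/(\pi\alpha)}\,e^{-m^2/(2\alpha r)}$

The forward-Euler step and all function names are our own choices. The helper `order_parameters` measures (m, r) in a microscopic state, taking $r = \frac{1}{\alpha}\sum_{\mu>1}m_\mu^2$, so that such curves could be set against simulations.

```python
import math

import numpy as np


def gaussian_closure_rhs(m, r, alpha):
    # dm/dt and dr/dt at beta = infinity, ASSUMING D_{m,r}[z] is a zero-mean Gaussian of
    # variance alpha*r (LCH-style); this is NOT the RS distribution derived in the text.
    s2 = alpha * r
    dm = math.erf(m / math.sqrt(2.0 * s2)) - m
    # (1/alpha) * E[z sgn(m + z)] for z ~ N(0, alpha*r)
    noise_term = math.sqrt(2.0 * r / (math.pi * alpha)) * math.exp(-m * m / (2.0 * s2))
    dr = 2.0 * (noise_term + 1.0 - r)  # the text gives (1/2) dr/dt
    return dm, dr


def integrate_flow(m0, r0, alpha, t_max=10.0, dt=0.01):
    # Forward-Euler integration; returns a list of (t, m, r) tuples.
    m, r = m0, r0
    trajectory = [(0.0, m, r)]
    for k in range(1, int(round(t_max / dt)) + 1):
        dm, dr = gaussian_closure_rhs(m, r, alpha)
        m, r = m + dt * dm, r + dt * dr
        trajectory.append((k * dt, m, r))
    return trajectory


def order_parameters(sigma, xi):
    # Measure (m, r) in a microscopic state: m = overlap with pattern 1 and
    # r = (1/alpha) sum_{mu>1} m_mu^2 with alpha = p/N.
    p, n = xi.shape
    overlaps = xi @ sigma / n
    return float(overlaps[0]), float(np.sum(overlaps[1:] ** 2) / (p / n))


if __name__ == '__main__':
    for m0 in (0.2, 0.5, 0.9):
        t, m, r = integrate_flow(m0, 1.0, alpha=0.1)[-1]
        print(f'm0={m0}: m({t:.1f}) = {m:.3f}, r = {r:.3f}')
```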
[Plot: m versus t (left panel) and r versus t (right panel), 0 ≤ t ≤ 10.]
Figure 2: Simulations (N = 32000, dots) versus simple RS theory (RS stable: solid lines; RS unstable: dashed lines), now as functions of time, for $\alpha = 0.1$ and $\beta = \infty$.

5 ADVANCED VERSION OF THE THEORY

Improving upon the simple theory means expanding the set $\Omega$ beyond $\Omega = (m, E)$. Adding a finite number of observables will only have a minor impact. A qualitative step forward, on the other hand, results from introducing a dynamic order parameter function. The microscopic dynamics (1) is formulated entirely in terms of neuron states and post-synaptic potentials. We therefore choose for $\Omega(\sigma)$ the joint distribution

$$D[\zeta, h](\sigma) = \frac{1}{N}\sum_i \delta[\zeta - \sigma_i]\,\delta[h - h_i(\sigma)]$$

This choice has three advantages:

(a) both m and (for symmetric systems) E are integrals over $D[\zeta, h]$, so the advanced theory automatically inherits the exactness at t = 0 and $t = \infty$ of the simple one;
(b) it applies equally well to symmetric and nonsymmetric models;
(c) as with the simple version, generalisation to models with continuous neural variables is straightforward.

Here we show the result of applying the theory to a model of the type (1) with synaptic interactions

$$J_{ij} = \frac{J_0}{N}\xi_i\xi_j + \frac{J}{\sqrt{N}}\left[\cos\left(\frac{\omega}{2}\right)X_{ij} + \sin\left(\frac{\omega}{2}\right)Y_{ij}\right], \qquad X_{ij} = X_{ji}, \quad Y_{ij} = -Y_{ji}$$

where the $X_{ij}$ and $Y_{ij}$ are independent random Gaussian variables. This model describes a nominated pattern stored on a 'messy' synaptic background. The parameter $\omega$ controls the degree of synaptic symmetry (e.g. $\omega = 0$: symmetric; $\omega = \pi$: antisymmetric).
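Below is a minimal sketch of the 'messy background' couplings. It assumes the $X_{ij}$ and $Y_{ij}$ are standard Gaussian (zero mean, unit variance), whereas the text only says Gaussian. It sets $J_{ii} = 0$, since $h_i$ sums over $j \neq i$. The hypothetical helper `symmetry_parameter` computes $\eta = \sum_{ij}J_{ij}J_{ji}/\sum_{ij}J_{ij}^2$. For $J_0 = 0$ and large N it approaches $\cos^2(\omega/2) - \sin^2(\omega/2) = \cos\omega$, which makes the role of $\omega$ explicit.

```python
import numpy as np


def messy_background_couplings(xi, J0, J, omega, rng):
    # J_ij = (J0/N) xi_i xi_j + (J/sqrt(N)) [cos(omega/2) X_ij + sin(omega/2) Y_ij]
    # X symmetric, Y antisymmetric; entries ASSUMED i.i.d. standard Gaussian above the diagonal.
    n = xi.size
    upper_x = np.triu(rng.standard_normal((n, n)), k=1)
    upper_y = np.triu(rng.standard_normal((n, n)), k=1)
    X = upper_x + upper_x.T  # X_ij = X_ji
    Y = upper_y - upper_y.T  # Y_ij = -Y_ji
    background = np.cos(omega / 2) * X + np.sin(omega / 2) * Y
    Jmat = (J0 / n) * np.outer(xi, xi) + (J / np.sqrt(n)) * background
    np.fill_diagonal(Jmat, 0.0)  # no self-interactions, as h_i sums over j != i
    return Jmat


def symmetry_parameter(Jmat):
    # eta = sum_ij J_ij J_ji / sum_ij J_ij^2; close to cos(omega) for J0 = 0 and large N
    return float(np.sum(Jmat * Jmat.T) / np.sum(Jmat * Jmat))


if __name__ == '__main__':
    rng = np.random.default_rng(4)
    xi = rng.choice([-1.0, 1.0], size=500)
    for omega in (0.0, np.pi / 2, np.pi):
        Jmat = messy_background_couplings(xi, 0.0, 1.0, omega, rng)
        print(round(omega, 3), round(symmetry_parameter(Jmat), 3), round(np.cos(omega), 3))
```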
Equation (4) applied to the observable $D[\zeta, h](\sigma)$ gives

$$\frac{\partial}{\partial t}D_t[\zeta,h] = J^2\left[1 - \langle\sigma\tanh(\beta H)\rangle_{D_t}\right]\frac{\partial^2}{\partial h^2}D_t[\zeta,h] + \frac{\partial}{\partial h}\mathcal{A}[\zeta,h;D_t] + \frac{\partial}{\partial h}\left\{D_t[\zeta,h]\left[h - J_0\langle\tanh(\beta H)\rangle_{D_t}\right]\right\} + \frac{1}{2}\left[1 + \zeta\tanh(\beta h)\right]D_t[-\zeta,h] - \frac{1}{2}\left[1 - \zeta\tanh(\beta h)\right]D_t[\zeta,h]$$

with $\langle f(\sigma, H)\rangle_D = \sum_\sigma\int\! dH\, D[\sigma,H]\,f(\sigma,H)$. All complications are concentrated in the kernel $\mathcal{A}[\zeta,h;D]$. It is to be solved from a nontrivial set of equations emerging from the replica formalism. Some results of solving these equations numerically are shown in Figures 3 and 4. For details of the calculations and more elaborate comparisons with simulations we refer to Laughton, Coolen and Sherrington (1995) and Coolen, Laughton and Sherrington (1995). It is clear that the advanced theory quite convincingly describes the transients of the simulation experiments, for symmetric and nonsymmetric models. This includes the hitherto unexplained slowing down.

[Plot: E versus t, 0 ≤ t ≤ 6, with E between -1 and 0.]
Figure 3: Comparison of simulations (N = 8000, solid line), simple two-parameter theory (RS stable: dotted line; RS unstable: dashed line) and advanced theory (solid line), for the $\omega = 0$ (symmetric background) model, with $J_0 = 0$ and $\beta = \infty$. Note that the two solid lines are almost on top of each other at the scale shown.

[Plot: two panels versus t, 0 ≤ t ≤ 6; vertical axis label E.]
Figure 4: Advanced theory versus N = 5600 simulations in the $\omega = \pi/2$ (asymmetric background) model, with $\beta = \infty$ and J = 1. Solid: simulations; dotted: solving the RS diffusion equation.
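For a single state, the order parameter function $D[\zeta,h](\sigma)$ is just the empirical distribution of the pairs $(\sigma_i, h_i(\sigma))$. Averages $\langle f(\sigma,H)\rangle_D$ therefore reduce to $\frac{1}{N}\sum_i f(\sigma_i, h_i)$. The sketch below computes such averages and a histogram estimate of D; the helper names are hypothetical. It assumes the gauge $\xi_i = 1$ for the nominated pattern, so that $m = \langle\sigma\rangle_D$. It uses $E = -\frac{1}{2}\langle\sigma H\rangle_D$, the energy per neuron for symmetric synapses. It also evaluates the diffusion coefficient $J^2[1 - \langle\sigma\tanh(\beta H)\rangle_D]$ of the equation above. The kernel $\mathcal{A}$ is not computed.

```python
import numpy as np


def gain(H, beta):
    # tanh(beta H); for beta = infinity use sgn(H)
    return np.sign(H) if np.isinf(beta) else np.tanh(beta * H)


def empirical_average(f, sigma, h):
    # <f(sigma, H)>_D for D[zeta, h](sigma) = (1/N) sum_i delta[zeta - sigma_i] delta[h - h_i],
    # which reduces to (1/N) sum_i f(sigma_i, h_i)
    return float(np.mean(f(sigma, h)))


def macroscopic_summary(sigma, Jmat, beta, J):
    # Assumes the gauge xi_i = 1 for the nominated pattern, so that m = <sigma>_D.
    # The energy is meaningful as such only for symmetric Jmat.
    h = Jmat @ sigma
    m = empirical_average(lambda s, H: s, sigma, h)
    energy = empirical_average(lambda s, H: -0.5 * s * H, sigma, h)
    # coefficient of the second h-derivative in the equation for D_t
    diffusion = J ** 2 * (1.0 - empirical_average(lambda s, H: s * gain(H, beta), sigma, h))
    return m, energy, diffusion


def histogram_D(sigma, Jmat, bins=40):
    # Density estimate of D[zeta, h] for zeta = -1 and +1 on a shared h grid;
    # the total mass summed over both zeta values is 1 (assumes h is not constant).
    h = Jmat @ sigma
    edges = np.linspace(h.min(), h.max(), bins + 1)
    widths = np.diff(edges)
    D = {}
    for zeta in (-1, 1):
        counts, _ = np.histogram(h[sigma == zeta], bins=edges)
        D[zeta] = counts / (sigma.size * widths)
    return edges, D


if __name__ == '__main__':
    rng = np.random.default_rng(5)
    n = 500
    upper = np.triu(rng.standard_normal((n, n)), k=1)
    Jmat = (upper + upper.T) / np.sqrt(n)  # symmetric background, J0 = 0, J = 1
    sigma = rng.choice([-1.0, 1.0], size=n)
    print(macroscopic_summary(sigma, Jmat, np.inf, 1.0))
```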
6 DISCUSSION

In this paper we have described novel techniques for studying the dynamics of recurrent neural networks near saturation. The simplest two-parameter theory is exact for t = 0, for $t \to \infty$ and for $\alpha \to 0$. It employs as dynamic order parameters the overlap with a pattern to be recalled and the total 'energy' per neuron. It already describes quite accurately the location of the transients in the order parameter plane. The price paid for simplicity is that it overestimates the relaxation speed. A more advanced version of the theory describes the evolution of the joint distribution of neuron states and post-synaptic potentials. It is mathematically more involved, but predicts the dynamical data essentially perfectly, as far as present applications allow us to conclude. Whether this latter version is exact, or just a very good approximation, remains to be seen.

In this paper we have restricted ourselves to models with binary neural variables, for reasons of simplicity. The theories generalise in a natural way to models with analogue neurons. Here, however, even the simple version will generally involve order parameter functions as opposed to a finite number of order parameters. Ongoing work along these lines includes, for instance, the analysis of analogue and spherical attractor networks and of networks of coupled oscillators near saturation.

References

B. Derrida, E. Gardner and A. Zippelius (1987), Europhys. Lett. 4: 167-173.
A.C.C. Coolen and D. Sherrington (1993), in J.G. Taylor (ed.), Mathematical Approaches to Neural Networks, 293-305. Amsterdam: Elsevier.
S. Amari and K. Maginu (1988), Neural Networks 1: 63-73.
H. Nishimori and T. Ozeki (1993), J. Phys. A 26: 859-871.
H. Rieger, M. Schreckenberg and J. Zittartz (1988), Z. Phys. B 72: 523-533.
H. Horner, D. Bormann, M. Frick, H. Kinzelbach and A. Schmidt (1989), Z. Phys. B 76: 381-398.
A.C.C. Coolen and D. Sherrington (1994), Phys. Rev. E 49(3): 1921-1934.
T. Ozeki and H. Nishimori (1994), J. Phys. A 27: 7061-7068.
A.C.C. Coolen and S. Franz (1994), J. Phys. A 27: 6947-6954.
A.C.C. Coolen and D. Sherrington (1994), J. Phys. A 27: 7687-7707.
S.N. Laughton and A.C.C. Coolen (1995), Phys. Rev. E 51: 2581-2599.
S.N. Laughton, A.C.C. Coolen and D. Sherrington (1995), J. Phys. A (in press).
A.C.C. Coolen, S.N. Laughton and D. Sherrington (1995), Phys. Rev. B (in press).
", "award": [], "sourceid": 1049, "authors": [{"given_name": "A.C.C.", "family_name": "Coolen", "institution": null}, {"given_name": "S.", "family_name": "Laughton", "institution": null}, {"given_name": "D.", "family_name": "Sherrington", "institution": null}]}