{"title": "Phase Diagram and Storage Capacity of Sequence-Storing Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 211, "page_last": 217, "abstract": null, "full_text": "Phase Diagram and Storage Capacity of \n\nSequence Storing Neural Networks \n\nA. During \n\nDept. of Physics \nOxford University \nOxford OX 1 3NP \nUnited Kingdom \n\na.duringl @physics.oxford.ac .uk \n\nA. C. C. Coolen \n\nDept. of Mathematics \n\nKing 's College \n\nLondon WC2R 2LS \n\nUnited Kingdom \n\ntcoolen @mth.kc1.ac.uk \n\nD. Sherrington \nDept. of Physics \nOxford University \nOxford OX I 3NP \nUnited Kingdom \n\nd.sherrington I @physics.oxford.ac.uk \n\nAbstract \n\nWe solve the dynamics of Hopfield-type neural networks which store se(cid:173)\nquences of patterns, close to saturation. The asymmetry of the interaction \nmatrix in such models leads to violation of detailed balance, ruling out an \nequilibrium statistical mechanical analysis. Using generating functional \nmethods we derive exact closed equations for dynamical order parame(cid:173)\nters, viz. the sequence overlap and correlation and response functions. \nin the limit of an infinite system size. We calculate the time translation \ninvariant solutions of these equations. describing stationary limit-cycles. \nwhich leads to a phase diagram. The effective retarded self-interaction \nusually appearing in symmetric models is here found to vanish, which \ncauses a significantly enlarged storage capacity of eYe ~ 0.269. com(cid:173)\npared to eYe ~ 0.139 for Hopfield networks s~oring static patterns. Our \nresults are tested against extensive computer simulations and excellent \nagreement is found. \n\n\f212 \n\nA. Diiring, A. C. C. Coo/en and D. Sherrington \n\n1 INTRODUCTION AND DEFINITIONS \n\nWe consider a system of N neurons O'(t) = {ai(t) = \u00b11}, which can change their states \ncollectively at discrete times (parallel dynamics). 
Each neuron changes its state with a probability p_i(t) = ½[1 − tanh(β σ_i(t)[Σ_j J_ij σ_j(t) + θ_i(t)])], so that the transition matrix is\n\nW[σ(s+1)|σ(s)] = ∏_{i=1}^N exp( β σ_i(s+1)[Σ_{j=1}^N J_ij σ_j(s) + θ_i(s)] − ln 2cosh(β[Σ_{j=1}^N J_ij σ_j(s) + θ_i(s)]) )   (1)\n\nwith the (non-symmetric) interaction strengths J_ij chosen as\n\nJ_ij = (1/N) Σ_{μ=1}^p ξ_i^{μ+1} ξ_j^μ.   (2)\n\nThe ξ_i^μ represent the components of an ordered sequence of patterns to be stored. The gain parameter β can be interpreted as an inverse temperature governing the noise level in the dynamics (1), and the number of patterns is assumed to scale as N, i.e. p = αN. Had the interaction matrix been chosen symmetric, the model would be accessible to methods originally developed for the equilibrium statistical mechanical analysis of physical spin systems and related models [1, 2], in particular the replica method. For the non-symmetric interaction matrix proposed here this is ruled out, and no exact solution exists to our knowledge, although both models were first mentioned at the same time and an approximate solution, compatible with the numerical evidence available at the time, was provided by Amari [3]. The difficulty for the analysis is that a system with the interactions (2) never reaches equilibrium in the thermodynamic sense, so equilibrium methods are not applicable. One therefore has to apply dynamical methods and give a dynamical meaning to the notion of the recall state. Consequently, we will in this paper employ the dynamical method of path integrals, pioneered for spin glasses by de Dominicis [4] and applied to the Hopfield model by Rieger et al. [5].\n\nWe point out that our choice of parallel dynamics for the problem of sequence recall is deliberate, in that simple sequential dynamics will not lead to stable recall of a sequence. 
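As an illustration of the parallel dynamics (1) with the sequence couplings (2), the following minimal simulation sketch (our own, not from the paper; noiseless limit β → ∞, and far below saturation rather than at p = αN) shows the network state advancing along the stored sequence:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 400, 3                          # p/N small: far below saturation
xi = rng.choice([-1, 1], size=(p, N))  # random patterns xi^1 ... xi^p

# Sequence couplings J_ij = (1/N) sum_mu xi_i^{mu+1} xi_j^mu, pattern index cyclic
J = (xi[np.arange(1, p + 1) % p].T @ xi) / N

sigma = xi[0].copy()                   # start the network on the first pattern
overlaps = []
for s in range(6):                     # noiseless parallel updates (T = 0)
    sigma = np.sign(J @ sigma)
    # overlap of the new state with the pattern the sequence predicts next
    overlaps.append(xi[(s + 1) % p] @ sigma / N)
print(overlaps)                        # should stay close to 1: stable recall
```

At this small α the crosstalk noise is negligible and the overlap stays essentially at 1; the regime analysed in the paper is p = αN, where the noise terms derived below decide whether recall survives. As noted above, simple sequential updates would not yield such stable sequence recall.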
This is due to the fact that the number of updates of a single neuron per time unit is not constant for sequential dynamics. Schemes for using delayed asymmetric interactions combined with sequential updates have been proposed (see e.g. [6] for a review), but are outside the scope of this paper.\n\nOur analysis starts with the introduction of a generating functional Z[ψ] of the form\n\nZ[ψ] = Σ_{σ(0)…σ(t)} p[σ(0), …, σ(t)] exp(−i Σ_{s<t} Σ_i ψ_i(s)σ_i(s)).\n\nAveraging Z[ψ] over the disorder and introducing the macroscopic order parameters m(s), q(s,s'), Q(s,s') and K(s,s') casts it into the form of an integral (5) whose exponent consists of three terms: a term generated by the conjugate order parameters, a single-neuron term Φ[m, k, q, Q, K] = (1/N) Σ_i ln[ Σ_{σ(0)…σ(t)} p_i(σ(0)) ∫ ∏_s [dh(s)dĥ(s)/2π] … ], and a disorder term built on the quadratic form\n\n½ Σ_μ Σ_{s,s'≤t} [u_μ(s)Q(s,s')u_μ(s') + u_μ(s)K(s',s)v_μ(s') + v_μ(s)K(s,s')u_μ(s') + v_μ(s)q(s,s')v_μ(s')].   (8)\n\nThe first of these expressions is just a result of the introduction of δ functions, while the second will turn out to represent a probability measure given by the evolution of a single neuron under prescribed fields, and the third reflects the disorder contribution to the local fields in that single neuron measure². We have thus reduced the original problem involving N neurons in a one-step Markov process to one involving just a single neuron, but at the cost of introducing two-time observables.\n\n3 DERIVATION OF SADDLE POINT EQUATIONS\n\nThe integral in (5) will be dominated by saddle points, in our case by a unique saddle point when causality is taken into account. Extremising the exponent with respect to all occurring variables gives a number of equations, the most important of which give the physical meanings of three observables: q(s,s') = C(s,s'), K(s,s') = iG(s,s'), and\n\nm(s) = lim_{N→∞} (1/N) Σ_i ⟨σ_i(s)⟩ ξ_i^s,   (9)\n\nwith\n\nC(s,s') = lim_{N→∞} (1/N) Σ_i ⟨σ_i(s)σ_i(s')⟩,   G(s,s') = lim_{N→∞} (1/N) Σ_i ∂⟨σ_i(s)⟩/∂θ_i(s'),   (10)\n\nwhich are the single-site correlation and response functions, respectively. (²We have assumed p(σ(0)) = ∏_i p_i(σ_i(0)).) The overline ⋯ 
is taken to represent disorder-averaged values. Using also additional equations arising from the normalisation Z[0] = 1, we can rewrite the single neuron measure as\n\n⟨f[{σ}]⟩* = Σ_{σ(0)…σ(t)} ∫ ∏_{s<t} [dh(s)dĥ(s)/2π] p(σ(0)) f[{σ}] exp( Σ_{s<t} iĥ(s)[h(s) − θ(s) − m(s)] − (α/2) Σ_{s,s'<t} ĥ(s)R(s,s')ĥ(s') + Σ_{s<t} [β σ(s+1)h(s) − ln 2cosh(βh(s))] )   (11)\n\nwith the short-hand R = Σ_{l≥0} (G†)^l C G^l. To simplify notation, we have here assumed that the initial probabilities p_i(σ_i(0)) are uniform and that the external fields θ_i(s) are so-called staggered ones, i.e. θ_i(s) = θ ξ_i^{s+1}, which makes the single neuron measure site-independent. This single neuron measure (11) represents the essential result of our calculations and is already properly normalised (i.e. ⟨1⟩* = 1).\n\nWhen one compares the present form of the single neuron measure with that obtained for the symmetric Hopfield network, one finds in the latter model an additional term which corresponds to a retarded self-interaction. The absence of such a term here suggests that the present model will have a higher storage capacity. It can be explained by the constant change of state of a large number of neurons as the network goes through the sequence, which prevents the build-up of a microscopic memory of past activations.\n\nHowever, as is the case for the standard Hopfield model, the measure (11) is still too complicated to find explicit equations for the observables we are interested in. Although it is possible to evaluate the necessary integrals numerically, we instead concentrate on the interesting behaviour when transients have died out and time-translation invariance is present.\n\n4 STATIONARY STATE\n\nWe will now concentrate on the behaviour of the network at the stage when transients have subsided and the system is on a macroscopic limit cycle. Then the relations\n\nm(s) = m,   C(s,s') = C(s−s')   (12)\n\nhold, and also R(s,s') = R(s−s').
We can then for simplicity shift the time origin to t_0 = −∞ and the upper temporal bound to t = ∞. Note, however, that this state is not to be confused with microscopic equilibrium in the thermodynamic sense. The stationary versions of the measure (11) for the interesting observables are then given by the following expressions (note that C(0) = 1, and that time-translation invariance likewise gives G(s,s') = G(s−s')):\n\nm = ∫ ∏_s [dv(s)dw(s)/2π] e^{iv·w − ½w·Rw} tanh β[m + θ + √α v(0)]\n\nC(τ ≠ 0) = ∫ ∏_s [dv(s)dw(s)/2π] e^{iv·w − ½w·Rw} tanh β[m + θ + √α v(τ)] tanh β[m + θ + √α v(0)]   (13)\n\nG(τ) = β δ_{τ,1} [ 1 − ∫ ∏_s [dv(s)dw(s)/2π] e^{iv·w − ½w·Rw} tanh² β[m + θ + √α v(0)] ]\n\nand we notice that the response function is now limited to a single time step, which again reflects the influence of the uncorrelated flips induced by the sequence recall. These equations can be solved by separating the persistent and fluctuating parts of C(τ) and R(τ),\n\nC(τ) = q + C̃(τ),   R(τ) = r + R̃(τ),   lim_{τ→±∞} C̃(τ) = lim_{τ→±∞} R̃(τ) = 0.\n\nDoing so eventually leads us to the coupled equations\n\nP = [1 − β²(1−q)²]⁻¹   (14)\n\nm = ∫Dz tanh β[m + θ + z√(αP)]   (15)\n\nq = ∫Dz tanh² β[m + θ + z√(αP)]   (16)\n\nq̃ = ∫Dz [ ∫Dx tanh β[m + θ + z√(αqP) + x√(α(1−q)P)] ]²   (17)\n\nNote that the three equations (14)-(16) form a closed set, from which the persistent correlation q simply follows.\n\n5 PHASE DIAGRAM AND STORAGE CAPACITY\n\nFigure 1: Phase diagram of the sequence storage network in the α-T plane, in which one finds two phases: a recall phase (R), characterized by {m ≠ 0, q > 0, q̃ > 0}, and a paramagnetic phase (P), characterized by {m = 0, q = 0, q̃ > 0}. The solid line separating the two phases is the theoretical prediction for the (discontinuous) phase transition. 
The markers represent simulation results, for systems of N = 10,000 neurons measured after 2,500 iteration steps, and obtained by bisection in α. The precision in terms of α is at least Δα = 0.005 (indicated by error bars); the values for T are exact.\n\nThe coupled equations (14)-(17) can be solved numerically for θ = 0 to find the area in the α-T plane where solutions m ≠ 0, corresponding to sequence recall, exist. The boundary of this area describes the storage capacity of the system. This theoretical curve can then be compared with computer simulations directly performing the neural dynamics given by (1) and (2). We show the result of doing both in the accompanying diagram. We find that there are only two types of solutions, namely a recall phase R where m ≠ 0 and q ≠ 0, and a paramagnetic phase where m = q = 0. Unlike the standard Hopfield model, the present model does not have a spin glass phase with m = 0 and q ≠ 0. The agreement between simulations (done here for N = 10,000 neurons) and theoretical results is excellent, and separate simulations of systems with up to N = 50,000 neurons to assess finite size effects confirm that the numerical data are reliable.\n\n6 DISCUSSION\n\nIn this paper, we have used path integral methods to solve, in the infinite system size limit, the dynamics of a non-symmetric neural network model designed to store and recall a sequence of patterns, close to saturation. This model has been known for over a decade from numerical simulations to possess a storage capacity roughly twice that of the symmetric Hopfield model, but no rigorous analytic results were available. 
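For illustration only (this zero-temperature reduction is ours, not taken from the paper): in the limit β → ∞, equations (14)-(16) reduce to m = erf(m/√(2αP)) with χ ≡ lim β(1−q) = √(2/(παP)) exp(−m²/2αP) and P = 1/(1−χ²), which can be iterated as a fixed point:

```python
from math import erf, exp, pi, sqrt

def persistent_overlap(alpha, iters=2000):
    # Fixed-point iteration of the T = 0 saddle-point equations:
    #   x = m / sqrt(2 alpha P),  m = erf(x),
    #   chi = sqrt(2/(pi alpha P)) * exp(-x^2),  P = 1/(1 - chi^2)
    m, P = 1.0, 1.0
    for _ in range(iters):
        x = m / sqrt(2.0 * alpha * P)
        chi = min(sqrt(2.0 / (pi * alpha * P)) * exp(-x * x), 0.999)
        P = 1.0 / (1.0 - chi * chi)   # clip on chi keeps P finite in transients
        m = erf(x)
    return m

print(persistent_overlap(0.25))  # below saturation: recall solution, m near 1
print(persistent_overlap(0.40))  # above saturation: m collapses to zero
```

Scanning α for the point where the m ≠ 0 solution disappears brackets the storage capacity near the quoted α_c ≈ 0.269.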
We find here that, in contrast to equilibrium statistical mechanical methods, which do not apply due to the absence of detailed balance, the powerful path integral formalism provides us with a solution and a transparent explanation of the increased storage capacity. It turns out that this higher capacity is due to the absence of a retarded self-interaction, viz. the absence of a microscopic memory of activations.\n\nThe theoretically obtained phase diagram can be compared to the results of numerical simulations and we find excellent agreement. Our confidence in this agreement is supported by additional simulations to study the effect of finite size scaling. Full details of the calculations will be presented elsewhere [7].\n\nReferences\n\n[1] Sherrington D and Kirkpatrick S 1975 Phys. Rev. Lett. 35 1792\n\n[2] Amit D J, Gutfreund H, and Sompolinsky H 1985 Phys. Rev. Lett. 55 1530\n\n[3] Amari S and Maginu K 1988 Neural Networks 1 63\n\n[4] de Dominicis C 1978 Phys. Rev. B 18 4913\n\n[5] Rieger H, Schreckenberg M, and Zittartz J 1988 J. Phys. A: Math. Gen. 21 L263\n\n[6] Kühn R and van Hemmen J L 1991 Temporal Association, ed E Domany, J L van Hemmen, and K Schulten (Berlin, Heidelberg: Springer) p 213\n\n[7] Düring A, Coolen A C C, and Sherrington D 1998 J. Phys. A: Math. Gen. 31 8607\n", "award": [], "sourceid": 1587, "authors": [{"given_name": "A.", "family_name": "D\u00fcring", "institution": null}, {"given_name": "Anthony", "family_name": "Coolen", "institution": null}, {"given_name": "D.", "family_name": "Sherrington", "institution": null}]}