{"title": "Statistical Mechanics of Temporal Association in Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 176, "page_last": 182, "abstract": null, "full_text": "Statistical Mechanics of Temporal Association in Neural Networks with Delayed Interactions

Andreas V.M. Herz, Division of Chemistry, Caltech 139-74, Pasadena, CA 91125
Zhaoping Li, School of Natural Sciences, Institute for Advanced Study, Princeton, NJ 08540
J. Leo van Hemmen, Physik-Department der TU München, D-8046 Garching, FRG

Abstract

We study the representation of static patterns and temporal associations in neural networks with a broad distribution of signal delays. For a certain class of such systems, a simple intuitive understanding of the spatio-temporal computation becomes possible with the help of a novel Lyapunov functional. It allows a quantitative study of the asymptotic network behavior through a statistical mechanical analysis. We present analytic calculations of both retrieval quality and storage capacity and compare them with simulation results.

1 INTRODUCTION

Basic computational functions of associative neural structures may be analytically studied within the framework of attractor neural networks, where static patterns are stored as stable fixed points of the system's dynamics. If the interactions between single neurons are instantaneous and mediated by symmetric couplings, there is a Lyapunov function for the retrieval dynamics (Hopfield 1982). The global computation corresponds in that case to a downhill motion in an energy landscape created by the stored information. Methods of equilibrium statistical mechanics may be applied and permit a quantitative analysis of the asymptotic network behavior (Amit et al. 1985, 1987). The existence of a Lyapunov function is thus of great conceptual as well as technical importance.
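The downhill-motion picture for symmetric couplings can be made concrete with a short simulation. The following is a minimal numpy sketch (network size, pattern count, and the sequential update schedule are illustrative choices, not taken from the text): it builds outer-product couplings and records the energy E = -(1/2) Σ_ij J_ij S_i S_j along a trajectory of asynchronous sign updates.

```python
import numpy as np

rng = np.random.default_rng(2)
N, P = 50, 3                            # illustrative network and pattern sizes
xi = rng.choice([-1, 1], size=(P, N))   # random stored patterns
J = (xi.T @ xi).astype(float) / N       # symmetric Hebbian (outer-product) couplings
np.fill_diagonal(J, 0.0)                # no self-coupling

def energy(S):
    """Hopfield energy E = -(1/2) S.J.S of a network state."""
    return -0.5 * S @ J @ S

S = rng.choice([-1, 1], size=N)
energies = [energy(S)]
for t in range(200):                    # asynchronous single-spin dynamics
    i = t % N                           # sweep neurons in a fixed order
    h = J[i] @ S                        # local field; ties h = 0 broken towards +1
    S[i] = 1 if h >= 0 else -1
    energies.append(energy(S))
```

Because J is symmetric with zero diagonal, each single-spin update changes E by -(S_i_new - S_i_old) h_i ≤ 0, which is the discrete analogue of the downhill motion described above.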
Nevertheless, one should be aware that environmental inputs to a neural net always provide information in both space and time. It is therefore desirable to extend the original Hopfield scheme and to explore possibilities for a joint representation of static patterns and temporal associations.

Signal delays are omnipresent in the brain and play an important role in biological information processing. Their incorporation into theoretical models therefore seems rather natural, especially if one includes the distribution of the delay times involved. Kleinfeld (1986) and Sompolinsky and Kanter (1986) proposed models for temporal associations, but they used only a single delay line between two neurons. Tank and Hopfield (1987) presented a feedforward architecture for sequence recognition based on multiple delays, but they considered only information relative to the very end of a given sequence. Besides these deficiencies, both approaches lack the ability to acquire knowledge through a true learning mechanism: synaptic efficacies have to be calculated by hand, which is not satisfactory from a neurobiological point of view, nor for applications in artificial intelligence.

This drawback has been overcome by a careful interpretation of the Hebb principle (1949) for neural networks with a broad distribution of transmission delays (Herz et al. 1988, 1989). After the system has been taught stationary patterns and temporal sequences, by the very same principle, it reproduces them with high precision when triggered suitably. In the present contribution, we focus on a special class of such delay networks and introduce a Lyapunov (energy) functional for the deterministic retrieval dynamics (Li and Herz 1990). We thus generalize Hopfield's approach to the domain of temporal associations. Through an extension of the usual formalism of equilibrium statistical mechanics to time-dependent phenomena, we analyze the network performance under a stochastic (noisy) dynamics. We derive quantitative results on both the retrieval quality and storage capacity, and close with some remarks on possible generalizations of this approach.

2 DYNAMICS OF THE NEURONS

Throughout what follows, we describe a neural network as a collection of N two-state neurons with activities S_i = 1 for a firing cell and S_i = -1 for a quiescent one. The cells are connected by synapses with modifiable efficacies J_ij(τ), where τ denotes the delay for the information transport from j to i. We focus on a soliton-like propagation of neural signals, characteristic of the (axonal) transmission of action potentials, and consider a model where each pair of neurons is linked by several axons with delays 0 ≤ τ < τ_max. Other architectures with only a single link have been considered elsewhere (Coolen and Gielen 1988; Herz et al. 1988, 1989; Kerszberg and Zippelius 1990). External stimuli are fed into the system via receptors σ_i = ±1 with input sensitivity γ. The postsynaptic potentials are given by

  h_i(t) = (1-γ) Σ_{j=1}^{N} Σ_{τ=0}^{τ_max} J_ij(τ) S_j(t-τ) + γ σ_i(t) .   (1)

We concentrate on synchronous dynamics (Little 1974) with basic time step Δt = 1. Consequently, signal delays take nonnegative integer values. Synaptic noise is described by a stochastic Glauber dynamics with noise level β = T^{-1} (Peretto 1984),

  Prob[S_i(t+1) = ±1] = (1/2) {1 ± tanh[β h_i(t)]} ,   (2)

where Prob denotes probability. For β → ∞, we arrive at a deterministic dynamics,

  S_i(t+1) = sgn[h_i(t)] = { 1 if h_i(t) > 0 ; -1 if h_i(t) < 0 } .   (3)

3 HEBBIAN LEARNING

During a learning session the synaptic strengths may change according to the Hebb principle (1949). We focus on a connection with delay τ between neurons i and j. According to Hebb, the corresponding efficacy J_ij(τ) will be increased if cell j takes part in firing cell i. In its physiological context, this rule was originally formulated for excitatory synapses only, but for simplicity we apply it to all synapses.

Due to the delay τ in (1) and the parallel dynamics (2), it takes τ+1 time steps until neuron j actually influences the state of neuron i. J_ij(τ) thus changes by an amount proportional to the product of S_j(t-τ) and S_i(t+1). Starting with J_ij(τ) = 0, we obtain after P learning sessions, labeled by μ and each of duration D_μ,

  J_ij(τ) = ε(τ) N^{-1} Σ_{μ=1}^{P} Σ_{t_μ=1}^{D_μ} S_i(t_μ+1) S_j(t_μ-τ) ≡ ε(τ) Ĵ_ij(τ) .   (4)

The parameters ε(τ), normalized by Σ_{τ=0}^{τ_max} ε(τ) = 1, take morphological characteristics of the delay lines into account; N^{-1} is a scaling factor useful for the theoretical analysis. By (4), synapses act as microscopic feature detectors during the learning sessions and store correlations of the taught sequences in both space (i,j) and time (τ). In general, they will be asymmetric in the sense that J_ij(τ) ≠ J_ji(τ).

During learning, we set T = 0 and γ = 1 to achieve a "clamped learning scenario" in which the system evolves strictly according to the external stimuli, S_i(t_μ) = σ_i(t_μ-1). We study the case where all input sequences σ_i(t_μ) are cyclic with equal periods D_μ = D, i.e., σ_i(t_μ) = σ_i(t_μ ± D) for all μ. In passing we note that one should offer the sequences already τ_max time steps before allowing synaptic plasticity à la (4), so that both S_i and S_j are in well-defined states during the actual learning sessions.
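As a concrete illustration, the delayed Hebb rule and the deterministic synchronous retrieval step can be sketched in numpy. The sizes, the number of cycles, and the uniform delay weights ε(τ) = 1/D are illustrative assumptions, not values from the text, and ties h_i = 0 are broken towards +1:

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, D = 64, 2, 4                        # neurons, cycles, period (illustrative)
xi = rng.choice([-1, 1], size=(P, D, N))  # xi[mu, a, i]: state of cycle mu at time a

# Delayed Hebb rule: J_ij(tau) = eps(tau)/N * sum_{mu,a} xi_i^{mu,a+1} xi_j^{mu,a-tau},
# with all temporal indices taken modulo D (cyclic sequences).
eps = np.full(D, 1.0 / D)                 # a priori delay weights, normalized to 1
J = np.stack([
    eps[tau] / N * sum(np.outer(xi[mu, (a + 1) % D], xi[mu, (a - tau) % D])
                       for mu in range(P) for a in range(D))
    for tau in range(D)
])

def step(history):
    """One synchronous update, eqs. (1) and (3) with gamma = 0.
    history[-1 - tau] holds S(t - tau); ties h = 0 are broken towards +1."""
    h = sum(J[tau] @ history[-1 - tau] for tau in range(D))
    return np.where(h >= 0, 1, -1)
```

With uniform weights, ε(τ) = ε(D-2-τ) holds automatically, so the stored couplings satisfy Ĵ_ij(τ) = Ĵ_ji(D-(2+τ)), the symmetry that plays a central role below.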
We define patterns ξ_i^{μa} by ξ_i^{μa} ≡ σ_i(t_μ = a) for 0 ≤ a < D and get

  J_ij(τ) = ε(τ) N^{-1} Σ_{μ=1}^{P} Σ_{a=0}^{D-1} ξ_i^{μ,a+1} ξ_j^{μ,a-τ} .   (5)

Our learning scheme is thus a generalization of outer-product rules to spatio-temporal patterns. Here and in the following, temporal arguments of the sequence patterns ξ and of the synaptic couplings should always be understood modulo D.

4 LYAPUNOV FUNCTIONAL

Using formulae (1)-(5), one may derive equations of motion for macroscopic order parameters (Herz et al. 1988, 1989), but this kind of analysis applies only to the case P ∼ log N. However, note that from (4) and (5) we get Ĵ_ij(τ) = Ĵ_ji(D-(2+τ)). For all networks whose a priori weights ε(τ) obey ε(τ) = ε(D-(2+τ)), we have thus found an "extended synaptic symmetry" (Li and Herz 1990),

  J_ij(τ) = J_ji(D-(2+τ)) ,   (6)

generalizing Hopfield's symmetry assumption J_ij = J_ji in a natural way to the temporal domain. To establish a Lyapunov functional for the noiseless retrieval dynamics (3), we take γ = 0 in (1) and define

  H(t) = -(1/2) Σ_{i,j=1}^{N} Σ_{a,τ=0}^{D-1} J_ij(τ) S_i(t-a) S_j(t-(a+τ+1)%D) ,   (7)

where a%b ≡ a mod b. The functional H depends on all states between t+1-D and t, so that solutions with constant H, like D-periodic cycles, need not be static fixed points of the dynamics. By (1), (5) and (6), the difference ΔH(t) ≡ H(t) - H(t-1) is

  ΔH(t) = - Σ_{i=1}^{N} [S_i(t) - S_i(t-D)] h_i(t-1) - ε(D-1)/(2N) Σ_{μ=1}^{P} Σ_{a=0}^{D-1} { Σ_{i=1}^{N} ξ_i^{μa} [S_i(t) - S_i(t-D)] }^2 .   (8)

The dynamics (3) implies that the first term is nonpositive. Since ε(τ) ≥ 0, the same holds true for the second one. For finite N, H is bounded, and ΔH has to vanish as t → ∞. The system therefore settles into a state with S_i(t) = S_i(t-D) for all i.
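The monotone decrease of H can be observed directly in a small simulation. The sketch below (illustrative sizes; Hebbian couplings as in eq. (5) with uniform ε(τ), an assumption that satisfies the extended symmetry) evaluates the functional (7) along a deterministic trajectory:

```python
import numpy as np

rng = np.random.default_rng(1)
N, P, D = 48, 2, 4                        # illustrative sizes
xi = rng.choice([-1, 1], size=(P, D, N))  # cyclic patterns xi[mu, a, i]
eps = np.full(D, 1.0 / D)                 # uniform a priori delay weights
J = np.stack([
    eps[tau] / N * sum(np.outer(xi[mu, (a + 1) % D], xi[mu, (a - tau) % D])
                       for mu in range(P) for a in range(D))
    for tau in range(D)
])

def H(hist):
    """Lyapunov functional, eq. (7); hist[-1 - a] holds S(t - a), a = 0..D-1."""
    return -0.5 * sum(hist[-1 - a] @ J[tau] @ hist[-1 - (a + tau + 1) % D]
                      for a in range(D) for tau in range(D))

def step(hist):
    """Deterministic synchronous update, eq. (3) with gamma = 0; ties go to +1."""
    h = sum(J[tau] @ hist[-1 - tau] for tau in range(D))
    return np.where(h >= 0, 1, -1)

hist = [rng.choice([-1, 1], size=N) for _ in range(D)]  # random initial window
energies = []
for _ in range(8 * D):
    hist.append(step(hist))
    hist = hist[-D:]                      # keep the last D states
    energies.append(H(hist))
```

Along the trajectory, consecutive energies obey ΔH ≤ 0, in line with eq. (8): the first term is nonpositive by the sign rule, the second because ε(D-1) ≥ 0.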
We have thus exposed two important facts: (a) the retrieval dynamics is governed by a Lyapunov functional, and (b) the system relaxes to a static state or to a limit cycle with S_i(t) = S_i(t-D), i.e., oscillatory solutions with the same period as the taught cycles or with a period equal to an integer fraction of D.

Stepping back for an overview, we notice that H is a Lyapunov functional for all networks which exhibit an "extended synaptic symmetry" (6) and for which the matrix J(D-1) is positive semi-definite. The Hebbian synapses (4) constitute an important special case and will be the main subject of our further discussion.

5 STATISTICAL MECHANICS

We now prove that a limit cycle of the retrieval dynamics indeed resembles a stored sequence. We proceed in two steps. First, we demonstrate that our task concerning cyclic temporal associations can be mapped onto a symmetric network without delays. Second, we apply equilibrium statistical mechanics to study such "equivalent systems" and derive analytic results for the retrieval quality and storage capacity.

D-periodic oscillatory solutions of the retrieval dynamics can be interpreted as static states in a "D-plicated" system with D columns and N rows of cells with activities S_ia. A network state will be written A = (A_0, A_1, ..., A_{D-1}) with A_a = {S_ia; 1 ≤ i ≤ N}. To reproduce the parallel dynamics of the original system, neurons S_ia with a = t%D are updated at time t. The time evolution of the new network therefore has a pseudo-sequential characteristic: synchronous within single columns and sequentially ordered with respect to these columns. Accordingly, the neural activities at time t are given by S_ia(t) = S_i(a + n_t) for a ≤ t%D and S_ia(t) = S_i(a + n_t - D) for a > t%D, where n_t is defined through t = n_t + t%D.
Due to (6), symmetric efficacies J_ia^jb = J_jb^ia may be constructed for the new system by

  J_ia^jb = J_ij((b-a-1)%D) ,   (9)

allowing a well-defined Hamiltonian, equal to that of a Hopfield net of size ND,

  H = -(1/2) Σ_{i,j=1}^{N} Σ_{a,b=0}^{D-1} J_ia^jb S_ia S_jb .   (10)

An evaluation of (10) in terms of the former state variables reveals that it is identical to the Lyapunov functional (7). The interpretation, however, is changed: a limit cycle of period D in the original network corresponds to a fixed point of the new system of size ND. We have thus shown that the time evolution of a delay network with extended symmetry can be understood in terms of a downhill motion in the energy landscape of its "equivalent system".

For Hebbian couplings (5), the new efficacies J_ia^jb take a particularly simple form if we define patterns ξ_ia^{μb}, 1 ≤ i ≤ N, 0 ≤ a < D, derived from the stored sequences, and we consider an extensive number of stored cycles, α = P/N > 0. A detailed analysis of the case where the number of cycles remains bounded as N → ∞ can be found in (Li and Herz 1990).

As in the replica-symmetric theory of Amit et al. (1987), we assume that the network is in a state highly correlated with a finite number of stored cycles. The remaining, extensively many cycles are described as a noise term. We define "partial" overlaps by m_a^μ = N^{-1} Σ_i ξ_i^{μa} S_ia. These macroscopic order parameters measure how close the system is to a stored pattern ξ^{μa} at a specific column a. We consider retrieval solutions, i.e., m_a^μ = m^μ δ_{a,0}, and arrive at the fixed-point equations (Li and Herz 1990)

  m^μ = ⟨⟨ ξ^{μ0} tanh[β( Σ_ν m^ν ξ^{ν0} + √(αr) z )] ⟩⟩ ,   (14)

where the effective noise strength r is determined by the eigenvalues A_k(ε) of the matrix ε, and

  q = ⟨⟨ tanh²[β( Σ_ν m^ν ξ^{ν0} + √(αr) z )] ⟩⟩ .   (15)

Double angular brackets represent an average with respect to both the "condensed" cycles and the normalized Gaussian random variable z.
Retrieval is possible when solutions with m^μ > 0 for a single cycle μ exist, and the storage capacity α_c is reached when such solutions cease to exist. It should be noted that each cycle consists of D patterns, so that the storage capacity for single patterns is D α_c. During the recognition process, however, each of them will trigger the cycle it belongs to and cannot be retrieved as a static pattern. For systems with a "maximally uniform" distribution, ε_ab = (D-1)^{-1}(1-δ_ab), we get

  D:    2      3      4      5      ∞
  α_c:  0.100  0.110  0.116  0.120  0.138

where the last result is identical to that for the corresponding Hopfield model, since the diagonal terms of ε can be neglected in that case. The above findings agree well with estimates from a finite-size analysis (N ≤ 3000) of data from numerical simulations, as shown by two examples: for D = 3 we have found α_c = 0.120 ± 0.015, and for D = 4, α_c = 0.125 ± 0.015. Our results demonstrate that the storage capacity for temporal associations is comparable to that for static memories. As an example, take D = 2, i.e., the Little model. In the limit of large N, we see that 0.100·N two-cycles of the form ξ^{μ0} ⇄ ξ^{μ1} may be recalled, as compared with 0.138·N static patterns (Fontanari and Koberle 1987); this leads to a 1.45-fold increase of the information content per synapse.

The influence of the weight distribution on the network behavior may be demonstrated by some choices of ε(τ) for D = 4:

  τ:     0    1    2    3   |  α_c    m_c
  ε(τ):  1/3  1/3  1/3  0   |  0.116  0.96
  ε(τ):  1/2  0    1/2  0   |  0.100  0.93
  ε(τ):  0    0    0    1   |  0.050  0.93

The storage capacity decreases with a decreasing number of delay lines but, measured per synapse, it increases. However, networks with only a few delays are less fault-tolerant, as known from numerical simulations (Herz et al. 1989).
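The per-synapse comparison for D = 2 quoted above is a one-line computation; the sketch below simply reproduces that arithmetic from the stated capacities:

```python
# Worked check of the D = 2 (Little model) comparison quoted above:
# 0.100*N stored two-cycles contain 2 * 0.100*N individual patterns, versus
# 0.138*N retrievable static patterns in the purely static case.
alpha_c_cycles, alpha_c_static, D = 0.100, 0.138, 2
patterns_per_neuron = D * alpha_c_cycles   # each cycle consists of D patterns
gain = patterns_per_neuron / alpha_c_static
print(round(gain, 2))                      # prints 1.45
```

The ratio 0.200/0.138 ≈ 1.45 is the quoted increase of the information content per synapse.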
For all studied architectures, retrieved sequences contain less than 3.5% errors. Our results prove that an extensive number of temporal associations can be stored as spatio-temporal attractors for the retrieval dynamics. They also indicate that dynamical systems with delayed interactions can be programmed in a very efficient manner to perform associative computations in the space-time domain.

6 CONCLUSION

Learning schemes can be successful only if the structure of the learning task is compatible with both the network architecture and the learning algorithm. In the present context, the task is to store simple temporal associations. It can be accomplished in neural networks with a broad distribution of signal delays and Hebbian synapses which, during learning periods, operate as microscopic feature detectors for spatio-temporal correlations within the external stimuli. The retrieval dynamics utilizes the very same delays and synapses, and is therefore rather robust, as shown by numerical simulations and a statistical mechanical analysis.

Our approach may be generalized in various directions. For example, one can investigate more sophisticated learning rules or switch to continuous neurons in "iterated-map networks" (Marcus and Westervelt 1990). A generalization of the Lyapunov functional (7) covers that case as well (Herz, to be published) and allows a direct comparison of theoretical predictions with results from hardware implementations. Finally, one could try to develop a Lyapunov functional for a continuous-time dynamics with delays, which seems rather significant for applications as well as for the general theory of functional differential equations and dynamical systems.

Acknowledgements

It is a pleasure to thank Bernhard Sulzer, John Hopfield, Reimer Kühn and Wulfram Gerstner for many helpful discussions.
AVMH acknowledges support from the Studienstiftung des Deutschen Volkes. ZL is partly supported by a grant from the Seaver Institute.

References

Amit D J, Gutfreund H and Sompolinsky H 1985 Phys. Rev. A 32 1007
Amit D J, Gutfreund H and Sompolinsky H 1987 Ann. Phys. (N.Y.) 173 30
Coolen A C C and Gielen C C A M 1988 Europhys. Lett. 7 281
Fontanari J F and Koberle R 1987 Phys. Rev. A 36 2475
Hebb D O 1949 The Organization of Behavior (Wiley, New York)
van Hemmen J L 1986 Phys. Rev. A 34 3435
Herz A V M, Sulzer B, Kühn R and van Hemmen J L 1988 Europhys. Lett. 7 663
Herz A V M, Sulzer B, Kühn R and van Hemmen J L 1989 Biol. Cybern. 60 457
Hopfield J J 1982 Proc. Natl. Acad. Sci. USA 79 2554
Kerszberg M and Zippelius A 1990 Phys. Scr. T33 54
Kleinfeld D 1986 Proc. Natl. Acad. Sci. USA 83 9469
Li Z and Herz A V M 1990 Lecture Notes in Physics 368 p 287 (Springer, Heidelberg)
Little W A 1974 Math. Biosci. 19 101
Marcus C M and Westervelt R M 1990 Phys. Rev. A 42 2410
Peretto P 1984 Biol. Cybern. 50 51
Sompolinsky H and Kanter I 1986 Phys. Rev. Lett. 57 2861
Tank D W and Hopfield J J 1987 Proc. Natl. Acad. Sci. USA 84 1896
", "award": [], "sourceid": 374, "authors": [{"given_name": "Andreas", "family_name": "Herz", "institution": null}, {"given_name": "Zhaoping", "family_name": "Li", "institution": null}, {"given_name": "J.", "family_name": "van Hemmen", "institution": null}]}