{"title": "Minimax and Hamiltonian Dynamics of Excitatory-Inhibitory Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 329, "page_last": 335, "abstract": null, "full_text": "Minimax and Hamiltonian Dynamics of Excitatory-Inhibitory Networks\n\nH. S. Seung, T. J. Richardson\nBell Labs, Lucent Technologies\nMurray Hill, NJ 07974\n{seung|tjr}@bell-labs.com\n\nJ. C. Lagarias\nAT&T Labs-Research\n180 Park Ave. D-130\nFlorham Park, NJ 07932\njcl@research.att.com\n\nJ. J. Hopfield\nDept. of Molecular Biology\nPrinceton University\nPrinceton, NJ 08544\njhopfield@watson.princeton.edu\n\nAbstract\n\nA Lyapunov function for excitatory-inhibitory networks is constructed. The construction assumes symmetric interactions within excitatory and inhibitory populations of neurons, and antisymmetric interactions between populations. The Lyapunov function yields sufficient conditions for the global asymptotic stability of fixed points. If these conditions are violated, limit cycles may be stable. The relations of the Lyapunov function to optimization theory and classical mechanics are revealed by minimax and dissipative Hamiltonian forms of the network dynamics.\n\nThe dynamics of a neural network with symmetric interactions provably converges to fixed points under very general assumptions[1, 2]. This mathematical result helped to establish the paradigm of neural computation with fixed point attractors[3]. But in reality, interactions between neurons in the brain are asymmetric. Furthermore, the dynamical behaviors seen in the brain are not confined to fixed point attractors, but also include oscillations and complex nonperiodic behavior. These other types of dynamics can be realized by asymmetric networks, and may be useful for neural computation. For these reasons, it is important to understand the global behavior of asymmetric neural networks. 
\nThe interaction between an excitatory neuron and an inhibitory neuron is clearly asymmetric. Here we consider a class of networks that incorporates this fundamental asymmetry of the brain's microcircuitry. Networks of this class have distinct populations of excitatory and inhibitory neurons, with antisymmetric interactions between populations and symmetric interactions within each population. Such networks display a rich repertoire of dynamical behaviors including fixed points, limit cycles[4, 5] and traveling waves[6].\n\nAfter defining the class of excitatory-inhibitory networks, we introduce a Lyapunov function that establishes sufficient conditions for the global asymptotic stability of fixed points. The generality of these conditions contrasts with the restricted nature of previous convergence results, which applied only to linear networks[5], or to nonlinear networks with infinitely fast inhibition[7].\n\nThe use of the Lyapunov function is illustrated with a competitive or winner-take-all network, which consists of an excitatory population of neurons with recurrent inhibition from a single neuron[8]. For this network, the sufficient conditions for global stability of fixed points also happen to be necessary conditions. In other words, we have proved global stability over the largest possible parameter regime in which it holds, demonstrating the power of the Lyapunov function. There exists another parameter regime in which numerical simulations display limit cycle oscillations[7].\n\nSimilar convergence proofs for other excitatory-inhibitory networks may be obtained by tedious but straightforward calculations. All the necessary tools are given in the first half of the paper. 
But the rest of the paper explains what makes the Lyapunov function especially interesting, beyond the convergence results it yields: its role in a conceptual framework that relates excitatory-inhibitory networks to optimization theory and classical mechanics.\n\nThe connection between neural networks and optimization[3] was established by proofs that symmetric networks could find minima of objective functions[1, 2]. Later it was discovered that excitatory-inhibitory networks could perform the minimax computation of finding saddle points[9, 10, 11], though no general proof of this was given at the time. Our Lyapunov function finally supplies such a proof, and one of its components is the objective function of the network's minimax computation.\n\nOur Lyapunov function can also be obtained by writing the dynamics of excitatory-inhibitory networks in Hamiltonian form, with extra velocity-dependent terms. If these extra terms are dissipative, then the energy of the system is nonincreasing, and is a Lyapunov function. If the extra terms are not purely dissipative, limit cycles are possible. Previous Hamiltonian formalisms for neural networks made the more restrictive assumption of purely antisymmetric interactions, and did not include the effect of dissipation[12].\n\nThis paper establishes sufficient conditions for global asymptotic stability of fixed points. The problem of finding sufficient conditions for oscillatory and chaotic behavior remains open. The perspectives of minimax and Hamiltonian dynamics may help in this task.\n\n1 EXCITATORY-INHIBITORY NETWORKS\n\nThe dynamics of an excitatory-inhibitory network is defined by\n\nTx ẋ + x = f(u + Ax - By) ,   (1)\nTy ẏ + y = g(v + B^T x - Cy) .   (2)\n\nThe state variables are contained in two vectors x ∈ R^m and y ∈ R^n, which represent the activities of the excitatory and inhibitory neurons, respectively. The symbol f is used in both scalar and vector contexts. 
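The dynamics (1)-(2) can be integrated numerically by forward Euler. The following sketch is illustrative only: the choice f = g = tanh, the couplings, and all parameter values are assumptions made for the example, not choices made in the paper.

```python
import numpy as np

# Forward-Euler integration of the excitatory-inhibitory dynamics
#   Tx dx/dt + x = f(u + A x - B y)      (1)
#   Ty dy/dt + y = g(v + B^T x - C y)    (2)
# f = g = tanh and all parameter values below are illustrative assumptions.

def simulate(A, B, C, u, v, f, g, Tx=1.0, Ty=1.0, dt=0.01, steps=20000):
    x = np.zeros(len(u))
    y = np.zeros(len(v))
    for _ in range(steps):
        x_new = x + (dt / Tx) * (f(u + A @ x - B @ y) - x)
        y_new = y + (dt / Ty) * (g(v + B.T @ x - C @ y) - y)
        x, y = x_new, y_new
    return x, y

rng = np.random.default_rng(0)
m, n = 4, 2
A = np.zeros((m, m))          # symmetric excitatory couplings (none here)
C = np.eye(n)                 # symmetric inhibitory couplings
B = 0.5 * rng.random((m, n))  # antisymmetric cross-coupling enters as -B, B^T
u = rng.random(m)
v = np.zeros(n)
x, y = simulate(A, B, C, u, v, np.tanh, np.tanh)
# With A = 0 there is no self-excitation, so activity should settle to a
# fixed point; at a fixed point this residual vanishes.
residual = np.linalg.norm(np.tanh(u + A @ x - B @ y) - x)
```

With A = 0 the simulation illustrates the damped regime; making A sufficiently excitatory is what opens the door to the oscillations discussed below.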
The scalar function f : R → R is monotonic nondecreasing. The vector function f : R^m → R^m is defined by applying the scalar function f to each component of a vector argument, i.e., f(x) = (f(x1), ..., f(xm)). The symbol g is used similarly.\n\nThe symmetry of interaction within each population is imposed by the constraints A = A^T and C = C^T. The antisymmetry of interaction between populations is manifest in the occurrence of -B and B^T in the equations. The terms \"excitatory\" and \"inhibitory\" are appropriate with the additional constraint that the entries of the matrices A, B, and C are nonnegative. Though this assumption makes sense in a neurobiological context, the mathematics does not depend on it. The constant vectors u and v represent tonic input from external sources, or alternatively bias intrinsic to the neurons.\n\nThe time constants Tx and Ty set the speed of excitatory and inhibitory synapses, respectively. In the limit of infinitely fast inhibition, Ty = 0, the convergence theorems for symmetric networks are applicable[1, 2], though some effort is required in applying them to the case C ≠ 0. If the dynamics converges for Ty = 0, then there exists some neighborhood of zero in which it still converges[7]. Our Lyapunov function goes further, as it is valid for more general Ty.\n\nThe potential for oscillatory behavior in excitatory-inhibitory networks like (1) has long been known[4, 7]. The origin of oscillations can be understood from a simple two neuron model. Suppose that neuron 1 excites neuron 2, and receives inhibition back from neuron 2. Then the effect is that neuron 1 suppresses its own activity with an effective delay that depends on the time constant of inhibition. If this delay is long enough, oscillations result. 
However, these oscillations will die down to a fixed point, as the inhibition tends to dampen activity in the circuit. Only if neuron 1 also excites itself can the oscillations become sustained.\n\nTherefore, whether oscillations are damped or sustained depends on the choice of parameters. In this paper we establish sufficient conditions for the global stability of fixed points in (1). The violation of these sufficient conditions indicates parameter regimes in which there may be other types of asymptotic behavior, such as limit cycles.\n\n2 LYAPUNOV FUNCTION\n\nWe will assume that f and g are smooth and that their inverses f^-1 and g^-1 exist. If the function f is bounded above and/or below, then its inverse f^-1 is defined on the appropriate subinterval of R. Note that the set of (x, y) lying in the range of (f, g) is a positive invariant set under (1) and that its closure is a global attractor for the system.\n\nThe scalar function F is defined as the antiderivative of f, and F* as the Legendre transform F*(x) = max_p {px - F(p)}. The derivatives of these conjugate convex functions are\n\nF'(x) = f(x) ,   F*'(x) = f^-1(x) .   (3)\n\nThe vector versions of these functions are defined componentwise, as in the definition of the vector version of f. The conjugate convex pair G, G* is defined similarly.\n\nThe Lyapunov function requires generalizations of the standard kinetic energies Tx ẋ^2/2 and Ty ẏ^2/2. These are constructed using the functions Φ : R^m × R^m → R and Γ : R^n × R^n → R, defined by\n\nΦ(p, x) = 1^T F(p) - x^T p + 1^T F*(x) ,   (4)\nΓ(q, y) = 1^T G(q) - y^T q + 1^T G*(y) .   (5)\n\nThe components of the vector 1 are all ones; its dimensionality should be clear from context. The function Φ(p, x) is lower bounded by zero, and vanishes on the manifold f(p) = x, by the definition of the Legendre transform. 
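The conjugate-pair construction can be checked numerically in the scalar case. The sketch below is not from the paper: it assumes the example nonlinearity f = tanh, for which the antiderivative F(p) = log cosh p and its Legendre transform have closed forms, and verifies that Φ(p, x) = F(p) - xp + F*(x) is nonnegative and vanishes exactly where x = f(p).

```python
import numpy as np

# Scalar check of the conjugate pair (F, F*) and of Phi(p, x) >= 0.
# f = tanh is an assumed example nonlinearity, not a choice made in the paper.

def f(p):
    return np.tanh(p)

def F(p):          # antiderivative of f:  F'(p) = tanh(p)
    return np.log(np.cosh(p))

def F_star(x):     # Legendre transform F*(x) = max_p { p*x - F(p) }, |x| < 1
    return x * np.arctanh(x) + 0.5 * np.log(1.0 - x**2)

def Phi(p, x):     # Phi(p, x) = F(p) - x*p + F*(x)
    return F(p) - x * p + F_star(x)

ps = np.linspace(-3.0, 3.0, 101)
on_manifold = Phi(ps, f(ps))            # vanishes where x = f(p)
off_manifold = Phi(ps, 0.5 * f(ps))     # nonnegative, positive for p != 0
```

The identity F*'(x) = f^-1(x) (here arctanh) can be confirmed by finite differences, which is the property that makes Φ a generalized kinetic energy.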
Setting p = u + Ax - By, we obtain the generalized kinetic energy Tx^-1 Φ(u + Ax - By, x), which vanishes when ẋ = 0 and is positive otherwise. It reduces to Tx ẋ^2/2 in the special case where f is the identity function.\n\nTo construct the Lyapunov function, a multiple of the saddle function\n\nS = -u^T x - (1/2) x^T Ax + v^T y - (1/2) y^T Cy + 1^T F*(x) + y^T B^T x - 1^T G*(y)   (6)\n\nis added to the kinetic energy. The reason for the name \"saddle function\" will be explained later. Then\n\nL = Tx^-1 Φ(u + Ax - By, x) + Ty^-1 Γ(v + B^T x - Cy, y) + rS   (7)\n\nis a Lyapunov function provided that it is lower bounded, nonincreasing, and L̇ only vanishes at fixed points of the dynamics. Roughly speaking, this is enough to prove the global asymptotic stability of fixed points, although some additional technical details may be involved.\n\nIn the next section, the Lyapunov function will be applied to an example network, yielding sufficient conditions for the global asymptotic stability of fixed points. In this particular network, the sufficient conditions also happen to be necessary conditions. Therefore the Lyapunov function succeeds in delineating the largest possible parameter regime in which point attractors are globally stable. Of course, there is no guarantee of this in general, but the power of the Lyapunov function is manifest in this instance.\n\nBefore proceeding to the example network, we pause to state some general conditions for L to be nonincreasing. A lengthy but straightforward calculation shows that the time derivative of L is given by\n\nL̇ = ẋ^T Aẋ - ẏ^T Cẏ - (Tx^-1 + r) ẋ^T [f^-1(Tx ẋ + x) - f^-1(x)] - (Ty^-1 - r) ẏ^T [g^-1(Ty ẏ + y) - g^-1(y)] .   (8)\n\nTherefore, L is nonincreasing provided that\n\nmax_{a,b} [(a - b)^T A (a - b)] / [(a - b)^T (f^-1(a) - f^-1(b))] < 1 + r Tx ,   (9)\nmin_{a,b} [(a - b)^T C (a - b)] / [(a - b)^T (g^-1(a) - g^-1(b))] > 1 - r Ty .   (10)\n\nThe quotients in these inequalities are generalizations of the Rayleigh-Ritz ratios of A and C. If f and g were linear, the left hand sides of these inequalities would be equal to the maximum eigenvalue of A and the minimum eigenvalue of C.\n\n3 AN EXAMPLE: COMPETITIVE NETWORK\n\nThe competitive or winner-take-all network is a classic example of an excitatory-inhibitory network[8, 7]. Its population of excitatory neurons xi receives self-feedback of strength a and recurrent feedback from a single inhibitory neuron y,\n\nTx ẋi + xi = f(ui + a xi - y) ,   (11)\nTy ẏ + y = g(Σi xi) .   (12)\n\nThis is a special case of (1), with A = aI, B = 1, and C = 0.\n\nThe global inhibitory neuron mediates a competitive interaction between the excitatory neurons. If the competition is very strong, a single excitatory neuron \"wins,\" shutting off all the rest. If the competition is weak, more than one excitatory neuron can win, usually those corresponding to the larger ui. Depending on the choice of f and g, self-feedback a, and time scales Tx and Ty, this network exhibits a variety of dynamical behaviors, including a single point attractor, multiple point attractors, and limit cycles[5, 7].\n\nWe will consider the specific case where f and g are the rectification nonlinearity [x]+ = max{x, 0}. The behavior of this network will be described in detail elsewhere; only a brief summary is given here. With either of two convenient choices for r, r = Ty^-1 or r = a - Tx^-1, it can be shown that the resulting L is bounded below for a < 2 and nonincreasing for a < Tx^-1 + Ty^-1. These are sufficient conditions for the global stability of fixed points. They also turn out to be necessary conditions, as it can be verified that the fixed points are locally unstable if the conditions are violated. 
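The stable regime of the competitive network is easy to probe numerically. The following sketch is not from the paper: the input vector, time step, and duration are illustrative assumptions; a = 0.5 lies inside the stability regime just stated, so the simulation should settle to a point attractor.

```python
import numpy as np

# Competitive (winner-take-all) network (11)-(12) with f = g = [.]+ :
#   Tx dxi/dt + xi = [ui + a*xi - y]+ ,   Ty dy/dt + y = [sum_i xi]+
# Input u, dt, and duration are illustrative; a = 0.5 satisfies both
# stability conditions a < 2 and a < 1/Tx + 1/Ty.

def relu(z):
    return np.maximum(z, 0.0)

def run_wta(u, a=0.5, Tx=1.0, Ty=1.0, dt=0.005, steps=40000):
    x = np.zeros(len(u))
    y = 0.0
    for _ in range(steps):
        x_new = x + (dt / Tx) * (relu(u + a * x - y) - x)
        y_new = y + (dt / Ty) * (relu(x.sum()) - y)
        x, y = x_new, y_new
    return x, y

u = np.array([1.0, 0.5, 0.2])
x, y = run_wta(u)
# The state settles to a fixed point; the unit with the largest input
# retains the most activity.
```

Raising a past the stability boundary in this sketch is one way to observe the limit cycle regime mentioned above.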
The behaviors in the parameter regime defined by these conditions can be divided into two rough categories. For a < 1, there is a unique point attractor, at which more than one excitatory neuron can be active, in a soft form of winner-take-all. For a > 1, more than one point attractor may exist. Only one excitatory neuron is active at each of these fixed points, a hard form of winner-take-all.\n\n4 MINIMAX DYNAMICS\n\nIn the field of optimization, gradient descent-ascent is a standard method for finding saddle points of an objective function. This section of the paper explains the close relationship between gradient descent-ascent and excitatory-inhibitory networks[9, 10]. Furthermore, it reviews existing results on the convergence of gradient descent-ascent to saddle points[13, 10], which are the precedents of the convergence proofs of this paper.\n\nThe similarity of excitatory-inhibitory networks to gradient descent-ascent can be seen by comparing the partial derivatives of the saddle function (6) to the velocities ẋ and ẏ,\n\nẋ ~ -∂S/∂x ,   (13)\nẏ ~ ∂S/∂y .   (14)\n\nThe notation a ~ b means that the vectors a and b have the same signs, component by component. Because f and g are monotonic nondecreasing functions, ẋ has the same signs as -∂S/∂x, while ẏ has the same signs as ∂S/∂y. In other words, the dynamics of the excitatory neurons tends to minimize S, while that of the inhibitory neurons tends to maximize S.\n\nIf the sign relation ~ is replaced by equality in (13), we obtain a true gradient descent-ascent dynamics,\n\nTx ẋ = -∂S/∂x ,   Ty ẏ = ∂S/∂y .   (15)\n\nSufficient conditions for convergence of gradient descent-ascent to saddle points are known[13, 10]. The conditions can be derived using a Lyapunov function constructed from the kinetic energy and the saddle function,\n\nL = (1/2) Tx |ẋ|^2 + (1/2) Ty |ẏ|^2 + rS .   (16)\n\nThe time derivative of L is given by\n\nL̇ = -ẋ^T (∂^2 S/∂x^2) ẋ + ẏ^T (∂^2 S/∂y^2) ẏ - r Tx |ẋ|^2 + r Ty |ẏ|^2 .   (17)\n\nWeak sufficient conditions can be derived with the choice r = 0, so that L includes only kinetic energy terms. Then L is obviously lower bounded by zero. Furthermore, L is nonincreasing if ∂^2 S/∂x^2 is positive definite for all y and ∂^2 S/∂y^2 is negative definite for all x. In this case, the existence of a unique saddle point is guaranteed, as S is convex in x for all y, and concave in y for all x[13, 10].\n\nIf there is more than one saddle point, the kinetic energy by itself is generally not a Lyapunov function. This is because the dynamics may pass through the vicinity of more than one saddle point before it finally converges, so that the kinetic energy behaves nonmonotonically as a function of time. In this situation, some appropriate nonzero r must be found.\n\nThe Lyapunov function (7) for excitatory-inhibitory networks is a generalization of the Lyapunov function (16) for gradient descent-ascent. This is analogous to the way in which the Lyapunov function for symmetric networks generalizes the potential function of gradient descent.\n\nIt should be noted that gradient descent-ascent is an unreliable way of finding a saddle point. It is easy to construct situations in which it leads to a limit cycle. The unreliability of gradient descent-ascent contrasts with the reliability of gradient descent at finding local minima of a potential function. Similarly, symmetric networks converge to fixed points, but excitatory-inhibitory networks can converge to limit cycles as well.\n\n5 HAMILTONIAN DYNAMICS\n\nThe dynamics of an excitatory-inhibitory network can be written in a dissipative Hamiltonian form. 
To do this, we define a phase space that is double the dimension of the state space, adding momenta (px, py) that are canonically conjugate to (x, y). The phase space dynamics\n\nTx ẋ + x = f(px) ,   (18)\nTy ẏ + y = g(py) ,   (19)\n(r + d/dt)(u + Ax - By - px) = 0 ,   (20)\n(r + d/dt)(v + B^T x - Cy - py) = 0 ,   (21)\n\nreduces to the state space dynamics (1) on the affine space A = {(px, py, x, y) : px = u + Ax - By, py = v + B^T x - Cy}. Provided that r > 0, the affine space A is an attractive invariant manifold.\n\nDefining the Hamiltonian\n\nH(px, x, py, y) = Tx^-1 Φ(px, x) + Ty^-1 Γ(py, y) + rS(x, y) ,   (22)\n\nthe phase space dynamics (18) can be written as\n\nẋ = ∂H/∂px ,   (23)\nẏ = ∂H/∂py ,   (24)\nṗx = -∂H/∂x + Aẋ - Bẏ - (Tx^-1 + r)[px - f^-1(x)] ,   (25)\nṗy = -∂H/∂y + B^T ẋ - Cẏ - (Ty^-1 - r)[py - g^-1(y)]   (26)\n     + 2r(v + B^T x - Cy - py) .   (27)\n\nOn the invariant manifold A, the Hamiltonian is identical to the Lyapunov function (7) defined previously.\n\nThe rate of change of the energy is given by\n\nḢ = ẋ^T Aẋ - (Tx^-1 + r) ẋ^T [px - f^-1(x)] - ẏ^T Cẏ - (Ty^-1 - r) ẏ^T [py - g^-1(y)] + 2r ẏ^T (v + B^T x - Cy - py) .   (28)\n\nThe last term vanishes on the invariant manifold, leaving a result identical to (8). Therefore, if the noncanonical terms in the phase space dynamics (18) dissipate energy, then the Hamiltonian is nonincreasing. It is also possible that the velocity-dependent terms may pump energy into the system, rather than dissipate it, in which case oscillations or chaotic behavior may arise.\n\nAcknowledgments This work was supported by Bell Laboratories. We would like to thank Eric Mjolsness for useful discussions.\n\nReferences\n\n[1] M. A. Cohen and S. Grossberg. Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Trans. Systems, Man, and Cybernetics, 13:815-826, 1983.\n\n[2] J. J. Hopfield. Neurons with graded response have collective computational properties like those of two-state neurons. Proc. Natl. Acad. Sci. USA, 81:3088-3092, 1984.\n\n[3] J. J. Hopfield and D. W. Tank. Computing with neural circuits: a model. Science, 233:625-633, 1986.\n\n[4] H. R. Wilson and J. D. Cowan. A mathematical theory of the functional dynamics of cortical and thalamic nervous tissue. Kybernetik, 13:55-80, 1973.\n\n[5] Z. Li and J. J. Hopfield. Modeling the olfactory bulb and its neural oscillatory processings. Biol. Cybern., 61:379-392, 1989.\n\n[6] S. Amari. Dynamics of pattern formation in lateral-inhibition type neural fields. Biol. Cybern., 27:77-87, 1977.\n\n[7] B. Ermentrout. Complex dynamics in winner-take-all neural nets with slow inhibition. Neural Networks, 5:415-431, 1992.\n\n[8] S. Amari and M. A. Arbib. Competition and cooperation in neural nets. In J. Metzler, editor, Systems Neuroscience, pages 119-165. Academic Press, New York, 1977.\n\n[9] E. Mjolsness and C. Garrett. Algebraic transformations of objective functions. Neural Networks, 3:651-669, 1990.\n\n[10] J. C. Platt and A. H. Barr. Constrained differential optimization. In D. Z. Anderson, editor, Neural Information Processing Systems, page 55, New York, 1987. American Institute of Physics.\n\n[11] I. M. Elfadel. Convex potentials and their conjugates in analog mean-field optimization. Neural Computation, 7(5):1079-1104, 1995.\n\n[12] J. D. Cowan. A statistical mechanics of nervous activity. In Some mathematical questions in biology, volume III. AMS, 1972.\n\n[13] K. J. Arrow, L. Hurwicz, and H. Uzawa. Studies in linear and non-linear programming. Stanford University Press, Stanford, 1958.\n", "award": [], "sourceid": 1336, "authors": [{"given_name": "H. 
Sebastian", "family_name": "Seung", "institution": null}, {"given_name": "Tom", "family_name": "Richardson", "institution": null}, {"given_name": "J.", "family_name": "Lagarias", "institution": null}, {"given_name": "John J.", "family_name": "Hopfield", "institution": null}]}