{"title": "CAM Storage of Analog Patterns and Continuous Sequences with 3N2 Weights", "book": "Advances in Neural Information Processing Systems", "page_first": 91, "page_last": 97, "abstract": "", "full_text": "CAM Storage of Analog Patterns and \n\nContinuous Sequences with 3N2 Weights \n\nBill Baird \nDept Mathematics and \nDept Molecular and Cell Biology, \n129 LSA, U .C.Berkeley, \nBerkeley, Ca. 94720 \n\nFrank Eeckman \nLawrence Livermore \nNational Laboratory, \nP.O. Box 808 (L-426), \nLivermore, Ca. 94550 \n\nAbstract \n\nA simple architecture and algorithm for analytically guaranteed associa(cid:173)\ntive memory storage of analog patterns, continuous sequences, and chaotic \nattractors in the same network is described. A matrix inversion determines \nnetwork weights, given prototype patterns to be stored. There are N units \nof capacity in an N node network with 3N 2 weights. It costs one unit per \nstatic attractor, two per Fourier component of each sequence, and four per \nchaotic attractor. There are no spurious attractors, and there is a Lia(cid:173)\npunov function in a special coordinate system which governs the approach \nof transient states to stored trajectories. Unsupervised or supervised incre(cid:173)\nmental learning algorithms for pattern classification, such as competitive \nlearning or bootstrap Widrow-Hoff can easily be implemented. The archi(cid:173)\ntecture can be \"folded\" into a recurrent network with higher order weights \nthat can be used as a model of cortex that stores oscillatory and chaotic \nattractors by a Hebb rule. Hierarchical sensory-motor control networks \nmay be constructed of interconnected \"cortical patches\" of these network \nmodules. Network performance is being investigated by application to the \nproblem of real time handwritten digit recognition. 
\n\n1 \n\nIntroduction \n\nWe introduce here a \"projection network\" which is a new network for implementa(cid:173)\ntion of the \"normal form projection algorithm\" discussed in [Bai89, Bai90b]. The \nautoassociative case of this network is formally equivalent to the previous higher \norder network realization used as a biological model [Bai90a]. It has 3N 2 weights \ninstead of N2 + N 4 , and is more useful for engineering applications. All the math(cid:173)\nematical results proved for the projection algorithm in that case carryover to this \n\n91 \n\n\f92 \n\nBaird and Eeckman \n\nINPUT \n\nX' 1 \n\np- 1 matrix \n\nX' \nn Network Coordinates \n\nA matrix \n\nDynamic \nwinner-take-all \nNetwork \n\nP matrix \n\nOUTPUT \n\nNormal Form: \nVi = aVi - Vi wj aijVj \n\u2022 \n2 \n\n'\"' \n\nx=Pii \n\nNetwork Coordinates \n\nFigure 1: Projection Network - 3N 2 weights. The A matrix determines a k-winner(cid:173)\ntake-all net - programs attractors, basins of attraction, and rates of convergence. \nThe columns of P contain the ouptut patterns associated to these attractors. The \nrows of p-l determine category centroids \n\nnew architecture, but more general versions can be trained and applied in novel \nways. The discussion here will be informal, since space prohibits technical detail \nand proofs may be found in the references above. \n\nA key feature of a net constructed by this algorithm is that the underlying dynamics \nis explicitly isomorphic to any of a class of standard, well understood nonlinear \ndynamical systems - a \"normal form\" [GH83]. This system is chosen in advance, \nindependent of both the patterns to be stored and the learning algorithm to be \nused. This control over the dynamics permits the design of important aspects of \nthe network dynamics independent of the particular patterns to be stored. Stability, \nbasin geometry, and rates of convergence to attractors can be programmed in the \nstandard dynamical system. 
\n\nHere we use the normal form for the Hopf bifurcation [GH83] as a simple recurrent \ncompetitive k-winner-take-all network with a cubic nonlinearity. This network lies \nin what might considered diagonalized or \"overlap\" or \"memory coordinates\" (one \nmemory per k nodes). For temporal patterns, these nodes come in complex conju(cid:173)\ngate pairs which supply Fourier components for trajectories to be learned. Chaotic \ndynamics may be created by specific programming of the interaction of two pairs \nof these nodes. \nLearning of desired spatial or spatia-temporal patterns is done by projecting sets of \n\n\fCAM Storage of Analog Patterns and Continuous Sequences with 3N2 Weights \n\n93 \n\nthese nodes into network coordinates( the standard basis) using the desired vectors \nas corresponding columns of a transformation matrix P. In previous work, the \ndifferential equations of the recurrent network itself are linearly transformed or \n\"projected\" , leading to new recurrent network equations with higher order weights \ncorresponding to the cubic terms of the recurrent network. \n\n2 The Projection Network \n\nIn the projection net for autoassociation, this algebraic projection operation into \nand out of memory coordinates is done explicitly by a set of weights in two feed(cid:173)\nforward linear networks characterized by weight matrices p-l and P. These map \ninputs into and out of the nodes of the recurrent dynamical network in memory \ncoordinates sandwiched between them. This kind of network, with explicit input \nand output projection maps that are inverses, may be considered an \"unfolded\" \nversion of the purely recurrent networks described in the references above. \nThis network is shown in figure 1. 
Input pattern vectors x' are applied as pulses which project onto each vector of weights (row of the P^-1 matrix) on the input to each unit i of the dynamic network to establish an activation level v_i which determines the initial condition for the relaxation dynamics of this network. The recurrent weight matrix A of the dynamic network can be chosen so that the unit or predefined subspace of units which receives the largest projection of the input will converge to some state of activity, static or dynamic, while all other units are suppressed to zero activity. \n\nThe evolution of the activity in these memory coordinates appears in the original network coordinates at the output terminals as a spatio-temporal pattern which may be fully distributed across all nodes. Here the state vector of the dynamic network has been transformed by the P matrix back into the coordinates in which the input was first applied. At the attractor v* in memory coordinates, only a linear combination of the columns of the P weight matrix multiplied by the winning nonzero modes of the dynamic net constitutes the network representation of the output of the system. Thus the attractor retrieved in memory coordinates reconstructs its learned distributed representation x* through the corresponding columns of the output matrix P, e.g. P^-1 x' = v, v -> v*, P v* = x*. \n\nFor the special case of content addressable memory or autoassociation, which we have been describing here, the actual patterns to be learned form the columns of the output weight matrix P, and the input matrix is its inverse P^-1. These are the networks that can be \"folded\" into higher order recurrent networks. For orthonormal patterns, the inverse is the transpose of this output matrix of memories, P^-1 = P^T, and no computation of P^-1 is required to store or change memories - just plug the desired patterns into appropriate rows and columns of P and P^T. 
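As a concrete sketch of this retrieval cycle (our own toy illustration: the unit growth rates, coupling strengths, and Euler step size are assumptions, not values from the paper), the input is projected into memory coordinates, a k=1 winner-take-all normal form relaxes to an attractor, and the result is projected back out:

```python
import numpy as np

def retrieve(x_in, P, steps=2000, dt=0.01):
    """Autoassociative recall in the projection network.

    P^-1 maps the input into memory coordinates, the normal form
    dv_i/dt = v_i - v_i * sum_j A_ij v_j^2 relaxes to a
    winner-take-all attractor, and P maps the result back out.
    """
    n = P.shape[0]
    P_inv = np.linalg.inv(P)
    v = P_inv @ x_in                          # initial condition in memory coordinates
    A = 2.0 * np.ones((n, n)) - np.eye(n)     # off-diagonal competition > self-damping
    for _ in range(steps):
        v = v + dt * (v - v * (A @ (v * v)))  # Euler step of the normal form
    return P @ v                              # attractor back in network coordinates

# Two stored patterns as columns of P; a noisy version of the first is recalled.
P = np.array([[1.0, 1.0],
              [1.0, -1.0]])
x_out = retrieve(np.array([1.1, 0.9]), P)
```

The winning memory mode saturates at amplitude 1 while the others are suppressed to zero, so the output is the corresponding column of P.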
\n\nIn the autoassociative network, the input space, output space and normal form \nstate space are each of dimension N. The input and output linear maps require \nN2 weights each, while the normal form coefficients determine another N 2 weights. \nThus the net needs only 3N2 weights, instead of the N 2 + N 4 weights required by \nthe folded recurrent network. The 2N2 input and output weights could be stored \noff-chip in a conventional memory, and the fixed weights of the dynamic normal \nform network could be implemented in VLSI for fast analog relaxation. \n\n\f94 \n\nBaird and Eeckman \n\n3 Learning Extensions \nMore generally, for a heteroassociative net (i. e., a net designed to perform a map \nfrom input space to possibly different output space) the linear input and output \nmaps need not be inverses, and may be noninvertible. They may be found by any \nlinear map learning technique such as Widrow-Hoff or by finding pseudoinverses. \n\nLearning of all desired memories may be instantaneous, when they are known in \nadvance, or may evolve by many possible incremental methods, supervised or un(cid:173)\nsupervised. The standard competitive learning algorithm where the input weight \nvector attached to the winning memory node is moved toward the input pattern \ncan be employed. We can also decrease the tendency to choose the most frequently \nselected node, by adjusting paratmeters in the normal form equations, to realize the \nmore effective frequency selective competitive learning algorithm [AKCM90]. Su(cid:173)\npervised algorithms like bootstrap Widrow Hoff may be implemented as well, where \na desired output category is known. The weight vector of the winning normal form \nnode is updated by the competitive rule, if it is the right category for that input, \nbut moved away from the input vector, if it is not the desired category, and the \nweight vector of the desired node is moved toward the input. 
\n\nThus the input map can be optimized for clustering and classification by these \nalgorithms, as the weight vectors (row vectors of the input matrix) approach the \ncentroids of the clusters in the input environment. The output weight matrix may \nthen be constructed with any desired output pattern vectors in appropriate columns \nto place the attractors corresponding to these categories anywhere in the state space \nin network coordinates that is required to achieve a desired heteroassociation. \n\nIf either the input or the output matrix is learned, and the other chosen to be its \ninverse, then these competitive nets can be folded into oscillating biological versions, \nto see what the competive learning algorithms correspond to there. Now either the \nrows of the input matrix may be optimized for recognition, or the columns of the \noutput matrix may be chosen to place attractors, but not both. We hope to be able \nto derive a kind of Hebb rule in the biological network, using the unfolded form of \nthe network, which we can prove will accomplish competitive learning. Thus the \nwork on engineering applications feeds back on the understanding of the biological \nsystems. \n\nVi = 2:;=1 hjVj - Vi 2:;=1 AijVJ \n\n4 Programming the Normal Form Network \nThe key to the power of the projection algorithm to program these systems lies in \nthe freedom to chose a well understood normal form for the dynamics, indepen(cid:173)\ndent of the patterns to be learned. The Hopf normal form used here, (in Cartesian \nis especially easy to work with \ncoordinates) \nfor programming periodic attractors, but handles fixed points as well. J is a ma(cid:173)\ntrix with real eigenvalues for determining static attractors, or complex conjugate \neignevalue pairs in blocks along the diagonal for periodic attractors. 
The real parts are positive, and cause initial states to move away from the origin, until the competitive (negative) cubic terms dominate at some distance, and cause the flow to be inward from all points beyond. The off-diagonal cubic terms cause competition between directions of flow within a spherical middle region and thus create multiple attractors and basins. The larger the eigenvalues in J and off-diagonal weights in A, the faster the convergence to attractors in this region. \n\nIt is easy to choose blocks of coupling along the diagonal of the A matrix to produce different kinds of attractors, static, periodic, or chaotic, in different coordinate subspaces of the network. The sizes of the subspaces can be programmed by the sizes of the blocks. The basin of attraction of an attractor determined within a subspace is guaranteed to contain the subspace [Bai90b]. Thus basins can be programmed, and \"spurious\" attractors can be ruled out when all subspaces have been included in a programmed block. \n\nThis can be accomplished simply by choosing the A matrix entries outside the blocks on the diagonal (which determine coupling of variables within a subspace) to be greater (more negative) than those within the blocks. The principle is that this makes the subspaces defined by the blocks compete exhaustively, since intersubspace competition is greater than subspace self-damping. Within the middle region, the flow is forced to converge laterally to enter the subspaces programmed by the blocks. \n\nA simple example is a matrix of the form \n\n        | d                      | \n        |   d         (g)        | \n        |     d c                | \n        |     c d                | \n  A =   |         d d c c        | \n        |  (g)    d d c c        | \n        |         c c d d        | \n        |         c c d d        | \n\nwhere 0 < c < d < g and (g) indicates that every entry outside the diagonal blocks is equal to g. There is a static attractor on each axis (in each one dimensional subspace) corresponding to the first two entries on the diagonal, by the argument above. 
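This block structure is mechanical to assemble; a sketch with arbitrary values satisfying 0 < c < d < g (the particular numbers are our own):

```python
import numpy as np

def build_A(blocks, g):
    """Assemble A from diagonal blocks; every entry outside the
    blocks is set to g, the strongest (intersubspace) competition."""
    n = sum(b.shape[0] for b in blocks)
    A = np.full((n, n), g)
    i = 0
    for b in blocks:
        k = b.shape[0]
        A[i:i+k, i:i+k] = b
        i += k
    return A

c, d, g = 1.0, 2.0, 3.0                 # illustrative values, 0 < c < d < g
one = np.array([[d]])                   # 1x1 block: one static attractor
pair = np.array([[d, c],
                 [c, d]])               # 2x2 block: components combine
quad = np.block([[np.full((2, 2), d), np.full((2, 2), c)],
                 [np.full((2, 2), c), np.full((2, 2), d)]])  # doubled periodic block
A = build_A([one, one, pair, quad], g)
```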
In the first two dimensional subspace block there is a single fixed point in the interior of the subspace on the main diagonal, because the off-diagonal entries within the block are symmetric and less negative than those on the diagonal. The components do not compete, but rather combine. Nevertheless, the flow from outside is into the subspace, because the entries outside the subspace are more negative than those within it. \n\nThe last subspace contains entries appropriate to guarantee the stability of a periodic attractor with two frequencies (Fourier components) chosen in the J matrix. The doubling of the entries is because these components come in complex conjugate pairs (in the J matrix blocks) which get identical A matrix coupling. Again, these pairs are combined by the lesser off-diagonal coupling within the block to form a single limit cycle attractor. A large subspace can store a complicated continuous periodic spatio-temporal sequence with many component frequencies. \n\nThe discrete Fourier transform of a set of samples of such a sequence in space and time can be input directly to the P matrix as a set of complex columns corresponding to the frequencies in J and the subspace programmed in A. N/2 total DFT samples of N dimensional time varying spatial vectors may be placed in the P matrix, and parsed by the A matrix into M < N/2 separate sequences as desired, with separate basins of attraction guaranteed [Bai90b]. For a symmetric A matrix, there is a Liapunov function, in the amplitude equations of a polar coordinate version of the normal form, which governs the approach of initial states to stored trajectories. \n\n5 Chaotic Attractors \n\nChaotic attractors may be created in this normal form, with sigmoid nonlinearities added to the right hand side, v_i -> tanh(v_i). 
The sigmoids yield a spectrum of higher order terms that break the phase shift symmetry of the system. Two oscillatory pairs of nodes like those programmed in the block above can then be programmed to interact chaotically. In our simulations, for example, if we set the upper block of d entries to -1, and the lower to 1, and replace the upper c entries with 4.0, and the lower with -0.4, we get a chaotic attractor of dimension less than four, but greater than three. \n\nThis is \"weak\" or \"phase coherent\" chaos that is still nearly periodic. It is created by the broken symmetry, when a homoclinic tangle occurs to break up an invariant 3-torus in the flow [GH83]. This is the Ruelle-Takens route to chaos and has been observed in Taylor-Couette flow when both cylinders are rotated. We believe that sets of Lorenz equations in three dimensional subspace blocks could be used in a projection network as well. Experiments of Freeman, however, have suggested that chaotic attractors of the above dimension occur in the olfactory system [Fre87]. These might most naturally occur by the interaction of oscillatory modes. \n\nIn the projection network or its folded biological version, these chaotic attractors have a basin of attraction in the N dimensional state space that constitutes a category, just like any other attractor in this system. They are, however, \"fuzzy\" attractors, and there may be computational advantages to the basins of attraction (categories) produced by chaotic attractors, or to the effects their outputs have as fuzzy inputs to other network modules. The particular N dimensional spatio-temporal patterns learned for the four components of these chaotically paired modes may be considered a coordinate specific \"encoding\" of the strange attractor, which may constitute a recognizable input to another network, if it falls within some learned basin of attraction. 
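A direct simulation of two interacting pairs with the quoted coupling values can be sketched as follows. The J frequencies, growth rates, and step size are our own guesses, and we make no claim that this toy reproduces the attractor dimension reported above; it only shows the shape of the sigmoid-modified equations:

```python
import numpy as np

# Two oscillatory pairs; the J growth rates and frequencies are illustrative.
J = np.zeros((4, 4))
J[0:2, 0:2] = [[1.0, -1.0], [1.0, 1.0]]
J[2:4, 2:4] = [[1.0, -2.0], [2.0, 1.0]]

# Cubic coupling block with the signs quoted in the text for the chaotic regime.
A = np.array([[-1.0, -1.0, 4.0, 4.0],
              [-1.0, -1.0, 4.0, 4.0],
              [-0.4, -0.4, 1.0, 1.0],
              [-0.4, -0.4, 1.0, 1.0]])

def step(v, dt=0.002):
    s = np.tanh(v)                        # sigmoid replaces v on the right hand side
    return v + dt * (J @ s - s * (A @ (s * s)))

v = np.array([0.1, 0.0, 0.1, 0.0])
traj = []
for _ in range(20000):
    v = step(v)
    traj.append(v.copy())
traj = np.array(traj)                     # 20000 x 4 trajectory sample
```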
While the details of the trajectory of a strange attractor in any real physical continuous dynamical system are lost in the noise, there is still a particular statistical structure to the attractor which is a recognizable \"signature\". \n\n6 Applications \n\nHandwritten characters have a natural translation invariant analog representation in terms of a sequence of angles that parametrize the pencil trajectory, and their classification can be taken as a static or temporal pattern recognition problem. We have constructed a trainable on-line system to which anyone may submit input by mouse or digitizing pad, and observe the performance of the system for themselves, in immediate comparison to their own internal recognition response. The performance of networks with static, periodic, and chaotic attractors may be tested simultaneously, and we are presently assessing the results. \n\nThese networks can be combined into a hierarchical architecture of interconnected modules. The larger network itself can then be viewed as a projection network, transformed into biological versions, and its behavior analysed with the same tools that were used to design the modules. The modules can model \"patches\" of cortex interconnected to form sensory-motor control networks. These can be configured to yield autonomous adaptive \"organisms\" which learn useful sequences of behaviors by reinforcement from their environment. \n\nThe A matrix for a network like that above may itself become a sub-block in the A matrix of a larger network. The overall network is then a projection network with zero elements in off-diagonal A matrix entries outside blocks that define multiple attractors for the submodules. 
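This nesting can be sketched directly (module sizes and entries below are illustrative assumptions, not the paper's values):

```python
import numpy as np

def nest_modules(module_As):
    """Hierarchical A matrix: each module's A becomes a diagonal
    sub-block; all entries outside the blocks are zero, so the
    modules neither compete nor combine through A."""
    n = sum(a.shape[0] for a in module_As)
    A = np.zeros((n, n))
    i = 0
    for a in module_As:
        k = a.shape[0]
        A[i:i+k, i:i+k] = a
        i += k
    return A

# Two illustrative modules, each its own k-winner-take-all net.
A1 = np.full((2, 2), 3.0); np.fill_diagonal(A1, 2.0)
A2 = np.full((3, 3), 3.0); np.fill_diagonal(A2, 2.0)
A_big = nest_modules([A1, A2])
```

With zero A coupling between the blocks, any interaction between modules must come through the J weights, which is exactly the situation described next.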
The modules neither compete nor combine states, \nin the absence of A matrix coupling between them, but take states independently \nbased on their inputs to each other through the weights in the matrix J (which \nhere describes full coupling). The modules learn connection weights Jij between \nthemselves which will cause the system to evolve under a clocked \"machine cycle\" \nby a sequence of transitions of attractors (static, oscillatory, or chaotic) within \nthe modules, much as a digital computer evolves by transitions of its binary flip(cid:173)\nflop states. This entire network may be folded to use more fault tolerant and \nbiologically plausible distributed representations, without disrupting the identity of \nthe subnetworks. \n\nSupervised learning by recurrent back propagation or reinforcement can be used to \ntrain the connections between modules. When the inputs from one module to the \nnext are given as impulses that establish initial conditions, the dynamical behavior \nof a module is described exactly by the projection theorem [Bai89]. Possible ap(cid:173)\nplications include problems such as system identification and control, robotic path \nplanning, gramatical inference, and variable-binding by phaselocking in oscillatory \nsemantic networks. \n\nAcknowledgements: \n\nSupported by AFOSR-87-0317, and a grant from LLNL. It is a pleasure to acknuwl(cid:173)\nedge the support of Walter Freeman and invaluable assistance of Morris Hirsch. \n\nReferences \n\n[Bai89] \n\n[Bai90a] \n\n[AKCM90] C. Ahalt, A. Krishnamurthy, P. Chen, and D. Melton. Competitive \nlearning algorithms for vector quantization. Neural Networks, 3:277-\n290,1990. \nB Baird. A bifurcation theory approach to vector field programming \nfor periodic attractors. In Proc. Int. Joint Conf on Neural Networks, \nWash. D.C., pages 1:381-388, June 1989. \nB. Baird. Bifurcation and learning in network models of oscillating \ncortex. In S. Forest, editor, Emergent Computation, pages 365-384. 
\nNorth Holland, 1990. also in Physica D, 42. \nB. Baird. A learning rule for cam storage of continuous periodic se(cid:173)\nquences. \npages 3: 493-498, June 1990. \nW.J. Freeman. Simulation of chaotic eeg patterns with a dynamic model \nof the olfactory system. Biological Cybernetics, 56:139, 1987. \nJ. Guckenheimer and D. Holmes. Nonlinear Oscillations, Dynamical \nSystems, and Bifurcations of Vector Fields. Springer, New York, 1983. \n\nIn Proc. Int. Joint Conf on Neural Networks, San Diego, \n\n[Bai90b] \n\n[Fre87] \n\n[GH83] \n\n\f", "award": [], "sourceid": 336, "authors": [{"given_name": "Bill", "family_name": "Baird", "institution": null}, {"given_name": "Frank", "family_name": "Eeckman", "institution": null}]}