{"title": "High Order Neural Networks for Efficient Associative Memory Design", "book": "Neural Information Processing Systems", "page_first": 233, "page_last": 241, "abstract": null, "full_text": "HIGH ORDER NEURAL NETWORKS FOR EFFICIENT ASSOCIATIVE MEMORY DESIGN

I. GUYON*, L. PERSONNAZ*, J. P. NADAL** and G. DREYFUS*

* Ecole Superieure de Physique et de Chimie Industrielles de la Ville de Paris, Laboratoire d'Electronique, 10, rue Vauquelin, 75005 Paris (France)
** Ecole Normale Superieure, Groupe de Physique des Solides, 24, rue Lhomond, 75005 Paris (France)

ABSTRACT

We propose learning rules for recurrent neural networks with high-order interactions between some or all neurons. The designed networks exhibit the desired associative memory function: perfect storage and retrieval of pieces of information and/or sequences of information of any complexity.

© American Institute of Physics 1988

INTRODUCTION

In the field of information processing, an important class of potential applications of neural networks arises from their ability to perform as associative memories. Since the publication of J. Hopfield's seminal paper1, investigations of the storage and retrieval properties of recurrent networks have led to a deep understanding of their properties. The basic limitations of these networks are the following:
- their storage capacity is of the order of the number of neurons;
- they are unable to handle structured problems;
- they are unable to classify non-linearly separable data.

In order to circumvent these limitations, one has to introduce additional non-linearities. This can be done either by using \"hidden\", non-linear units, or by considering multi-neuron interactions2.
This paper presents learning rules for networks with multiple interactions, allowing the storage and retrieval either of static pieces of information (autoassociative memory) or of temporal sequences (associative memory), while preventing an explosive growth of the number of synaptic coefficients.

AUTOASSOCIATIVE MEMORY

The problem addressed in this section is how to design an autoassociative memory with a recurrent (or feedback) neural network when the number p of prototypes is large compared to the number n of neurons. We consider a network of n binary neurons, operating in a synchronous mode with period τ. The state of neuron i at time t is denoted by σ_i(t), and the state of the network at time t is represented by a vector σ(t) whose components are the σ_i(t). The dynamics of each neuron is governed by the following relation:

σ_i(t+τ) = sgn v_i(t).   (1)

In networks with two-neuron interactions only, the potential v_i(t) is a linear function of the state of the network:

v_i(t) = Σ_j C_ij σ_j(t).

For autoassociative memory design, it has been shown3 that any set of correlated patterns, up to a number of patterns p equal to 2n, can be made the stable states of the system, provided the synaptic matrix is computed as the orthogonal projection matrix onto the subspace spanned by the stored vectors. However, as p increases, the rank of the family of prototype vectors increases, and finally reaches the value of n. In such a case, the synaptic matrix reduces to the identity matrix, so that all 2^n states are stable and the energy landscape becomes flat. Even if such an extreme case is avoided, the attractivity of the stored states decreases with increasing p; in other terms, the number of fixed points which are not stored patterns increases. This problem can be alleviated to a large extent by making constructive use of these \"spurious\" fixed points4.
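The projection rule recalled above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' code; the names S, C and update are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 10
# p random +/-1 prototypes stored as the columns of an (n, p) matrix
S = rng.choice([-1.0, 1.0], size=(n, p))

# Projection rule: C is the orthogonal projection matrix onto the
# subspace spanned by the prototypes, computed via the pseudoinverse.
C = S @ np.linalg.pinv(S)

def update(sigma):
    # One synchronous step of relation (1): sigma(t + tau) = sgn(C sigma(t))
    return np.where(C @ sigma >= 0, 1.0, -1.0)

# Every prototype is a fixed point of the dynamics.
for k in range(p):
    assert np.array_equal(update(S[:, k]), S[:, k])
```

Since C is a projector, C σ^k = σ^k exactly for linearly independent prototypes, so the sgn in (1) leaves each prototype unchanged.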
Another possible solution consists in \"gardening\" the state space in order to enlarge the basins of attraction of the fixed points5. However, none of these solutions brings a dramatic improvement, since the storage capacity remains O(n).

We now show that the introduction of high-order interactions between neurons increases the storage capacity proportionally to the number of connections per neuron. The dynamical behaviour of neuron i is still governed by (1). We consider two- and three-neuron interactions; the extension to higher orders is straightforward. The potential v_i(t) is now defined as

v_i(t) = Σ_j C_ij σ_j(t) + Σ_{j<l} C_ijl σ_j(t) σ_l(t).

It is more convenient, for the derivation of learning rules, to write the potential in the matrix form:

v(t) = C γ(t),

where γ(t) is an m-dimensional vector whose components are taken among the set of the (n²+n)/2 values: σ_1, ..., σ_n, σ_1σ_2, ..., σ_jσ_l, ..., σ_{n-1}σ_n.

As in the case of the two-neuron interaction model, we want to compute the interaction coefficients so that the prototypes are stable and attractor states. A condition to store a set of states σ^k (k = 1 to p) is that v^k = σ^k for all k. Among the solutions, the most convenient is given by the (n,m) matrix

C = Σ Γ^I,   (2)

where Σ is the (n,p) matrix whose columns are the σ^k and Γ^I is the (p,m) pseudoinverse of the (m,p) matrix Γ whose columns are the γ^k. This solution satisfies the above requirements, up to a storage capacity which is related to the dimension m of the vectors γ. Thus, in a network with three-neuron interactions, the number of patterns that can be stored is O(n²). Details on these derivations are published in Ref. 6.
By using only a subset of the products {σ_j σ_l},
the increase in the number of synaptic coefficients can remain within acceptable limits, while the attractivity of the stored patterns is enhanced, even though their number exceeds the number of neurons; this will be exemplified in the simulations presented below.
Finally, it can be noticed that, if the vector γ contains all the products {σ_i σ_j}, i = 1, ..., n, j = 1, ..., n, only, the computation of the vector potential v = Cγ can be performed by means of the following expression:

v(t) = Σ [Φ(Σ^T Σ)]^(-1) Φ(Σ^T σ(t)),

where Φ stands for the operation which consists in squaring all the matrix coefficients. Hence, the computation of the synaptic coefficients is avoided, and memory and computing time are saved when the simulations are performed on a conventional computer. This formulation is also meaningful for optical implementations, the function Φ being easily performed in optics7.

In order to illustrate the capabilities of the learning rule, we have performed numerical simulations which show the increase of the size of the basins of attraction when second-order interactions are used in addition to the first-order ones. The simulations were carried out as follows. The number of neurons n being fixed, the amount of second-order interactions was chosen; p prototype patterns were picked randomly, their components being ±1 with probability 0.5; the second-order interactions were chosen randomly. The synaptic matrix was computed from relation (2). The neural network was forced into an initial state lying at an initial Hamming distance Hi from one of the prototypes σ^k; it was subsequently left to evolve until it reached a stable state at a distance Hf from σ^k. This procedure was repeated many times for each prototype, and the Hf were averaged over all the tests and all the prototypes.
Figures 1a and 1b are charts of the mean values of Hf as a function of the number of prototypes, for n = 30 and for various values of m (the dimension of the vector γ).
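The high-order storage rule (2) and the Hamming-distance procedure just described can be sketched as follows; this is an illustrative reconstruction at a smaller size, not the authors' simulation code:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, p = 12, 24                              # p = 2n prototypes: more than n neurons
pairs = list(combinations(range(n), 2))    # all second-order products: m = (n^2+n)/2

def gamma(sigma):
    # Feature vector: the n states followed by the products sigma_j * sigma_l
    return np.concatenate([sigma, [sigma[j] * sigma[l] for j, l in pairs]])

S = rng.choice([-1.0, 1.0], size=(n, p))   # prototypes sigma^k as columns
G = np.stack([gamma(S[:, k]) for k in range(p)], axis=1)   # (m, p) matrix Gamma
C = S @ np.linalg.pinv(G)                  # relation (2): C = Sigma Gamma^I

def update(sigma):
    # One synchronous step of relation (1) with the non-linear potential
    return np.where(C @ gamma(sigma) >= 0, 1.0, -1.0)

# All p > n prototypes are stable states.
assert all(np.array_equal(update(S[:, k]), S[:, k]) for k in range(p))

def final_distance(k, hi, steps=20):
    # Flip hi components of prototype k, iterate, return the final Hamming distance
    state = S[:, k].copy()
    state[rng.choice(n, size=hi, replace=False)] *= -1
    for _ in range(steps):
        state = update(state)
    return int(np.sum(state != S[:, k]))

hf = final_distance(0, 3)
```

Averaging final_distance over many trials, prototypes and values of hi reproduces the kind of measurement reported in the figures.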
These curves allowed us to determine the maximum number of prototype states which can be stored for a given quality of recall. Perfect recall implies Hf = 0; when the number of prototypes increases, the error in recall may reach Hf = Hi: the associative memory is degenerate. The results obtained for Hi/n = 10% are plotted in Figure 1a. When no high-order interactions were used, Hf reached Hi for p/n = 1, as expected; conversely, virtually no error in recall occurred up to p/n = 2 when all second-order interactions were taken into account (m = 465). Figure 1b shows the same quantities for Hi/n = 20%; since the initial states were more distant from the prototypes, the errors in recall were more severe.

[Figure 1: two panels, (a) and (b), each plotting the mean final distance <Hf> (vertical axis, 0 to 1.2) against p/n (horizontal axis, 0 to 3).]

Fig. 1. Improvement of the attractivity by addition of three-neuron interactions to the two-neuron interactions. All prototypes are always stored exactly (all curves go through the origin). Each point corresponds to an average over min(p,10) prototypes and 30 tests for each prototype. □ projection: m = n = 30; • m = 120; • m = 180; ○ m = 465 (all interactions). 1a: Hi/n = 10%; 1b: Hi/n = 20%.

TEMPORAL SEQUENCES (ASSOCIATIVE MEMORY)

The previous section was devoted to the storage and retrieval of items of information considered as fixed points of the dynamics of the network (autoassociative memory design).
However, since fully connected neural networks are basically dynamical systems, they are natural candidates for storing and retrieving information which is dynamical in nature, i.e., temporal sequences of patterns8. In this section, we propose a general solution to the problem of storing and retrieving sequences of arbitrary complexity in recurrent networks with parallel dynamics.
Sequences consist in sets of transitions between states σ^k -> σ^{k+1}, k = 1, ..., p. A sufficient condition to store these sets of transitions is that v^k = σ^{k+1} for all k. In the case of a linear potential v = Cσ, the storage prescription proposed in ref. 3 can be used:

C = Σ⁺ Σ^I,

where Σ is the matrix whose columns are the σ^k and Σ⁺ is the matrix whose columns are the successors σ^{k+1} of the σ^k. If p is larger than n, one can use high-order interactions, which leads to introducing a non-linear potential v = Cγ, with γ as previously defined. We proposed in ref. 10 the following storage prescription:

C = Σ⁺ Γ^I.   (3)

The two above prescriptions are only valid for storing simple sequences, in which no pattern occurs twice (or more). Suppose that one pattern occurs twice; when the network reaches this bifurcation point, it is unable to make a decision according to the deterministic dynamics described in (1), since the knowledge of the present state is not sufficient. Thus, complex sequences require keeping a non-zero memory span at each time step of the dynamics.

The vector potential v = Cγ must involve the states at times t and t-τ, which leads to defining the vector γ as a concatenation of the vectors σ(t), σ(t-τ), σ(t)⊗σ(t), σ(t)⊗σ(t-τ), or a suitable subset thereof. The subsequent state σ(t+τ) is still determined by relation (1). In this form, the problem is a generalization of the storage of patterns with high-order interactions, as described above.
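A minimal sketch of relation (3) with γ limited to σ(t)⊗σ(t-τ), on a tiny sequence of our own construction containing one bifurcation point (not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 16
A, B, X, Y = (rng.choice([-1.0, 1.0], size=n) for _ in range(4))
# X occurs twice with different successors, so the present state alone
# cannot determine the next one: a bifurcation point.
seq = [A, B, X, Y, X, A, B]

def gamma(cur, prev):
    # Feature vector sigma(t) (x) sigma(t - tau): all n^2 pairwise products
    return np.outer(cur, prev).ravel()

# Gamma: one column per stored transition; Sigma+: the successors.
G = np.stack([gamma(seq[k], seq[k - 1]) for k in range(1, len(seq) - 1)], axis=1)
Splus = np.stack([seq[k + 1] for k in range(1, len(seq) - 1)], axis=1)
C = Splus @ np.linalg.pinv(G)    # relation (3): C = Sigma+ Gamma^I

# Retrieval: initialize with two successive states, then iterate relation (1).
prev, cur = seq[0], seq[1]
for k in range(2, len(seq)):
    cur, prev = np.where(C @ gamma(cur, prev) >= 0, 1.0, -1.0), cur
    assert np.array_equal(cur, seq[k])   # whole sequence, bifurcation included
```

Because γ depends on the two most recent states, the two occurrences of X are mapped to different feature vectors and the bifurcation is resolved.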
The storage of sequences can still be performed with relation (3).

The solution presented above has the following features:
i) Sequences with bifurcation points can be stored and retrieved.
ii) The dimension of the synaptic matrix is at most (n, 2(n²+n)), and at least (n, 2n) in the linear case, so that at most 2n(n²+n) and at least 2n² synapses are required.
iii) The storage capacity is O(m), where m is the dimension of the vector γ.
iv) Retrieval of a sequence requires initializing the network with two states in succession.

The example of Figure 2 illustrates the retrieval performance of the latter learning rule. We have limited the vector γ to σ(t)⊗σ(t-τ). In a network of n = 48 neurons, a large number of poems have been stored, with a total of p = 424 elementary transitions. Each state consists of the 6-bit codes of 8 letters.

ALOUETTE    JE NE
JETE        OLVMERAI
PLUMERAI    AQFUETTE
ALOUETTE    JEHKILLE
GENTILLE    SLOUETTE
ALOUETTE    ALOUETTE
ALOUETTE    JETE
JETE        PLUMERAI
PLUMERAI

Fig. 2. One of the stored poems is shown in the first column. The network is initialized with two states (the first two lines of the second column). After a few steps, the network reaches the nearest stored sequence.

LOCAL LEARNING

Finally, it should be mentioned that all the synaptic matrices introduced in this paper can be computed by iterative, local learning rules.
For autoassociative memory, it has been shown analytically9 that the procedure

C_ij <- C_ij + (1/n) (σ_i^k - v_i^k) σ_j^k,   with C_ij(0) = 0,

which is a Widrow-Hoff-type learning rule, yields the projection matrix when the number of presentations of the prototypes {σ^k} goes to infinity, if the latter are linearly independent.
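Assuming the standard Widrow-Hoff form ΔC_ij = (1/n)(σ_i^k - v_i^k) σ_j^k, the convergence of repeated presentations toward the projection matrix can be checked numerically; the number of sweeps and the tolerance below are our own choices:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 5
S = rng.choice([-1.0, 1.0], size=(n, p))   # linearly independent prototypes

C = np.zeros((n, n))
for sweep in range(2000):                   # repeated presentations
    for k in range(p):
        s = S[:, k]
        v = C @ s                           # potential for prototype k
        C += np.outer(s - v, s) / n         # Widrow-Hoff step, starting from C = 0

# The iteration converges to the orthogonal projection matrix.
P = S @ np.linalg.pinv(S)
assert np.allclose(C, P, atol=1e-4)
```

Each step projects the rows of C onto the hyperplane defined by one prototype (a Kaczmarz-type iteration), which, starting from zero, converges to the minimum-norm solution, i.e. the projector.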
A derivation along the same lines shows that, by repeated presentations of the prototype transitions, the learning rules

C_ij <- C_ij + (1/m) (σ_i^k - v_i^k) γ_j^k   and   C_ij <- C_ij + (1/m) (σ_i^{k+1} - v_i^k) γ_j^k

lead to the exact solutions (relations (2) and (3) respectively), if the vectors γ^k are linearly independent.

GENERALIZATION TASKS

Apart from storing and retrieving static pieces of information or sequences, neural networks can be used to solve problems in which there exists a structure or regularity in the sample patterns (for example, the presence of clumps, parity, symmetry, ...) that the network must discover. Feed-forward networks with multiple layers of first-order neurons can be trained with back-propagation algorithms for these purposes; however, one-layer feed-forward networks with multi-neuron interactions provide an interesting alternative. For instance, a proper choice of the vector γ (second-order terms only) with the above learning rule yields a perfectly straightforward solution to the exclusive-OR problem. Maxwell et al. have shown that a suitable high-order neuron is able to exhibit the \"ad hoc network solution\" for the contiguity problem11.

CONCLUSION

The use of neural networks with high-order interactions has long been advocated as a natural way to overcome the various limitations of the Hopfield model. However, no procedure guaranteed to store any set of information as fixed points or as temporal sequences had been proposed. The purpose of the present paper is to present briefly such storage prescriptions and to show some illustrations of the use of these methods. Full derivations and extensions will be published in more detailed papers.

REFERENCES

1. J. J. Hopfield, Proc. Natl. Acad. Sci. (USA) 79, 2554 (1982).
2. P. Peretto and J. J. Niez, Biol. Cybern. 54, 53 (1986). P. Baldi and S. S. Venkatesh, Phys. Rev. Lett. 58, 913 (1987). For more references see ref. 6.
3. L. Personnaz, I. Guyon, G. Dreyfus, J. Phys. Lett. 46, 359 (1985). L. Personnaz, I. Guyon, G. Dreyfus, Phys. Rev. A 34, 4217 (1986).
4. I. Guyon, L. Personnaz, G. Dreyfus, in \"Neural Computers\", R. Eckmiller and C. von der Malsburg eds (Springer, 1988).
5. E. Gardner, Europhys. Lett. 4, 481 (1987). G. Poppel and U. Krey, Europhys. Lett. 4, 979 (1987).
6. L. Personnaz, I. Guyon, G. Dreyfus, Europhys. Lett. 4, 863 (1987).
7. D. Psaltis and C. H. Park, in \"Neural Networks for Computing\", J. S. Denker ed. (A.I.P. Conference Proceedings 151, 1986).
8. P. Peretto, J. J. Niez, in \"Disordered Systems and Biological Organization\", E. Bienenstock, F. Fogelman, G. Weisbuch eds (Springer, Berlin, 1986). S. Dehaene, J. P. Changeux, J. P. Nadal, PNAS (USA) 84, 2727 (1987). D. Kleinfeld, H. Sompolinsky, preprint 1987. J. Keeler, to appear in J. Cog. Sci. For more references see ref. 9.
9. I. Guyon, L. Personnaz, J. P. Nadal and G. Dreyfus, submitted for publication.
10. S. Diederich, M. Opper, Phys. Rev. Lett. 58, 949 (1987).
11. T. Maxwell, C. Lee Giles, Y. C. Lee, Proceedings of ICNN-87, San Diego, 1987.
", "award": [], "sourceid": 87, "authors": [{"given_name": "G\u00e9rard", "family_name": "Dreyfus", "institution": null}, {"given_name": "Isabelle", "family_name": "Guyon", "institution": null}, {"given_name": "Jean-Pierre", "family_name": "Nadal", "institution": null}, {"given_name": "L\u00e9on", "family_name": "Personnaz", "institution": null}]}