{"title": "Attractor Neural Networks with Local Inhibition: from Statistical Physics to a Digitial Programmable Integrated Circuit", "book": "Advances in Neural Information Processing Systems", "page_first": 805, "page_last": 812, "abstract": "", "full_text": "Attractor Neural Networks with Local \nInhibition: from Statistical Physics to a \nDigital Programmable Integrated Circuit \n\nE. Pasero \n\nR. Zecchina \n\nDipartimento di Elettronica \n\nDipartimento di Fisica Teorica e INFN \n\nPolitecnico di Torino \n1-10129 Torino, Italy \n\nU niversita. di Torino \n1-10125 Torino, Italy \n\nAbstract \n\nNetworks with local inhibition are shown to have enhanced compu(cid:173)\ntational performance with respect to the classical Hopfield-like net(cid:173)\nworks. In particular the critical capacity of the network is increased \nas well as its capability to store correlated patterns. Chaotic dy(cid:173)\nnamic behaviour (exponentially long transients) of the devices in(cid:173)\ndicates the overloading of the associative memory. An implementa(cid:173)\ntion based on a programmable logic device is here presented. A 16 \nneurons circuit is implemented whit a XILINK 4020 device. The \npeculiarity of this solution is the possibility to change parts of the \nproject (weights, transfer function or the whole architecture) with \na simple software download of the configuration into the XILINK \nchip. \n\n1 \n\nINTRODUCTION \n\nAttractor Neural Networks endowed with local inhibitory feedbacks, have been \nshown to have interesting computational performances[I]. Past effort was con(cid:173)\ncentrated in studying a variety of synaptic structures or learning algorithms, while \nless attention was devoted to study the possible role played by different dynamical \nschemes. 
The definition of the relaxation dynamics is the central problem for the study of the associative and computational capabilities in models of attractor neural networks, and might be of interest also for hardware implementations in view of the constraints on the precision of the synaptic weights. \n\n805 \n\n\f806 \n\nPasero and Zecchina \n\nIn this paper, we give a brief discussion of the computational and physical role played by local inhibitory interactions, which lead to an effective non-monotonic transfer function for the neurons. In the last few years other models characterized by non-monotonic neurons have been proposed[2,3]. \n\nFor Hebbian learning we show, numerically, that the critical capacity increases with respect to the Hopfield case and that this result can be interpreted in terms of a twofold task realized by the dynamical process. By means of local inhibition, the system dynamically selects a subspace (or subnetwork) of minimal static noise with respect to the recalled pattern; at the same time, and in the selected subspace, the retrieval of the memorized pattern is performed. The dynamic behaviour of the network, for deterministic sequential updating, ranges from fixed points to chaotic evolution, with the storage ratio as control parameter; the transition appears in correspondence with the collapse of the associative performance. Resorting to two simplified versions of the model, we study the problem of their optimal performance by the replica method; in particular, the role of non-monotonic functions and of the dynamical selection of subspaces is discussed. \n\nIn the second part of the work, the implementation of the discussed model by means of a Xilinx programmable gate array is discussed. The circuit implements a 16- to 32-neuron network in which the analog characteristics (such as a capacitive decay) are emulated by digital solutions. 
As expected, the limited resolution of the weights does not represent a limit for the performance of the network. \n\n2 THE MODEL: theory and performance \n\nWe study an attractor neural network composed of N three-state (±1, 0) formal neurons. The ±1 values code for the patterns (the patterns are indeed binary) and are thus used during the learning phase, while the 0-state is a don't-care state, not belonging to the pattern code, which has only a dynamical role. The system is assumed to be fully connected and its evolution is governed by sequential or parallel updating of the following equations \n\nh_i(t+1) = λ h_i(t) + Σ_{j=1}^{N} J_ij S_j(t),    i = 1, ..., N    (1) \n\nS_i(t) = sgn(h_i(t)) if |h_i(t)| ≥ γ(t), and S_i(t) = 0 otherwise    (2) \n\nwhere γ is the dynamic threshold of the local inhibitory feedback (typically we take γ(t) = (1/N) Σ_i |h_i(t-1)|), the {J_ij} are the synaptic conductances and λ is a capacitive decay factor of the input potential (λ = e^{-δt/τ}, where τ = RC). \n\nThe performance of the network is described in terms of two parameters which have a simple dynamical and computational interpretation. In particular, we define the retrieval activity as the fraction of neurons which are not in the zero state, \n\na = (1/N) Σ_i |S_i|,    (3) \n\nwhile the parameter that defines the retrieval quality is the scaled overlap \n\nm^μ = (1/(N a)) Σ_i ξ_i^μ S_i,    (4) \n\nwhere the {ξ_i^μ = ±1, i = 1, ..., N; μ = 1, ..., P} are the memorized binary patterns. The scaled overlap can be thought of simply as the overlap computed in the subspace M of the active neurons, M = {i : S_i ≠ 0, i = 1, ..., N}. Given a set of P random independent binary patterns {ξ_i^μ}, the Hebb-Hopfield learning rule corresponds to fixing the synaptic matrix J_ij by the additive relation J_ij = (1/N) Σ_{μ=1}^{P} ξ_i^μ ξ_j^μ (with J_ii = 0). 
The effect of the dynamical process defined by (1) and (2) is the selection of subspaces M of active neurons in which the static noise is minimized (such subspaces will be hereafter referred to as orthogonal subspaces). Before entering into the description of the results, it is worthwhile to remember that, in Hopfield-like attractor neural networks, the mean cross-correlation fluctuations produce in the local fields of the neurons a static noise, referred to as the cross-talk of the memories. Together with temporal correlations, the static noise is responsible for the phase transition of the neural networks from associative memory to spin-glass. More precisely, when the Hopfield model is in a fixed point ξ^μ which belongs to the set of memories, the local fields are given by h_i ξ_i^μ = 1 + R_i^μ, where \n\nR_i^μ = (1/N) Σ_{ν≠μ} Σ_{j≠i} ξ_i^μ ξ_i^ν ξ_j^ν ξ_j^μ \n\nis the static noise (Gaussian distribution with 0 mean and variance α). \n\nThe preliminary performance study of the model under discussion has revealed several new basic features, in particular: (i) the critical capacity, for the Hebb learning rule, is increased up to α_c ≈ 0.33 (instead of 0.14[4]); (ii) the mean cross-correlation fluctuation computed in the selected subspaces is minimized by the dynamical process in the region α < α_c; (iii) in correspondence with the associative transition the system goes through a dynamic transition from fixed points to chaotic trajectories. \n\nThe quantitative results concerning associative performance are obtained by means of extended simulations. A typical simulation takes the memorized patterns as initial configurations and lets the system relax until it reaches a stationary point. The quantity describing the performance of the network as an associative memory is the mean scaled overlap m between the final stationary states and the memorized patterns, used as initial states. 
As the number of memorized configurations grows, one observes a threshold at α = α_c ≈ 0.33 beyond which the stored states become unstable (numerical results were obtained for networks of size up to N = 1000). We observe that since the recall of the patterns is performed with no errors (up to α ≈ 0.31), the number of stored bits in the synaptic matrix is also increased with respect to the Hopfield case. \n\nThe typical size N_M of the sub-networks, like the network capacity, depends on the threshold parameter γ and on the kind of updating: for γ(t) = (1/N) Σ_i |h_i(t-1)| and parallel updating we find N_M ≈ N/2 (α_c = 0.33). \n\nThe static noise reduction corresponds to the minimization of the mean fluctuation of the cross correlations (cross-talk) in the subspaces, defined by \n\nC = Σ_{μ<ν} [ (1/N) Σ_i ε_i^μ ε_i^ν ξ_i^μ ξ_i^ν ]²    (5) \n\nwhere ε_i^μ = 1 if i ∈ M in pattern μ and zero otherwise, as a function of α. Under the dynamical process (1) and (2), C does not follow a statistical law but undergoes a minimization that qualitatively explains the increase in the storage capacity. For α < α_c, once the system has relaxed into a stationary subspace, the model becomes equivalent (in the subspace) to a Hopfield network with a static noise term which is no longer random. The statistical mechanics of the combinatorial task of minimizing the noise-energy term (5) can be studied analytically by the replica method; the results are of general interest in that they give an upper bound to the performance of networks endowed with Hebb-like synaptic matrices and with the possibility of selecting optimal subnetworks for the retrieval dynamics of the patterns[8]. \n\nAs already stated, the behaviour of the neural network as a dynamical system is directly related to its performance as an associative memory. 
The system shows an abrupt transition in the dynamics, from fixed points to chaotic exponentially long transients, in correspondence with the value of the storage ratio at which the memorized configurations become unstable. The only (external) control parameter of the model as a dynamical system is the storage ratio α = P/N. Complex dynamic behaviour appears as a clear signal of saturation of the attractor neural network and does not depend on the symmetry of the couplings. \n\nAs a concluding remark concerning this short description of the network performance, we observe that the dynamic selection of subspaces seems to take advantage of finite-size effects, allowing the storage of correlated patterns also with the simple Hebb rule. Analytical and numerical work is in progress on this point, devoted to clarifying the performance with spatially correlated patterns[5]. \n\nFinally, we end this theoretical section by addressing the problem of optimal performance for a different choice of the synaptic weights. In this direction, it is of basic interest to understand whether a dynamical scheme which allows for the dynamic selection of subnetworks provides a neural network model with enhanced optimal capacity with respect to the classical spin models. Assuming that nothing is known about the couplings, one can consider the J_ij as dynamical variables and study the fractional volume in the space of interactions that makes the patterns fixed points of the dynamics. 
Following Gardner and Derrida[6], we describe the problem in terms of a cost-energy function and study its statistical mechanics: for a generic choice of the {J_ij}, the cost function E_i is defined to be the number of patterns such that a given site i is wrong (with respect to (1)) \n\nE_i({J_ij}, {ε_i^μ}) = Σ_{μ=1}^{P} [ ε_i^μ Θ(-h_i^μ ξ_i^μ + γ) + (1 - ε_i^μ) Θ((h_i^μ)² - γ²) ]    (6) \n\nwhere Θ is the step function, the h_i^μ = (1/√N) Σ_j J_ij ξ_j^μ ε_j^μ are the local fields, γ is the threshold of the inhibitory feedback and the ε_i^μ = {0, 1} are the variables that identify the subspace M (ε_i^μ = 1 if i ∈ M and zero otherwise). \n\nIn order to estimate the optimal capacity, one should perform the replica theory on the following partition function \n\nV = ∫ Π_j dJ_ij δ(Σ_j J_ij² - N) e^{-β E_i({J_ij}, {ε_i^μ})}    (7) \n\nSince the latter task seems unmanageable, as a first step we resort to two simplified versions of the model which, separately, retain its main characteristics (subspaces and non-monotonicity); in particular: \n(i) we assume that the {ε_i^μ} are quenched random variables, distributed according to P(ε_i^μ) = (1 - λ) δ(ε_i^μ) + λ δ(ε_i^μ - 1), λ ∈ [0, 1]; \n(ii) we consider the case of a two-state (±1) non-monotonic transfer function. \nFor lack of space, here we list only the final results. The expressions of the R.S. critical capacity for the two models are, respectively: \n\nα_c^{R.S.}(γ; λ) = { 2(1 - λ) ∫_γ^∞ Dζ (ζ - γ)² + λ ∫_{-∞}^γ Dζ (γ - ζ)² }^{-1}    (8) \n\nα_c^{R.S.}(γ) = { ∫_{-γ}^{-γ/2} Dζ (ζ + γ)² + ∫_{-γ/2}^0 Dζ ζ² + ∫_γ^∞ Dζ (ζ - γ)² }^{-1}    (9) \n\nwhere Dζ = (dζ/√(2π)) e^{-ζ²/2} (for (9) see also Ref.[4]). \n\nThe values of critical capacity one finds are much higher than the monotonic perceptron capacity (α_c = 2). Unfortunately, the latter results are not reliable in that the stability analysis shows that the R.S. solutions are unstable. Replica symmetry breaking is thus required. 
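For concreteness, the error count entering the cost function (6) can be written out for a single site. The stability conditions used below (an active site must be aligned with the pattern above the threshold, an inactive site must have a sub-threshold field magnitude) are an assumption of this sketch, as are all sizes and names:

```python
import numpy as np

def site_cost(J_i, xi, eps, i, gamma):
    # number of patterns for which site i is wrong, in the spirit of eq. (6);
    # assumed conditions: an active site (eps = 1) is correct when
    # h_i^mu xi_i^mu > gamma, an inactive one (eps = 0) when |h_i^mu| < gamma
    N = xi.shape[1]
    J_i = J_i.copy()
    J_i[i] = 0.0                          # no self-coupling
    h = (xi * eps) @ J_i / np.sqrt(N)     # local fields h_i^mu, one per pattern
    stab = h * xi[:, i]                   # stabilities h_i^mu xi_i^mu
    active = eps[:, i] == 1
    wrong = np.where(active, stab < gamma, np.abs(h) > gamma)
    return int(wrong.sum())

rng = np.random.default_rng(0)
P, N = 10, 50
xi = rng.choice([-1, 1], size=(P, N))     # patterns
eps = rng.choice([0, 1], size=(P, N))     # quenched subspace variables, model (i)
J_i = rng.normal(size=N)                  # a generic coupling row
print(site_cost(J_i, xi, eps, 0, gamma=0.8))
```

Minimizing the sum of such site costs over the couplings, at fixed patterns, is exactly the Gardner-style optimization whose replica treatment is discussed in the text.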
All the details concerning the computation, with one step of replica symmetry breaking, of the critical capacity and of the distribution of stabilities can be found in Ref.[7]. Here we just quote the final quantitative result concerning optimal capacity for the non-monotonic two-state model: numerical evaluation of the saddle-point equations (for unbiased patterns) gives α_c(γ_opt) ≈ 4.8 with γ_opt ≈ 0.8, the corresponding R.S. value from (9) being α_c^{R.S.} ≈ 10.5. \n\n3 HARDWARE IMPLEMENTATION: a digital programmable integrated circuit \n\nThe performance of the network discussed in the above section points out the good behavior of the dynamical approach. Our goal is now to investigate the performance of this system with special hardware. Commercial neural chips[9][10] are not suitable: the features of our net require a non-monotonic transfer characteristic due to the local inhibitions, and these aspects are not supported by traditional networks. The implementation of a full custom chip is, on the other side, a hasty choice: the model is still being studied, and new developments must be expected in the near future. Therefore we decided to build a prototype based on programmable logic circuits. This solution allows us to implement the circuit in a short time without detriment to the performance. Moreover, the same circuit will be easily updated to the next evolutions of the model. After an analysis of the existing logic circuits we decided to use FPGA devices[11]. The reasons are that we need a large quantity of internal registers, to represent both synapses and capacitors, and the fastest interconnections. The Xilinx 4000 family[12] offers the most interesting approach: up to 20,000 gates are programmable and up to 28 Kbit of RAM are available. Moreover, a 3 ns propagation delay between internal blocks allows the implementation of very fast systems. 
\nWe decided to use a XC4020 circuit with 20,000 equivalent gates. The main problems related to the implementation of our model are the following: (a) number of neurons, (b) number of connections and (c) computation time. Parameters (a) and (b) are obviously related to the logic device we have at our disposal: the number of gates we can use to implement the transfer function of our non-monotonic neurons and the number of bits we decide to assign to the weights are mutually exclusive, since the 20,000 gates must be divided between logic gates and RAM cells. Parameter (c) depends on our choices in implementing the neural network. We can decide to connect the logic blocks in a sequential or in a parallel way. The global propagation time is the sum of the propagation delays of each logic block, from the input to the output. Therefore if we put more blocks in parallel we do not increase the propagation delay and the time performance is better. Unfortunately, the parallel solution clashes with the limited number of logic blocks available in our device. Therefore we decided to design two chips: the first circuit implements 16 neurons in a faster parallel implementation, while the second circuit allows us to use 32 (or more) neurons in a slower serial approach. Here we describe the faster implementation. \n\nFigure 1 shows the 16 neurons of the neural chip. Each neuron, described in Figure 2, performs a sequential sum and multiplication of the outputs of the other 15 neurons by the synaptic values stored inside the internal RAM. A special circuit implements the activation function described in the previous section. All the neurons perform these operations in a parallel way: 15 clock pulses are sufficient to perform the complete operation for the system. Figure 2 shows the circuit of the neuron. T1 is a RAM where the synapses J_ij are stored after the training phase. 
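A software emulation of this digital neuron is easy to write down. The sketch below is our own approximation, not the actual chip design: it assumes 8-bit signed weights resident in RAM, pure integer arithmetic, the same three-level activation, and a decay factor λ ≈ 1/2 obtained with an arithmetic shift; each call mimics one complete computation cycle:

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 16, 2
xi = rng.choice([-1, 1], size=(P, N))
J = (xi.T @ xi) - P * np.eye(N, dtype=int)   # integer Hebb counts, J_ii = 0
W = np.clip(J, -128, 127).astype(np.int8)    # weights as stored in the RAM T1

def computation_cycle(h, S):
    # one complete computation: for every neuron, 15 sequential
    # multiply-accumulates over the other neurons' outputs, a decay of the
    # input potential (lambda ~ 1/2 via an arithmetic right shift), and the
    # three-level (-1, 0, +1) activation
    acc = W.astype(int) @ S
    h = (h >> 1) + acc
    gamma = int(np.abs(h).mean())
    S = np.where(np.abs(h) >= gamma, np.sign(h), 0).astype(int)
    return h, S

h, S = xi[0].astype(int), xi[0].copy()
for _ in range(6):                           # 4 to 6 computations suffice
    h, S = computation_cycle(h, S)
print(S)
```

The point of the integer arithmetic is the observation made earlier in the text: the limited resolution of the weights does not, by itself, limit the associative performance of the network.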
M1 and A1 perform sums and multiplications according to our model. D1 simulates the λ decay factor: every 15 clock cycles, which correspond to a complete cycle of sum and multiplication for all 16 neurons, this circuit decreases the input of the neuron by a factor λ. The activation function (2) is realized by F1: this circuit emulates a three-level logic, based on the -1, 0 and +1 values, by using two full-adder blocks. Limitations due to the electrical characteristics of the circuit impose a maximum clock frequency of 20 MHz. The 16-neuron version of the chip takes from 4 to 6 complete computations to reach stability, and every computation is 16 clock cycles long. Therefore the network gives a stable state after 3 µs at most. The second version of this circuit allows the use of more neurons at a lower speed. We used the Xilinx device to implement one neuron, while the synapses and the capacitors are stored in an external fast memory. The single neuron is time-multiplexed in order to emulate a large number of identical devices. At each step, both synapses and state variables are downloaded from and uploaded to an external memory. This solution is obviously slower than the first one, but a larger number of neurons can be implemented. A 32-neuron version takes about 6 µs to reach a stable configuration. \n\nFigure 1 - Neural Chip \n\n
Figure 2 - Neuron \n\n4 CONCLUSION \n\nA modified approach to attractor neural networks and its implementation on a Xilinx XC4020 FPGA was discussed. The chip is now under test. Six µs are sufficient for the relaxation of the system into a stable state, and the recognition of an input pattern is thus quite fast. A next step will be the definition of a multiple chip system endowed with more than 32 neurons, with the weights stored in an external fast memory. \n\nAcknowledgements \n\nThis work was partially supported by the Annethe-INFN Italian project and by Progetto finalizzato sistemi informatici e calcolo parallelo of CNR under grant N. 91.00884.PF69. \n\nReferences \n\n[1] R. Zecchina, \"Computational and Physical Role of Local Inhibition in Attractor Neural Networks: a Simple Model,\" Parallel Architectures and Neural Networks, ed. E.R. Caianiello, World Scientific (1992). \n\n[2] M. Morita, S. Yoshizawa, H. Nakano, \"Analysis and Improvement of the Dynamics of Autocorrelation Associative Memory,\" IEICE Trans. J73-D-II, 232 (1990). \n\n[3] K. Kobayashi, \"On the Capacity of a Neuron with a Non-Monotone Output Function,\" Network, 2, 237 (1991). \n\n[4] D.J. Amit, H. Gutfreund, H. Sompolinsky, \"Storing Infinite Numbers of Patterns in a Spin-Glass Model of Neural Networks,\" Phys. Rev. Lett., 55, 1530 (1985). 
\n\n[5] G. Boffetta, N. Brunel, R. Monasson, R. Zecchina, in preparation (1993). \n\n[6] E. Gardner, B. Derrida, \"Optimal Storage Properties of Neural Network Models,\" J. Phys., A21, 271 (1988). \n\n[7] G. Boffetta, R. Monasson, R. Zecchina, \"Symmetry Breaking in Non-Monotonic Neural Networks,\" in preparation (1992). \n\n[8] N. Brunel, R. Zecchina, \"Statistical Mechanics of Optimal Memory Retrieval in the Space of Dynamic Neuronal Activities,\" preprint (1993). \n\n[9] \"An Electrically Trainable Artificial Neural Network,\" Proceedings of IJCNN, San Diego, 1989. \n\n[10] M. Dzwonczyk, M. Leblanc, \"INCA: An Integrated Neurocomputing Architecture,\" Proceedings of AIAA Computing in Aerospace, October 1991. \n\n[11] W.R. Moore, W. Luk, \"FPGAs,\" Abingdon EE-CS Books, 1991. \n\n[12] \"The XC4000 Data Book,\" Xilinx, 1991.", "award": [], "sourceid": 630, "authors": [{"given_name": "E.", "family_name": "Pasero", "institution": null}, {"given_name": "R.", "family_name": "Zecchina", "institution": null}]}