{"title": "Optimal Signalling in Attractor Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 485, "page_last": 492, "abstract": null, "full_text": "Optimal Signalling in Attractor Neural Networks \n\nIsaac Meilijson \n\nEytan Ruppin* \n\nSchool of Mathematical Sciences \nRaymond and Beverly Sackler Faculty of Exact Sciences \nTel-Aviv University, 69978 Tel-Aviv, Israel. \n\nAbstract \n\nIn [Meilijson and Ruppin, 1993] we presented a methodological framework describing the two-iteration performance of Hopfield-like attractor neural networks with history-dependent, Bayesian dynamics. We now extend this analysis in a number of directions: input patterns applied to small subsets of neurons, general connectivity architectures and more efficient use of history. We show that the optimal signal (activation) function has a slanted sigmoidal shape, and provide an intuitive account of activation functions with a non-monotone shape. This function endows the model with some properties characteristic of cortical neurons' firing. \n\n1 Introduction \n\nIt is well known that a given cortical neuron can respond with a different firing pattern to the same synaptic input, depending on its firing history and on the effects of modulator transmitters (see [Connors and Gutnick, 1990] for a review). The time span of different channel conductances is very broad, and the influence of some ionic currents varies with the history of the membrane potential [Lytton, 1991]. Motivated by the history-dependent nature of neuronal firing, we continue our previous investigation [Meilijson and Ruppin, 1993] (henceforth, M & R) describing the performance of Hopfield-like attractor neural networks (ANN) [Hopfield, 1982] with history-dependent dynamics. \n\n*Currently in the Dept.
of Computer Science, University of Maryland. \n\n485 \n\nBuilding upon the findings presented in M & R, we now study a more general framework: \n\n\u2022 We differentiate between 'input' neurons receiving the initial input signal with high fidelity and 'background' neurons that receive it with low fidelity. \n\n\u2022 Dynamics now depend on the neuron's history of firing, in addition to its history of input fields. \n\n\u2022 The dependence of ANN performance on the network architecture can be explicitly expressed. In particular, this enables the investigation of cortical-like architectures, where neurons are randomly connected to other neurons, with a higher probability of connections formed between spatially proximal neurons [Braitenberg and Schuz, 1991]. \n\nOur goal is twofold: first, to search for the computationally most efficient history-dependent neuronal signal (firing) function, and to study its performance relative to memoryless dynamics. As we shall show, optimal history-dependent dynamics are indeed much more efficient than memoryless ones. Second, to examine the optimal signal function from a biological perspective. As we shall see, it shares some basic properties with the firing of cortical neurons. \n\n2 The model \n\nOur framework is an ANN storing $m+1$ memory patterns $\\xi^1, \\xi^2, \\ldots, \\xi^{m+1}$, each an $N$-dimensional vector. The network is composed of $N$ neurons, each of which is randomly connected to $K$ other neurons. The $(m+1)N$ memory entries are independent with equally likely $\\pm 1$ values. The initial pattern $X$, signalled by $L (\\leq N)$ initially active neurons, is a vector of $\\pm 1$'s, randomly generated from one of the memory patterns (say $\\xi = \\xi^{m+1}$) such that $P(X_i = \\xi_i) = \\frac{1+\\epsilon}{2}$ for each of the $L$ initially active neurons and $P(X_i = \\xi_i) = \\frac{1+\\delta}{2}$ for each initially quiescent (non-active) neuron.
Although $\\epsilon, \\delta \\in [0,1)$ are arbitrary, it is useful to think of $\\epsilon$ as being 0.5 (corresponding to an initial similarity of 75%) and of $\\delta$ as being zero: a quiescent neuron has no prior preference for any given sign. Let $\\alpha_1 = m/n_1$ denote the initial memory load, where $n_1 = LK/N$ is the average number of signals received by each neuron. \n\nThe notion of 'iteration' is viewed as an abstraction of the overall dynamics over some length of time, during which some continuous input/output signal function (such as the conventional sigmoidal function) governs the firing rate of the neuron. We follow a Bayesian approach under which the neuron's signalling and activation decisions are based on the a-posteriori probabilities assigned to its two possible true memory states, $\\pm 1$. \n\nInitially, neuron $i$ is assigned a prior probability $\\lambda_i(0) = P(\\xi_i = 1 \\mid X_i, I_i^{(1)}) = \\frac{1 \\pm \\epsilon}{2}$ or $\\frac{1 \\pm \\delta}{2}$, which is conveniently expressed as $\\lambda_i(0) = \\frac{1}{1+e^{-2g_i(0)}}$, where, letting $g(t) = \\frac{1}{2}\\log\\frac{1+t}{1-t}$, \n\n$g_i(0) = X_i\\, g(\\epsilon)$ if $i$ is active, and $g_i(0) = X_i\\, g(\\delta)$ if $i$ is silent. \n\nThe input field observed by neuron $i$ as a result of the initial activity is \n\n$f_i^{(1)} = \\frac{1}{n_1} \\sum_{j=1}^{N} W_{ij}\\, l_{ij}\\, I_j^{(1)} X_j$, \\quad (1) \n\nwhere $I_j^{(1)} = 0,1$ indicates whether neuron $j$ has fired in the first iteration, $l_{ij} = 0,1$ indicates whether a connection exists from neuron $j$ to neuron $i$, and $W_{ij}$ denotes its magnitude, given by the Hopfield prescription \n\n$W_{ij} = \\sum_{\\mu=1}^{m+1} \\xi_i^{\\mu} \\xi_j^{\\mu}$. \\quad (2) \n\nAs a result of observing the input field $f_i^{(1)}$, which is approximately normally distributed (given $\\xi_i$, $X_i$ and $I_i^{(1)}$), neuron $i$ changes its opinion about $\\{\\xi_i = 1\\}$ from $\\lambda_i(0)$ to \n\n$\\lambda_i(1) = P(\\xi_i = 1 \\mid X_i, I_i^{(1)}, f_i^{(1)}) = \\frac{1}{1+e^{-2g_i(1)}}$, \\quad (3) \n\nexpressed in terms of the (additive) generalized field $g_i(1) = g_i(0) + \\frac{1}{\\alpha_1} f_i^{(1)}$.
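The Bayesian bookkeeping above is compact enough to state in code. A minimal sketch, assuming (as in our reading of the update rule) that the log-odds increment is $f_i^{(1)}/\alpha_1$; the function names and default constants are illustrative, not from the paper:

```python
import numpy as np

def g(t):
    # Log-odds transform g(t) = (1/2) log((1+t)/(1-t)); note g(0) = 0.
    return 0.5 * np.log((1.0 + t) / (1.0 - t))

def prior_field(X, active, eps=0.5, delta=0.0):
    # Prior generalized field g_i(0): X_i * g(eps) for initially active
    # neurons, X_i * g(delta) for initially quiescent ones.
    return np.where(active, X * g(eps), X * g(delta))

def generalized_field(g0, f1, alpha1):
    # Additive update: g_i(1) = g_i(0) + f_i(1) / alpha_1.
    return g0 + f1 / alpha1

def posterior(g_gen):
    # lambda_i = P(xi_i = 1 | history) = 1 / (1 + exp(-2 g_i)).
    return 1.0 / (1.0 + np.exp(-2.0 * g_gen))
```

For example, an active neuron with $X_i = +1$ and $\epsilon = 0.5$ starts at posterior(g(0.5)) = 0.75, i.e. the prior $(1+\epsilon)/2$, while a quiescent neuron with $\delta = 0$ starts at 0.5, expressing no prior preference for either sign.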
\n\nWe now turn to the second iteration, in which, as in the first iteration, some of the neurons become active and signal to the network. We model the signal emitted by neuron $i$ as $h(g_i(1), X_i, I_i^{(1)})$. The field observed by neuron $i$ (with $n_2$ updating neurons per neuron) is \n\n$f_i^{(2)} = \\frac{1}{n_2} \\sum_{j=1}^{N} W_{ij}\\, l_{ij}\\, h(g_j(1), X_j, I_j^{(1)})$, \\quad (4) \n\non the basis of which neuron $i$ computes its posterior belief $\\lambda_i(2) = P(\\xi_i = 1 \\mid X_i, I_i^{(1)}, f_i^{(1)}, f_i^{(2)})$ and expresses its final choice of sign as $X_i^{(2)} = \\mathrm{sign}(\\lambda_i(2) - 0.5)$. The two-iteration performance of the network is measured by the final similarity \n\n$S_f = \\frac{1+\\epsilon_f}{2} = P(X_i^{(2)} = \\xi_i) = \\frac{1}{2} + \\frac{1}{2N}\\sum_{j=1}^{N} X_j^{(2)} \\xi_j$. \\quad (5) \n\n3 Analytical results \n\nThe goals of our analysis have been: A. To present an expression for the performance under arbitrary architecture and activity parameters, for general signal functions $h_0$ and $h_1$. B. To use this expression to find the best choice of signal functions which maximizes performance. We show the following: \n\nThe neuron's final decision is given by \n\n$X_i^{(2)} = \\mathrm{sign}\\left[(A_0 + B_0 I_i^{(1)}) X_i + A_1 f_i^{(1)} + A_2 f_i^{(2)}\\right]$ \\quad (6) \n\nfor some constants $A_0$, $B_0$, $A_1$ and $A_2$. \n\nThe performance achieved is expressed in terms of \n\n$\\tilde{\\alpha} = \\frac{m}{n_1 + m A_3}$ \\quad (7) \n\nfor some $A_3 > 0$, and \n\n$(Q^{\\alpha}\\Phi)(x,t) = \\frac{1+t}{2}\\,\\Phi\\left(x + g(t)\\right) + \\frac{1-t}{2}\\,\\Phi\\left(x - g(t)\\right)$, \\quad (8) \n\nwhere $\\Phi$ is the standard normal cumulative distribution function. \n\nThe optimal analog signal function, illustrated in figure 1, is \n\n$h_0 = h(g_i(1), +1, 0) = R(g_i(1), \\delta)$ \\quad (9) \n\n$h_1 = h(g_i(1), +1, 1) = R(g_i(1), \\epsilon) - 1$ \\quad (10) \n\nwhere, for some $A_4 > 0$ and $A_5 > 0$, $R(s, t) = A_4 \\tanh(s) - A_5 (s - g(t))$.
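The slanted sigmoid $R(s,t) = A_4 \tanh(s) - A_5(s - g(t))$ and the resulting pair of signal functions can be sketched directly; the values of $A_4$ and $A_5$ below are arbitrary illustrative choices, not the optimal constants derived in the analysis:

```python
import numpy as np

A4, A5 = 1.0, 0.3  # illustrative constants; the analysis only requires A4, A5 > 0

def g(t):
    # g(t) = (1/2) log((1+t)/(1-t))
    return 0.5 * np.log((1.0 + t) / (1.0 - t))

def R(s, t):
    # Slanted sigmoid: a tanh term "slanted" by a linear correction through g(t).
    return A4 * np.tanh(s) - A5 * (s - g(t))

def signal(g1, fired, eps=0.5, delta=0.0):
    # h0 = R(g1, delta) for neurons silent in the first iteration,
    # h1 = R(g1, eps) - 1 for neurons that fired +1.
    return np.where(fired, R(g1, eps) - 1.0, R(g1, delta))
```

With these constants the non-monotonicity is visible: R(1, 0) is positive while R(5, 0) is negative, so sufficiently large fields produce signals of the opposite sign to moderate ones.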
\n\n(b) \n\n-\nSilent neurons \n- - - - Active neurons \n\n(7) \n\n(8) \n\n(9) \n\n(10) \n\nI , \n\n2.0 \n\nv \n\n0.0 \n\n-V \n\n~.O \n\nSignal \n\n1 \n\n---~~--~~--n-~------g \n\n4 5' Input field \n\n1 \u2022 \n\n, \n\n\\ \n\n\\ \n\\ \n\n. . \n\n----- -1 \n\n.... 0 L--~---'-~ __ -'---'----'-__ ~~~-----' \n5.0 \n\n-3.0 \n\n1.0 \n\n-1.0 \n\n-5.0 \nFigure 1: (a) A typical plot of the slanted sigmoid, Network parameters are N = \n5000, K = 3000, nl = 200 and m = 50. (b) A sketch of its discretized version. \n\nItllUt field \n\n3.0 \n\nThe nonmonotone form of these functions, illustrated in figure 1, is clear. Neurons \nthat have already signalled +1 in the first iteration have a lesser tendency to send \npositive signals than quiescent neurons. The signalling of quiescent neurons which \nreceive no prior information (6 = 0) has a symmetric form. The optimal signal is \nshown to be essentially equal to the sigmoid modified by a correction term depending \nonly on the current input field. In the limit of low memory load (f./ fol ~ 00), the \nbest signal is simply a sigmoidal function of the generalized input field. \n\n\fOptimal Signalling in Attractor Neural Networks \n\n489 \n\nTo obtain a discretized version of the slanted sigmoid, we let the signal be sign(h(y)) \nas long as Ih(y)1 is big enough - where h is the slanted sigmoid. The resulting signal, \nas a function of the generalized field, is (see figure la and lb) \n\ny < {3I (j) or {34 (j) < y < {3s (j) \ny > {36 (j) or {32 (j) < y < (33 (j) \notherwise \n\n(11) \n\nwhere -00 < (3l(D) < (32(O) ~ (33(O) < (34(O) < (3s(O) < (36(D) < 00 and -00 < (31(l) < \n(32(l) ~ (3/I) < (34(l) ~ (3S(l) < (36(1) < 00 define, respectively, the firing pattern of \nthe neurons that were silent or active in the first iteration. To find the best such \ndiscretized version of the optimal signal, we search numerically for the activity level \nv which maximizes performance. 
Every activity level $v$, used as a threshold on $|h(y)|$, defines the (at most) twelve parameters $\\beta_i^{(I)}$ (which are identified numerically via the Newton-Raphson method), as illustrated in figure 1b. \n\n4 Numerical Results \n\nFigure 2: Two-iteration performance as a function of connectivity $K$. (a) Posterior-probability-based, discretized and analog optimal signalling; network parameters are $N = 5000$, $n_1 = 200$ and $m = 50$, and all neurons receive their input state with similar initial overlap $\\epsilon = \\delta = 0.5$. (b) Discrete and analog signalling; network parameters are $N = 5000$, $m = 50$, $n_1 = 200$, $\\epsilon = 0.5$ and $\\delta = 0$. \n\nUsing the formulation presented in the previous section, we investigated numerically the two-iteration performance achieved in several network architectures with optimal analog signalling and its discretization. Already in small-scale networks of a few hundred neurons our theoretical calculations correspond fairly accurately with simulation results. First we repeat the example of a cortical-like network investigated in M & R, but now with optimal analog and discretized signalling. The nearly identical, marked superiority of optimal analog and discretized dynamics over the previous, posterior-probability-based signalling is evident, as shown in figure 2(a).
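The discretized, three-valued signal of Eq. (11), used in these comparisons, reduces to simple interval tests on the generalized field. A sketch; the threshold values below are made-up and merely respect the required ordering, and the assignment of $-1$/$+1$ to the two interval groups follows our reading of the discretization, not numerically optimized parameters:

```python
def discretized_signal(y, betas):
    # betas = (b1, ..., b6) with -inf < b1 < b2 <= b3 < b4 < b5 < b6 < inf,
    # one such tuple per first-iteration state (silent or active).
    b1, b2, b3, b4, b5, b6 = betas
    if y < b1 or b4 < y < b5:
        return -1          # signal -1 on these two intervals
    if y > b6 or b2 < y < b3:
        return +1          # signal +1 on these two intervals
    return 0               # otherwise stay silent

# Arbitrary illustrative thresholds satisfying the ordering constraint:
betas_example = (-3.0, -1.5, -0.5, 0.5, 1.5, 3.0)
```

In a full simulation one would hold twelve such parameters (six per first-iteration state) and tune the activity level $v$ that generates them, as described above.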
\nWhile low activity is enforced in the first iteration, the number of neurons allowed to become active in the second iteration is not restricted, and best performance is typically achieved when about 70% of the neurons in the network are active (both with optimal signalling and with the previous, heuristic signalling). \n\nFigure 2(b) displays the performance achieved in the same network, when the input signal is applied only to the small fraction (4%) of neurons which are active in the first iteration (expressing possibly limited resources of input information). We see that (for $K > 1000$) near-perfect final similarity is achieved even when the 96% initially quiescent neurons get no initial clue as to their true memory state, provided no restrictions are placed on the second-iteration activity level. \n\nNext we fixed the value of $w = \\epsilon^2 n_1 / m = 1$, and contrasted the case $(n_1 = 200, \\epsilon = 0.5)$ of figure 2(b) with $(n_1 = 50, \\epsilon = 1)$. The overall initial similarity under $(n_1 = 50, \\epsilon = 1)$ is only half its value under $(n_1 = 200, \\epsilon = 0.5)$. In spite of this, we have found that it achieves a slightly higher final similarity. This supports the idea that the input pattern should not be applied as the conventional uniformly distorted version of the correct memory, but rather as a less distorted pattern applied only to a small subset of the neurons. \n\n[Figure 3 legend: (a) discrete signalling vs. analog signalling; (b) upper bound performance, 3-D Gaussian connectivity, 2-D Gaussian connectivity, multi-layered network, lower bound performance.]
\n\nFigure 3: (a) Two-iteration performance in a full-activity network as a function of network size $N$. Network parameters are $n_1 = K = 200$, $m = 40$ and $\\epsilon = 0.5$. (b) Two-iteration performance achieved with various network architectures, as a function of the network connectivity $K$. Network parameters are $N = 5000$, $n_1 = 200$, $m = 50$, $\\epsilon = 0.5$ and $\\delta = 0$. \n\nFigure 3(a) illustrates the performance when connectivity and the number of signals received by each neuron are held fixed, but the network size is increased. A region of decreased performance is evident at mid-connectivity values ($K \\approx N/2$), due to the increased residual variance. Hence, for neurons capable of forming $K$ connections on average, the network should either be fully connected or have a size $N$ much larger than $K$. Since (eventually unavoidable) synaptic deletion would sharply worsen the performance of fully connected networks, cortical ANNs should indeed be sparsely connected. The final similarity achieved in the fully connected network (with $N = K = 200$) should be noted: in this case the memory load (0.2) is significantly above the critical capacity of the Hopfield network, but optimal history-dependent dynamics still manage to achieve a rather high two-iteration similarity (0.975) from an initial similarity of 0.75. This is in agreement with the findings of [Morita, 1993, Yoshizawa et al., 1993], who show that nonmonotone dynamics increase capacity. \n\nFigure 3(b) illustrates the performance achieved with various network architectures, all sharing the same network parameters $N$, $K$, $m$ and input similarity parameters $n_1$, $\\epsilon$, $\\delta$, but differing in the spatial organization of the neurons' synapses.
\nAs is evident, even in low-activity, sparse-connectivity conditions, the decrease in performance with Gaussian connectivity (relative, say, to the upper bound) does not seem considerable. Hence, history-dependent ANNs can work well in a cortical-like architecture. \n\n5 Summary \n\nThe main results of this work are as follows: \n\n\u2022 The Bayesian framework gives rise to the slanted sigmoid as the optimal signal function, displaying the nonmonotone shape proposed by [Morita, 1993]. It also offers an intuitive explanation of its form. \n\n\u2022 Martingale arguments show that similarity under Bayesian dynamics persistently increases. This makes our two-iteration results a lower bound for the final similarity achievable in ANNs. \n\n\u2022 The possibly asymmetric form of the function, where neurons that have been silent in the previous iteration have an increased tendency to fire in the next iteration compared with previously active neurons, is reminiscent of the bi-threshold phenomenon observed in biological neurons [Tam, 1992]. \n\n\u2022 In the limit of low memory load, the best signal is simply a sigmoidal function of the generalized input field. \n\n\u2022 In an efficient associative network, input patterns should be applied with high fidelity to a small subset of neurons, rather than spreading a given level of initial similarity as a low-fidelity stimulus applied to a large subset of neurons. \n\n\u2022 If neurons have some restriction on the number of connections they may form, such that each neuron forms some $K$ connections on average, then efficient ANNs, converging to high final similarity within a few iterations, should be sparsely connected. \n\n\u2022 With a properly tuned signal function, cortical-like Gaussian-connectivity ANNs perform nearly as well as randomly-connected ones.
\n\n\u2022 Investigating the 0,1 (silent, firing) formulation, there seems to be an interval such that only neurons whose field values are greater than some low threshold and smaller than some high threshold should fire. This seemingly bizarre behavior may correspond well to the behavior of biological neurons: neurons with very high field values have most probably fired constantly in the previous 'iteration', and due to the effect of neural adaptation are now silenced. \n\nReferences \n\n[Braitenberg and Schuz, 1991] V. Braitenberg and A. Schuz. Anatomy of the Cortex: Statistics and Geometry. Springer-Verlag, 1991. \n\n[Connors and Gutnick, 1990] B.W. Connors and M.J. Gutnick. Intrinsic firing patterns of diverse neocortical neurons. TINS, 13(3):99-104, 1990. \n\n[Hopfield, 1982] J.J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proc. Nat. Acad. Sci. USA, 79:2554, 1982. \n\n[Lytton, 1991] W. Lytton. Simulations of cortical pyramidal neurons synchronized by inhibitory interneurons. J. Neurophysiol., 66(3):1059-1079, 1991. \n\n[Meilijson and Ruppin, 1993] I. Meilijson and E. Ruppin. History-dependent attractor neural networks. Network, 4:1-28, 1993. \n\n[Morita, 1993] M. Morita. Associative memory with nonmonotone dynamics. Neural Networks, 6:115-126, 1993. \n\n[Tam, 1992] David C. Tam. Signal processing in multi-threshold neurons. In T. McKenna, J. Davis, and S.F. Zornetzer, editors, Single Neuron Computation, pages 481-501. Academic Press, 1992. \n\n[Yoshizawa et al., 1993] S. Yoshizawa, M. Morita, and S.-I. Amari. Capacity of associative memory using a nonmonotonic neuron model. Neural Networks, 6:167-176, 1993. \n", "award": [], "sourceid": 730, "authors": [{"given_name": "Isaac", "family_name": "Meilijson", "institution": null}, {"given_name": "Eytan", "family_name": "Ruppin", "institution": null}]}