{"title": "A Digital Antennal Lobe for Pattern Equalization: Analysis and Design", "book": "Advances in Neural Information Processing Systems", "page_first": 245, "page_last": 253, "abstract": null, "full_text": "A Digital Antennal Lobe for Pattern \n\nEqualization: Analysis and Design \n\nAlex Holub, Gilles Laurent and Pietro Perona \n\nComputation and Neural Systems, California Institute of Technology \n\nholub@caltech.edu, laurentg@caltech.edu, perona@caltech.edu \n\nAbstract \n\nRe-mapping patterns in order to equalize their distribution may \ngreatly simplify both the structure and the training of classifiers. \nHere, the properties of one such map obtained by running a few \nsteps of discrete-time dynamical system are explored. The system \nis called 'Digital Antennal Lobe' (DAL) because it is inspired by \nrecent studies of the antennallobe, a structure in the olfactory sys(cid:173)\ntem of the grasshopper. The pattern-spreading properties of the \nDAL as well as its average behavior as a function of its (few) de(cid:173)\nsign parameters are analyzed by extending previous results of Van \nVreeswijk and Sompolinsky. Furthermore, a technique for adapting \nthe parameters of the initial design in order to obtain opportune \nnoise-rejection behavior is suggested. Our results are demonstrated \nwith a number of simulations. \n\n1 \n\nIntroduction \n\nThe complexity of classifiers and the difficulty of learning their parameters is affected \nby the distribution of the input patterns. It is easier to obtain simple and accurate \nclassifiers when the patterns associated with different classes are spaced far apart \nand evenly in the input space. Distributions which are lumpy, with classes bunched \nup in some regions of space leaving other regions of space empty may be more \ndifficult to classify. This problem is particularly evident in sensory processing. 
In \nolfaction numerous odors which we wish to discriminate are chemically very similar, \nfor example the citrus family (orange, lemon, lime ... ), while many odors that are in \nprinciple possible never occur in practice. The uneven chemical spacing for the odors \nof interest is expensive: in biological systems there is a premium in the simplicity \nof the classifiers that will recognize each individual odor. \nWhen the dimension ofthe pattern space is large (e.g. D > 100), and the number of \nclasses to be discriminated is relatively small (e.g. N < 1000), one may transform \nan uneven distribution of patterns into an evenly distributed one by means of a map \nthat 'randomizes' the position of each pattern, i.e. that takes (small) neighborhoods \nof the input space and remaps them to random locations. In large-dimensional \nspaces it is exceedingly likely that two contiguous regions will be remapped to \nlocations whose distance is comparable with the diameter of the space, and thus \nthe distribution of patterns is equalized. \n\n\fWe explore a simple dynamical system which realizes one such map for spreading \npatterns in a high-dimensional space. The input space is the analog D-dimensional \nhypercube (0,1)D and the output space the digital hypercube {0,1}D. The map \nis implemented by iterating a discrete-time first-order dynamical system consisting \nof two steps at each iteration: a first-order linear dynamical system followed by \nmemory less thresholding. The interest of the map is that it makes very parsimonious \nuse of computational hardware (e.g. on the order of D neurons or transistors) and \nyet it achieves good equalization in a few time steps. The ideas that we present are \ninspired by a computation that may take place in the olfactory system as suggested \nin Friedrichs and Laurent [1J and Laurent [2 , 3J. In insects, the anatomical structure \nwhere this computation is presumed to take place is called the 'Antennal Lobe'. 
\nBecause of this we call the map a 'Digital Antennal Lobe' (DAL). \n\n2 The digital antennal lobe \n\nThe dynamical system we propose is inspired by the overall architecture of the \nantennal lobe and is designed to explore its computational capabilities. We apply \ntwo key simplifications: we discretize time into equally spaced 'epochs', updating \nsynchronously the state of all the neurons in the network at each epoch, and we dis(cid:173)\ncretize the value of the state of each unit to the binary set {O, 1}. The physiological \njustification for these simplifications goes beyond the scope of this paper. \n\nConsider a collection of N binary neurons which are randomly connected and up(cid:173)\ndated synchronously. The network is initially quiescent (i.e. all the neurons have \nconstant state zero). At some time an input is applied causing the network to \ntake values that are different from zero. The state of the network evolves in time. \nThe state of the network after a given constant number of time-steps (e.g. 10-20 \ntime-steps) is the desired output of the system. Let us introduce the following \nnotation: \n\nNumber of excitatory, inhibitory, and external input units. \nTotal number of excitatory and inhibitory units (N = N E + N I ) \nNeuron index: i E {1, ... ,N E} for excitatory and \ni E {N E + 1, ... ,N} for inhibitory. \nValue of unit i at time t. \nVector of values for all excitatory and inhibitory units at time t. \nConnectivity: cN is the number of inputs to a given neuron. \nExcitatory, inhibitory, and external input (i.e. KE = eN E) . \nMatrix of connections. A has eN2 nonzero entries. \nConnection weight of unit j to unit i. \nExcitatory, inhibitory, input weights (Aij E {aI,O,aE}). \nActivation thresholds for all the neurons \nVector of pattern inputs. \nMatrix of excitatory connections from pattern inputs to units. \nVector of neuronal input currents, i.e. gt+l = AXl + Bat - T. \nUpdate equation for x. 1(\u00b7) is the Heaviside function. 
\nMean activity in the network at time t, i.e. mt = Li xi/No \nFraction of the external inputs which are active. \n\nx~ E {O, 1 }V'i \nXl \nc \nKE,KI,Ku \nA \nAij \naE, aI, au \nT \nit \nB \ngt \nXl = 1(gt) \nmt \nmu \n\nA DAL may be generated once the value of 5 parameters are chosen. Assume exci(cid:173)\ntatory connection weight aE = au = 1 (this is a normalization constant). Choose \na value for aI, c, T, N I , N E. Generate random connection matrices A and B with \naverage connectivity e and connection weights aE, aI. Solve the following dynamical \nsystem forward in time from a zero initial condition: \n\n\f\"\"\"\"''''''\u00b7''' '-\u00ab''''''' 1\\1'_ '1.2_'''\"'''' 1'''''''1''''' '''''''''''''1''''') '''''''' \n\nf\"-\u2022\u2022\u2022 ~\" \u2022\u2022\u2022 -.-.-.e--. ' \nI~\u00b7 \u00b7--e\"\u00b7--\u00b7''-\u00b7>i \n\u2022 \nI \n\\ \nI', r 1\\ \n\n, \n\nI ', \n\"\", \nII\n\n/ \nI /~--.~ \n! -----,:;------;;;------c: . . ' +'\" .-.-; \n\n\"'\" \n\nII \n\nFigure 1: Example of pattern spreading by the a DAL. (Left) Response of a DAL to \n10 uniformly distributed random olfactory input patterns applied at time epoch t = 3. \nEach vertical panel represents the state of excitatory units at a given time epoch (epochs \n2,4,8,10 and excitatory units 1-200 are shown) in response to all stimuli. In a given \npanel the row index refers to a given excitatory unit and the column index to a given \ninput pattern (200 of 1024 excitatory units shown and 10 input patterns). A white dot \nrepresents a state of '1' and a dark dot represent a state of '0'. Around 10% of the neurons \nare active (i.e. state = '1') by the 8th time-epoch. The salt-and-pepper pattern present \nin each panel indicates that excitatory units respond differently to each input pattern. \n(Center) Activity of the DAL in response to 10 stimuli that differ only in one out of 1024 \ninput dimensions, i.e. 0.1%. 
The horizontal streaks in the panels corresponding to early \nepochs (t = 4 and t = 6) indicate that the excitatory units respond equally or similarly \nto all input patterns. The salt-and-pepper pattern in later epochs indicates that the \ntime course of each excitatory units state becomes increasingly different in time. (Right) \nTime-course of the normalized average distance between the patterns corresponding to \ndifferent families of input patterns: the red curve corresponds to input patterns that \nare very different (average difference 20%), while the green and blue curve correspond \nto families of similar input patterns: 0.1% average difference for the green curve and \n0.2% average difference for the blue curve. The parameters used in this network were \naJ = 10, c = .05, T = 10, NE = 1024, NJ = 256. \n\no \nAxt- 1 + Bit - T, \nl(yt) \n\nt > 0 \n\nzero initial condition \nneuronal input \nstate update \n\nfor some (constant) input pattern it. The notation 1(\u00b7) indicates the Heaviside step \nfunction. \n\nThe overall behavior of the DAL in response to different olfactory inputs is illus(cid:173)\ntrated in Figure 1. Notice the main features of the DAL. (1) In response to an \ninput each unit exhibits a complex temporal pattern of activity. (2) The pattern \nis different for different inputs. \n(3) The average activity rate of the neurons is \napproximately independent of the input pattern. (4) When very different input \npatterns are applied the average normalized Hamming distance between excitatory \nunit states is almost maximal immediately after the onset of the input stimulus. (5) \nWhen very similar input patterns are applied (e.g. 0.1 % average difference), the \naverage normalized Hamming distance between excitatory unit patterns is initially \nvery small, i.e. \ninitially the excitatory units respond similarly to similar inputs. \nThe difference increases with time and reaches almost maximal value within 8-9 \ntime-epochs. 
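The update equations above can be sketched in a few lines of code. Parameter values loosely follow Figure 1; the sign convention used here (inhibitory sources contributing with weight -a_I) and the connection-sampling scheme are assumptions for illustration, not the authors' exact construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sizes and parameters loosely following Figure 1 (assumed for illustration)
NE, NI, NU = 1024, 256, 1024
N = NE + NI
c, aE, aI, aU, T = 0.05, 1.0, 10.0, 1.0, 10.0

# Random sparse connection matrices, each entry nonzero with probability c.
# Assumed convention: inhibitory columns carry weight -aI.
mask = rng.random((N, N)) < c
A = np.where(mask, aE, 0.0)
A[:, NE:] = np.where(mask[:, NE:], -aI, 0.0)
B = np.where(rng.random((N, NU)) < c, aU, 0.0)

def run_dal(u, steps=10):
    """Iterate x^t = 1(A x^{t-1} + B u - T) from the quiescent state x^0 = 0."""
    x = np.zeros(N)
    for _ in range(steps):
        x = (A @ x + B @ u - T > 0.0).astype(float)
    return x[:NE]  # read out the excitatory units

# Two inputs differing in a single dimension (0.1%) are spread apart in time:
u1 = (rng.random(NU) < 0.5).astype(float)
u2 = u1.copy()
u2[0] = 1.0 - u2[0]
d = float(np.mean(run_dal(u1) != run_dal(u2)))  # normalized Hamming distance
```

With such parameters the distance d between the final states of the two nearly identical inputs can be compared against the curves in the right panel of Figure 1.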
The 'chaotic' properties of sparsely connected networks of neurons were noticed and studied by van Vreeswijk and Sompolinsky [5] in the limit of infinitely many neurons. In this paper we study networks with a small number of neurons, comparable to the number observed within the antennal lobe. Additionally, we propose a technique for the design of such networks, and demonstrate the possibility of 'stabilizing' some trajectories by parameter learning. \n\n2.1 Analytic solution and equilibrium of network \n\nThe use of simplified neural elements, namely McCulloch-Pitts units [4], allows us to represent the system as a simple discrete-time dynamical system. Furthermore, we are able to derive expressions for various network properties. Several distributions can be used to approximate the number of active units in the populations of excitatory, inhibitory, and external units, including: (1) the Binomial distribution, (2) the Poisson distribution, and (3) the Gaussian distribution. An approximation common to all three is that the activities of all units are uncorrelated. The Gaussian approximation yields van Vreeswijk and Sompolinsky's analysis [5]. \n\nGiven the population activity at time t, m_t, we can calculate the expected value of the population activity at the next time step, m_{t+1}: \n\nE(m_{t+1}) = sum_{e=0}^{K_E} sum_{i=0}^{K_I} sum_{u=0}^{K_U} p(e) p(i) p(u) 1(a_E e + a_I i + a_U u - T) \n\nwhere p(e), p(i), and p(u) are the probabilities of e excitatory, i inhibitory, and u external inputs being active. Both e and i are binomially distributed with mean activity m = m_t, while the external input is binomially distributed with mean activity m = m_U. The Poisson distribution can be used to approximate the binomial distribution for reasonable values of lambda, where for instance lambda_e = K_E m_t. 
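The triple sum above can be evaluated exactly for moderate K. A minimal sketch, with placeholder parameter values (and the negative sign on a_I an assumed convention so that inhibitory inputs reduce the drive):

```python
import math

def predicted_activity(mt, mU, NE=512, NI=512, NU=512, c=0.1,
                       aE=1.0, aI=-1.5, aU=1.0, T=2.0):
    """Binomial prediction of E[m_{t+1}] via the exact triple sum.

    mt is the current population activity, mU the fraction of active
    external inputs. Parameter values are illustrative placeholders."""
    KE, KI, KU = round(c * NE), round(c * NI), round(c * NU)

    def binom_pmf(K, m):
        # p(k active) for a Binomial(K, m) count
        return [math.comb(K, k) * m**k * (1 - m)**(K - k) for k in range(K + 1)]

    pe, pi, pu = binom_pmf(KE, mt), binom_pmf(KI, mt), binom_pmf(KU, mU)
    total = 0.0
    for e, p_e in enumerate(pe):
        for i, p_i in enumerate(pi):
            for u, p_u in enumerate(pu):
                # Heaviside of the net input a_E e + a_I i + a_U u - T
                if aE * e + aI * i + aU * u - T > 0:
                    total += p_e * p_i * p_u
    return total
```

Iterating mt -> predicted_activity(mt, mU) to a fixed point gives the equilibrium condition m_t = m_{t+1} used in the design procedure of Section 3.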
Using the Poisson approximation, the probability of j units being active is given by p(j) = lambda^j e^{-lambda} / j!. \n\nIn the limit as N -> infinity, the distributions of the number of active excitatory, inhibitory, and external units approach normal distributions. Since the sum of Gaussian random variables is itself a Gaussian random variable, we can model the net input to a unit as the sum of the excitatory, inhibitory, and external inputs, shifted by a constant representing the threshold. The mean mu and variance sigma^2 of the Gaussian representing the input to an individual unit are then: \n\nmu = a_E m_t K_E + a_I m_t K_I + a_U m_U K_U - T \nsigma^2 = N_E [a_E^2 m_t c - a_E^2 c^2 m_t] + N_I [a_I^2 m_t c - a_I^2 c^2 m_t] + N_U [a_U^2 m_U c - a_U^2 c^2 m_U] \n\nThe fraction of active units can then be determined from the area under the Gaussian corresponding to positive net input, i.e. E(m_{t+1}) = Phi(mu / sigma), where Phi is the standard normal cumulative distribution function. \n\nThe predicted population mean activity was calculated by imposing that the system is at equilibrium. The equilibrium condition is satisfied when m_t = m_{t+1}. \n\nFigure 2: Design of a DAL. (Left) Behavior of the system for a given connectivity value. Light gray indicates inhibition-threshold values that yield a stable dynamical system; that is, small perturbations of firing activity do not result in large fluctuations in activity later in time. The dark blue line indicates equilibria, i.e. inhibition-threshold values for which the dynamical system rests at a constant mean firing rate. (Center) The stable portions of the equilibrium curves for a number of connectivity values. Using this chart one may design an antennal lobe: for any given connectivity, choose inhibition and threshold values that produce a desired mean firing rate. (Right) The design procedure produces networks that behave as desired. The arrows indicate parameter sets for which Monte Carlo simulations were performed in order to test the accuracy of the predictions. 
The values indexing the arrows correspond to the absolute difference between the predicted activity (.15), obtained using the binomial approximation, and the mean simulation activity across 10 random inputs to 10 different networks with the specified parameter sets. \n\nWe found the binomial approximation to yield the most accurate predictions in the parameter ranges of interest to us, namely 500-4000 total units and connectivities ranging from .05 to .15 (see Figure 2). The binomial approximation was always within 1 standard deviation of the Monte Carlo means. The Gaussian approximation yielded slightly less accurate predictions but required a fraction of the time to compute. \n\n3 Design of the Antennal Lobe \n\nThe analysis described above allows us to design well-behaved DALs. Specifically, we can predict which subsets of parameters in a given parameter range yield good network behavior. These predictions are made by solving the update equation for multiple sets of parameters and then determining which parameter ranges yield networks which are both stable and at equilibrium. \n\nFigure 2 outlines the design technique for a network of 512 excitatory and 512 inhibitory units and a population mean activity of .15. The predicted activity of the network for different parameter sets corresponds well with that observed in Monte Carlo simulations. There is an average difference of .0061 between the predicted mean activity and that found in the simulations (see Figure 2, right plot). \n\n4 Learning for trajectory stabilization \n\nConsider a 'physical' implementation of the DAL, either by means of neurons in a biological system or by transistors in an electronic circuit. The inevitable presence of noise points to a fatal flaw of the DAL as we have seen it so far. The key property of the DAL is input decorrelation. 
In the presence of noise, the same input applied multiple times to the same network will produce divergent trajectories, hence different final conditions, thus making the use of DALs for pattern classification problematic. \n\nConsider the possibility that noise is present in the system, as a result of fluctuations in the level of the input u^t, fluctuations in the biophysical properties of the neurons, etc. We may represent this noise as an additional term n^t in the dynamical system: \n\ny^t = A x^t + B u^t - T \nx^{t+1} = 1(y^t + n^t) \n\nWhatever the statistics of the noise, it is clear that it may influence the trajectory x of the dynamical system. Indeed, if y_i^t, the nominal input to a neuron, is sufficiently close to zero, then even a small amount of noise may change the state x_i^t of that neuron. As we saw in earlier sections, this implies that the ensuing trajectory will diverge from the trajectory of the same system with the same inputs and no noise, or with the same inputs and a different realization of the same noise process. This is shown in the left panel of Figure 3. On the other hand, if y_i^t is far from zero, then x_i^t will not change even with large amounts of noise. This raises the possibility that, if a DAL is appropriately designed, it may exhibit a high degree of robustness to noise. Ideally, for any given initial condition and input, and for any epsilon, there exists a constant y_0 > 0 such that any initial condition and input in a y_0-ball around the original input and initial condition will produce trajectories that differ at most by epsilon. Clearly, if epsilon = 0 (i.e. the trajectory is required to be identical to that of the noiseless system) then all trajectories of the system must coincide, which is not very useful. Similarly, if epsilon << y_0 the map will not spread different inputs. Therefore, this formulation of the problem does not have a satisfactory solution. One may, however, consider a weaker requirement. 
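The margin argument above, requiring that along chosen trajectories every input y_i^t sit far from threshold, can be written as a differentiable penalty. A sketch, with g(y) = exp(y/y_0) as in the text and the array shapes assumed for illustration:

```python
import numpy as np

def margin_cost(ys, xs, y0=0.5):
    """Cost C = sum_t sum_i g((1 - 2 x_i^t) y_i^t) with g(y) = exp(y / y0).

    ys, xs: arrays of shape (T, N) holding inputs y_i^t and binary states
    x_i^t along one target trajectory. An off unit (x = 0) is penalized
    when its input is near or above zero, an on unit (x = 1) when its
    input is near or below zero."""
    signed = (1.0 - 2.0 * xs) * ys  # large and negative = comfortable margin
    return float(np.sum(np.exp(signed / y0)))

def cost_grad_A(ys, xs, y0=0.5):
    """Gradient of the cost with respect to the connection matrix A, using
    dy_i^t / dA_ij = x_j^{t-1}. A sketch: the full learning procedure would
    also update B and the thresholds T."""
    Tn, N = ys.shape
    grad = np.zeros((N, N))
    for t in range(1, Tn):
        s = 1.0 - 2.0 * xs[t]
        coeff = np.exp(s * ys[t] / y0) * s / y0  # dC_i^t / dy_i^t
        grad += np.outer(coeff, xs[t - 1])
    return grad
```

Gradient descent on A (and analogously on B and T) pushes each y_i^t away from threshold in the direction consistent with the target state x_i^t.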
If the total number of patterns to be discriminated is not too large (probably 10-1000 in the case of olfaction), one could think of requiring noise robustness only for the trajectories x that are specific to those patterns. We therefore explored whether it was in principle possible to stabilize the trajectories corresponding to different odor presentations, rather than all trajectories. \n\nWe wish to change the connection weights A, B and thresholds T so that the network is robust with respect to noise around a given trajectory x(u). In order to achieve this we wish to ensure that at no time t does neuron i have an input that is close to the threshold. If neuron i is not firing at time t (i.e. x_i^t = 0) then its input must be comfortably less than zero (i.e. for some constant y_0 > 0, y_i^t < -y_0), and vice versa for x_i^t = 1. We do so by minimizing an appropriate cost function: call g(.) an appropriate penalty function, e.g. g(y) = exp(y/y_0); then the cost of neuron i at time t is C_i^t = g(y_i^t) if x_i^t = 0 and C_i^t = g(-y_i^t) if x_i^t = 1. Therefore: \n\nC_i^t = g((1 - 2 x_i^t) y_i^t) \nC(A, B, T) = sum_t sum_i C_i^t \n\nThe minimization may proceed by gradient descent. The chain rule gives \n\ndC_i^t/dA_ij = g'((1 - 2 x_i^t) y_i^t) (1 - 2 x_i^t) dy_i^t/dA_ij, with dy_i^t/dA_ij = x_j^{t-1}, \n\nand similarly for dy_i^t/dB_ij. \n\nFigure 3: Robustness of trajectories to noise resulting from network learning. (Left) Pattern spreading in a DAL before learning. Each curve corresponds to the divergence rate between 10 identical trajectories in the presence of 5% Gaussian synaptic noise added to each active presynaptic synapse. All patterns achieve maximum spreading in 9-10 steps, as also shown in Figure 1. (Right) The divergence rate of the same trajectories after learning the first 10 steps of each trajectory. Each trajectory was learned sequentially, with the trajectory labelled 1 learned first. 
Note that trajectories learned later, for instance trajectory 20, diverge more slowly than earlier-learned trajectories. Thus, the trajectories learned earlier are forgotten while more recently acquired trajectories are maintained. Furthermore, the trajectories maintain their stereotyped ability to decorrelate both after they are forgotten (e.g. trajectory 8) and after the 10-step learning period is over (e.g. trajectory 20). Untrained trajectories behave the same as trajectories in the left panel. \n\nIn Figure 3 the results of one learning experiment are shown. Before learning, all trajectories are susceptible to synaptic noise. After learning, those trajectories learned last exhibit robustness to noise, while trajectories learned earlier are slowly forgotten. We can compare each learned trajectory to a curve in multi-dimensional space with a 'robustness pipe' surrounding it. Any points lying within this pipe will be part of trajectories that remain within the pipe. In the case of olfactory processing, different odors correspond to unique trajectories, while trajectories lying within a common pipe correspond to the same input odor presentation. \n\nA few details on the experiment: the network contained 2048 neurons, half of which were excitatory and the other half inhibitory. The values of the constants were: c = 0.08, a_E = 1, a_I = 1.5, T = 7.2, and the mean firing rate was set at about .05. The optimization took 60 gradient-descent steps. \n\n5 Discussion and Conclusions \n\nSparsely connected networks of neurons have 'chaotic' properties which may be used for equalizing a set of patterns in order to make their classification easier. In studying the properties of such networks we extend previous results on networks with infinitely many neurons by van Vreeswijk and Sompolinsky to the case of a small number of neurons. We also provide techniques for designing networks that have desired average properties. 
Moreover, we propose a learning technique to make the network immune to noise around chosen trajectories while preserving the equalization property elsewhere. \n\nA number of issues are left open. A precise characterization of the effects of the DAL on the distribution of the input patterns, and of the consequent improvement in the ease of pattern classification, is still missing. The geometry of the map implemented by the DAL is also unclear. Finally, it would be useful to obtain a quantitative estimate of the 'capacity' of the DAL, i.e. the number of trajectories which can be learned in any given network before older trajectories are forgotten. \n\nAcknowledgements \n\nWe would like to thank Or Neeman for useful suggestions and feedback. This work was supported in part by the Engineering Research Centers Program of the National Science Foundation under Award Number EEC-9402726. \n\nReferences \n\n[1] Friedrich, R. & Laurent, G. (2001) Dynamical optimization of odor representations by slow temporal patterning of mitral cell activity. Science 291:889-894. \n\n[2] Laurent, G., Stopfer, M., Friedrich, R.W., Rabinovich, M.I., Volkovskii, A., Abarbanel, H.D. (2001) Odor encoding as an active, dynamical process: experiments, computation, and theory. Ann. Rev. Neurosci. 24:263-97. \n\n[3] Laurent, G. (2002) Olfactory network dynamics and the encoding of multidimensional signals. Nat. Rev. Neurosci. 3(11):884-95. \n\n[4] McCulloch, W.S. & Pitts, W. (1943) A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 5:115-133. \n\n[5] van Vreeswijk, C. & Sompolinsky, H. (1998) Chaotic balanced state in a model of cortical circuits. Neural Computation 10(6):1321-71. 
\n\n\f", "award": [], "sourceid": 2243, "authors": [{"given_name": "Alex", "family_name": "Holub", "institution": null}, {"given_name": "Gilles", "family_name": "Laurent", "institution": null}, {"given_name": "Pietro", "family_name": "Perona", "institution": null}]}