{"title": "What Can a Single Neuron Compute?", "book": "Advances in Neural Information Processing Systems", "page_first": 75, "page_last": 81, "abstract": null, "full_text": "What can a single neuron compute? \n\nBlaise Agiiera y Areas, l Adrienne L. Fairhall, 2 and William Bialek2 \n\n1 Rare Books Library, Princeton University, Princeton, New Jersey 08544 \n\n2NEC Research Institute, 4 Independence Way, Princeton, New Jersey 08540 \n\nblaisea@prineeton. edu {adrienne, bialek} @researeh. nj. nee. com \n\nAbstract \n\nIn this paper we formulate a description of the computation per(cid:173)\nformed by a neuron as a combination of dimensional reduction \nand nonlinearity. We implement this description for the Hodgkin(cid:173)\nHuxley model, identify the most relevant dimensions and find the \nnonlinearity. A two dimensional description already captures a \nsignificant fraction of the information that spikes carry about dy(cid:173)\nnamic inputs. This description also shows that computation in the \nHodgkin-Huxley model is more complex than a simple integrate(cid:173)\nand-fire or perceptron model. \n\n1 \n\nIntroduction \n\nClassical neural network models approximate neurons as devices that sum their \ninputs and generate a nonzero output if the sum exceeds a threshold. From our \ncurrent state of knowledge in neurobiology it is easy to criticize these models as over(cid:173)\nsimplified: where is the complex geometry of neurons, or the many different kinds \nof ion channel, each with its own intricate multistate kinetics? Indeed, progress at \nthis more microscopic level of description has led us to the point where we can write \n(almost) exact models for the electrical dynamics of neurons, at least on short time \nscales. These nearly exact models are complicated by any measure, including tens \nif not hundreds of differential equations to describe the states of different channels \nin different spatial compartments of the cell. 
Faced with this detailed microscopic description, we need to answer a question which goes well beyond the biological context: given a continuous dynamical system, what does it compute? Our goal in this paper is to make this question about what a neuron computes somewhat more precise, and then to explore what we take to be the simplest example, namely the Hodgkin-Huxley model [1], [2] (and refs therein).

2 What do we mean by the question?

Real neurons take as inputs signals at their synapses and give as outputs sequences of discrete, identical pulses: action potentials or 'spikes'. The inputs themselves are spikes from other neurons, so the neuron is a device which takes N ~ 10³ pulse trains as inputs and generates one pulse train as output. If the system operates at 2 msec resolution and the window of relevant inputs is 20 msec, then we can think of a single neuron as having an input described by a ~10⁴ bit word (the presence or absence of a spike in each 2 msec bin for each presynaptic cell) which is then mapped to a one (spike) or zero (no spike). More realistically, if the average spike rates are ~10 sec⁻¹, the input words can be compressed by a factor of ten. Thus we might be able to think about neurons as evaluating a Boolean function of roughly 1000 Boolean variables, and then characterizing the computational function of the cell amounts to specifying this Boolean function.

The above estimate, though crude, makes clear that there will be no direct empirical attack on the question of what a neuron computes: there are too many possibilities to learn the function by brute force from any reasonable set of experiments. Progress requires the hypothesis that the function computed by a neuron is not arbitrary, but belongs to a simple class. 
Our suggestion is that this simple class involves functions that vary only over a low dimensional subspace of the inputs, and in fact we will start by searching for linear subspaces.

Specifically, we begin by simplifying away the spatial structure of neurons and take inputs to be just injected currents into a point-like neuron. While this misses some of the richness in real cells, it allows us to focus on developing our computational methods. Further, it turns out that even this simple problem is not at all trivial. If the input is an injected current, then the neuron maps the history of this current, I(t < t₀), into the presence or absence of a spike at time t₀. More generally we might imagine that the cell (or our description) is noisy, so that there is a probability of spiking P[spike@t₀|I(t < t₀)] which depends on the current history. We emphasize that the dependence on the history of the current means that there still are many dimensions to the input signal even though we have collapsed any spatial variations. If we work at time resolution Δt and assume that currents in a window of size T are relevant to the decision to spike, then the inputs live in a space of dimension D = T/Δt, of order 100 dimensions in many interesting cases.

If the neuron is sensitive only to a low dimensional linear subspace, we can define a set of signals s₁, s₂, ..., s_K by filtering the current,

s_μ = ∫₀^∞ dt f_μ(t) I(t₀ - t),    (1)

so that the probability of spiking depends only on this finite set of signals,

P[spike@t₀|I(t < t₀)] = P[spike@t₀] g(s₁, s₂, ..., s_K),    (2)

where we include the average probability of spiking so that g is dimensionless. If we think of the current I(t < t₀) as a vector, with one dimension for each time sample, then these filtered signals are linear projections of this vector. 
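As a concrete illustration of eq. (1), each projection is just a dot product between the sampled current history and a filter. The following is a minimal numerical sketch, not the paper's code: the filter shapes, time step, and window length are placeholder assumptions (the real filters are the covariance eigenmodes found below).

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder discretization: dt = 0.1 msec, window T = 10 msec, so D = 100.
dt, D = 0.1, 100
t = np.arange(D) * dt

# Two illustrative filters (NOT the actual HH covariance modes): a damped
# oscillation and its time derivative, rescaled to the same norm.
f1 = np.exp(-t / 2.0) * np.sin(t)
f2 = np.gradient(f1, dt)
f2 *= np.linalg.norm(f1) / np.linalg.norm(f2)

# A stand-in current history I(t0 - t), most recent sample first.
I_history = rng.normal(0.0, 0.275, size=D)  # nA, std. dev. as in the text

# Discrete version of eq. (1): s_mu = sum over t of dt * f_mu(t) * I(t0 - t).
s1 = dt * np.dot(f1, I_history)
s2 = dt * np.dot(f2, I_history)
```

With K such filters this is a single matrix-vector product: the D-dimensional history collapses to the K numbers (s₁, ..., s_K) before any nonlinearity is applied, which is what makes the description a dimensional reduction.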
\nIn this formulation, characterizing the computation done by a neuron means esti(cid:173)\nmating the number of relevant stimulus dimensions (K, hopefully much less than \nD), identifying the filters which project into this relevant subspace,! and then char(cid:173)\nacterizing the nonlinear function g(8) . The classical perceptron- like cell of neural \nnetwork theory has only one relevant dimension and a simple form for g. \n\n3 \n\nIdentifying low-dimensional structure \n\nThe idea that neurons might be sensitive only to low-dimensional projections of \ntheir inputs was developed explicitly in work on a motion sensitive neuron of the \nfly visual system [3]. Rather than looking at the distribution P[spike@tols(t < to)], \nwith s(t) the input signal (velocity of motion across the visual field in [3]), that \nwork considered the distribution of signals conditional on the response, P[s(t < \nto)lspike@to]; these are related by Bayes' rule, \n\nP[spike@tols(t < to)] = P[s(t < to)lspike@to] \n\n(3) \n\n__ _ ___ ----'P=-[=sp=ik=e:..::@=to]P[s(t tc the distribution decays \nexponentially, which means that the system has lost memory of the previous spike; \nthus spikes which are more than tc after the previous spike are isolated. \nIn what follows we consider the response of the Hodgkin- Huxley model to currents \nI(t) with zero mean, 0.275 nA standard deviation, and 0.5 msec correlation time. \n\n5 How many dimensions? \n\nFig. 1 shows the change in covariance matrix f1C( r, r') for isolated spikes in our HH \nsimulation, and fig. 2(a) shows the resulting spectrum of eigenvalues as a function \nof sample size. The result strongly suggests that there are many fewer than D \nrelevant dimensions. In particular, there seem to be two outstanding modes; the \nSTA itself lies largely in the subspace of these modes, as shown in Fig. 2(b). \n\n0.01 \n\n~ 0.00 \nS \n~ \n\nt' ({l\\sec) \n\nFigure 1: The isolated spike triggered covariance matrix f1C(r,r'). 
\n\nThe filters themselves, shown in fig. 3, have simple forms; in particular the second \nmode is almost exactly the derivative of the first. If the neuron filtered its inputs \nand generated a spike when the output of the filter crosses threshold, we would \nfind that there are two significant dimensions, corresponding to the filter and its \nderivative. It is tempting to suggest, then, that this is a good approximation to the \nHH model, but we will see that this is not correct. Notice also that both filters have \nsignificant differentiating components- the cell is not simply integrating its inputs. \nAlthough fig. 2(a) suggests that two modes dominate, it also demonstrates that the \nsmaller nonzero eigenvalues of the other modes are not just noise. The width of any \nspectral band of eigenvalues near zero due to finite sampling should decline with \nincreasing sample size. However, the smaller eigenvalues seen in fig. 2(a) are stable. \nThus while the system is primarily sensitive to two dimensions, there is something \n\n\f02 \n\n(a) \n\n0.5 \n\n2 \n\n(b) \n\nCl'=O_ iQ- =-- - _ _ \"\"\"'_ \n\n20 \n\n.\u00a7 0.0 \n13 \nOJ \"e-\n\"(cid:173)\n\n-0.5 \n\n-1.0 \n\n1 \n\n10+3 \n\n10+4 \n\n10+5 \n\n10+6 \n\nnumber of spikes accu mulated \n\nFigure 2: (a) Convergence ofthe largest 32 eigenvalues of the isolated spike triggered \ncovariance with increasing sample size_ (b) Projections of the isolated STA onto \nthe covariance modes_ \n\neigenmodes 1 and 2 \n\n-\n...... .. normalized derivative of mode 1 \n\n-30 \n\n-25 \n\n-20 \n\nFigure 3: Most significant two modes of the spike-triggered covariance_ \n\nmissing in this picture. To quantify this, we must first characterize the nonlinear \nfunction g(81' 82). \n\n6 Nonlinearity and information \n\nAt each instant of time we can find the relevant projections of the stimulus 81 and \n82. By construction, the distribution of these signals over the whole experiment, \nP(81, 82), is Gaussian. 
On the other hand, each time we see a spike we get a sample from the distribution P(s₁, s₂|spike@t₀), leading to the picture in fig. 4. The prior and spike conditional distributions clearly are better separated in two dimensions than in one, which means that our two dimensional description captures more than the spike triggered average. Further, we see that the spike conditional distribution is curved, unlike what we would expect for a simple thresholding device. Combining eqs. (2) and (3), we have

g(s₁, s₂) = P(s₁, s₂|spike@t₀) / P(s₁, s₂),    (9)

so that these two distributions determine the input/output relation of the neuron in this 2D space. We emphasize that although the subspace is linear, g can have arbitrary nonlinearity. Fig. 4 shows that this input/output relation has sharp edges, but also some fuzziness. The HH model is deterministic, so in principle the input/output relation should be a δ function: spikes occur only when certain exact conditions are met. Of course we have blurred things a bit by working at finite time resolution.

Figure 4: 10⁴ spike-conditional stimuli projected along the first 2 covariance modes. The circles represent the cumulative radial integral of the prior distribution from ∞; the ring marked 10⁻⁴, for example, encloses 1 - 10⁻⁴ of the prior.

Given that we work at finite Δt, spikes carry only a finite amount of information, and the quality of our 2D approximation can be judged by asking how much of this information is captured by this description. As explained in [5], the arrival time of a single spike provides an information

I_one spike = ⟨ (r(t)/r̄) log₂[r(t)/r̄] ⟩,    (10)

where r(t) is the time dependent spike rate, r̄ is the average spike rate, and ⟨...⟩ denotes an average over time. 
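The ratio in eq. (9) can be estimated directly from samples by binning. The sketch below uses a hypothetical helper and stand-in Gaussian samples (the shifted cloud plays the role of the spike-conditional distribution of fig. 4; the real one comes from the HH simulation).

```python
import numpy as np

def estimate_g(s_spike, s_prior, bins=25, lim=4.0):
    """Estimate g(s1, s2) = P(s1, s2 | spike) / P(s1, s2) on a grid by a
    ratio of normalized 2D histograms (hypothetical helper; inputs are
    (N, 2) arrays of stimulus projections in prior standard deviations)."""
    edges = np.linspace(-lim, lim, bins + 1)
    p_spike, _, _ = np.histogram2d(s_spike[:, 0], s_spike[:, 1],
                                   bins=[edges, edges], density=True)
    p_prior, _, _ = np.histogram2d(s_prior[:, 0], s_prior[:, 1],
                                   bins=[edges, edges], density=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(p_prior > 0, p_spike / p_prior, 0.0)

# Stand-in samples: the prior is Gaussian by construction; a shifted
# Gaussian stands in for the spike-conditional cloud.
rng = np.random.default_rng(2)
s_prior = rng.normal(size=(100_000, 2))
s_spike = rng.normal(loc=[1.5, 0.5], size=(10_000, 2))
g = estimate_g(s_spike, s_prior)
```

Because g is left free on the grid, this estimate can represent the sharp, curved boundary seen in fig. 4, which a single threshold on one projection could not.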
With a deterministic model like HH, the rate r(t) either is zero or corresponds to one spike occurring in one bin of size Δt, that is r = 1/Δt. The result is that I_one spike = -log₂(r̄Δt). On the other hand, if the probability of spiking really depends only on the stimulus dimensions s₁ and s₂, we can substitute

r(t)/r̄ → P(s₁, s₂|spike@t) / P(s₁, s₂),    (11)

and use the ergodicity of the stimulus to replace time averages in Eq. (10). Then we find [3, 5]

I^(s₁,s₂)_one spike = ∫ ds₁ ds₂ P(s₁, s₂|spike@t₀) log₂[ P(s₁, s₂|spike@t₀) / P(s₁, s₂) ].    (12)

If our two dimensional approximation were exact we would find I^(s₁,s₂)_one spike = I_one spike; more generally we will find I^(s₁,s₂)_one spike ≤ I_one spike, and the fraction of the information we capture measures the quality of the approximation. This fraction is plotted in fig. 5 as a function of time resolution. For comparison, we also show the information captured by considering only the stimulus projection along the STA.

Figure 5: Fraction of spike timing information captured by STA (lower curve) and projection onto covariance modes 1 and 2 (upper curve).

7 Discussion

The simple, low-dimensional model described captures a substantial amount of information about spike timing for a HH neuron. The fraction is maximal near Δt = 5.5 msec, reaching nearly 70%. However, the absolute information captured saturates for both the 1D and 2D cases, at ≈ 3.5 and 5 bits respectively, for smaller Δt. Hence the information fraction captured plummets; recovering precise spike timing requires a more complex, higher dimensional representation of the stimulus. Is this effect important, or is timing at this resolution too noisy for this extra complexity to matter in a real neuron? 
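The information comparison above can be sketched with a discretized version of eq. (12). The densities here are placeholder analytic Gaussians on a grid, not HH data, and the helper name and rate values are assumptions for illustration.

```python
import numpy as np

def info_per_spike_2d(p_spike, p_prior, bin_area):
    """Discretized eq. (12): sum of P(s|spike) * log2[P(s|spike)/P(s)]
    over grid cells (a sketch; estimates from finite samples need care
    with sampling bias)."""
    mask = (p_spike > 0) & (p_prior > 0)
    return np.sum(p_spike[mask] * bin_area
                  * np.log2(p_spike[mask] / p_prior[mask]))

# Toy densities on a grid: a unit Gaussian prior and a spike-conditional
# density shifted by two standard deviations (placeholder distributions).
edges = np.linspace(-5.0, 5.0, 101)                 # 100 bins of width 0.1
c = 0.5 * (edges[:-1] + edges[1:])
X, Y = np.meshgrid(c, c, indexing="ij")

def gauss2d(x, y, mx, my):
    return np.exp(-0.5 * ((x - mx) ** 2 + (y - my) ** 2)) / (2.0 * np.pi)

p_prior = gauss2d(X, Y, 0.0, 0.0)
p_spike = gauss2d(X, Y, 2.0, 0.0)
I_2d = info_per_spike_2d(p_spike, p_prior, bin_area=0.1 ** 2)

# For the deterministic model the total is I_one_spike = -log2(rbar * dt),
# so the captured fraction plotted in fig. 5 is I_2d / I_total.
rbar, dt = 0.05, 1.0        # stand-in: 50 spikes/sec, bins of 1 msec
I_total = -np.log2(rbar * dt)
```

For these analytic Gaussians, eq. (12) reduces to a Kullback-Leibler divergence, about 2²/(2 ln 2) ≈ 2.9 bits for a two-standard-deviation shift, so the sum gives a quick sanity check of the discretization.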
Stochastic HH simulations have suggested that, when realistic noise sources are taken into account, the timing of spikes in response to dynamic stimuli is reproducible to within 1-2 msec [6]. This suggests that such timing details may indeed be important.

Even in 2D, one can observe that the spike conditional distribution is curved (fig. 4); it is likely to curve along other dimensions as well. It may be possible to improve our approximation by considering the computation to take place on a low-dimensional but curved manifold, instead of a linear subspace. The curvature in Fig. 4 also implies that the computation in the HH model is not well approximated by an integrate and fire model, or a perceptron model limited to linear separations.

Characterizing the complexity of the computation is an important step toward understanding neural systems. How to quantify this complexity theoretically is an area for future work; here, we have made progress toward this goal by describing such computations in a compact way and then evaluating the completeness of the description using information. The techniques presented are applicable to more complex models, and of course to real neurons. How does the addition of more channels increase the complexity of the computation? Will this add more relevant dimensions, or does the nonlinearity change?

References

[1] A. Hodgkin and A. Huxley. J. Physiol., 117, 1952.

[2] C. Koch. Biophysics of Computation. New York: Oxford University Press, 1999.

[3] W. Bialek and R. de Ruyter van Steveninck. Proc. R. Soc. Lond. B, 234, 1988.

[4] F. Rieke, D. Warland, R. de Ruyter van Steveninck and W. Bialek. Spikes: Exploring the Neural Code. Cambridge, MA: MIT Press, 1997.

[5] N. Brenner, S. Strong, R. Koberle, W. Bialek and R. de Ruyter van Steveninck. Neural Comp., 12, 2000.

[6] E. Schneidman, R. Freedman and I. Segev. Neural Comp., 10, 1998. 
\n\n\f", "award": [], "sourceid": 1867, "authors": [{"given_name": "Blaise", "family_name": "Ag\u00fcera y Arcas", "institution": null}, {"given_name": "Adrienne", "family_name": "Fairhall", "institution": null}, {"given_name": "William", "family_name": "Bialek", "institution": null}]}