{"title": "Binary Tuning is Optimal for Neural Rate Coding with High Temporal Resolution", "book": "Advances in Neural Information Processing Systems", "page_first": 205, "page_last": 212, "abstract": "", "full_text": "Binary Thning is Optimal for\n\neural Rate\nCoding with High Temporal Resolution\n\nMatthias Bethge:David Rotermund, and Klaus Pawelzik\n\nInstitute of Theoretical Physics\n\nUniversity of Bremen\n\n28334 Bremen\n\n{mbethge,davrot,pawelzik}@physik.uni-bremen.de\n\nAbstract\n\nHere we derive optimal gain functions for minimum mean square re(cid:173)\nconstruction from neural rate responses subjected to Poisson noise. The\nshape of these functions strongly depends on the length T of the time\nwindow within which spikes are counted in order to estimate the under(cid:173)\nlying firing rate. A phase transition towards pure binary encoding occurs\nif the maximum mean spike count becomes smaller than approximately\nthree provided the minimum firing rate is zero. For a particular function\nclass, we were able to prove the existence of a second-order phase tran(cid:173)\nsition analytically. The critical decoding time window length obtained\nfrom the analytical derivation is in precise agreement with the numerical\nresults. We conclude that under most circumstances relevant to informa(cid:173)\ntion processing in the brain, rate coding can be better ascribed to a binary\n(low-entropy) code than to the other extreme of rich analog coding.\n\n1 Optimal neuronal gain functions for short decoding time windows\n\nThe use of action potentials (spikes) as a means of communication is the striking feature of\nneurons in the central nervous system. 
Since the discovery by Adrian [1] that action potentials are generated by sensory neurons with a frequency that is substantially determined by the stimulus, the idea of rate coding has become a prevalent paradigm in neuroscience [2]. In particular, today the coding properties of many neurons from various areas in the cortex have been characterized by tuning curves, which describe the average firing rate response as a function of certain stimulus parameters. This way of description is closely related to the idea of analog coding, which constitutes the basis for many neural network models. Reliable inference from the observed number of spikes about the underlying firing rate of a neuronal response, however, requires a sufficiently long time interval, while integration times of neurons in vivo [3] as well as reaction times of humans or animals performing classification tasks [4, 5] are known to be rather short. Therefore, it is important to understand how neural rate coding is affected by a limited time window available for decoding.\n\n*http://www.neuro.uni-bremen.de/~mbethge\n\nWhile rate codes are usually characterized by tuning functions relating the intensity of the neuronal response to a particular stimulus parameter, the question of how relevant the idea of analog coding actually is does not depend on the particular entity represented by a neuron. Instead, it suffices to determine the shape of the gain function, which displays the mean firing rate as a function of the actual analog signal to be sent to subsequent neurons. Here we seek optimal gain functions that minimize the mean squared reconstruction error for a uniform source signal transmitted through a Poisson channel, as a function of the maximum mean number of spikes.\n\nIn formal terms, the issue is to optimally encode a real random variable x in the number of pulses emitted by a neuron within a certain time window. 
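This encoding-decoding problem is easy to explore numerically. The sketch below is our own illustration, not code from the paper; the function names (conditional_mean_decoder, simulate_poisson_channel), the grid sizes, and the example gain function are assumptions chosen for the demonstration. It draws x uniformly from [0, 1], emits a Poisson spike count with mean proportional to a bounded gain of x, and decodes with the conditional-mean estimator.

```python
import numpy as np

def conditional_mean_decoder(gain, mu_max, grid_size=2001):
    """Tabulate the decoder x_hat(k) = E[x | k] for mu(x) = mu_max * gain(x),
    gain in [0, 1], x uniform on [0, 1]; integrals by trapezoidal quadrature."""
    x = np.linspace(0.0, 1.0, grid_size)
    mu = mu_max * gain(x)
    kmax = int(mu_max + 10.0 * np.sqrt(mu_max) + 20.0)  # tail mass beyond is negligible
    pmf = np.empty((kmax + 1, grid_size))
    pmf[0] = np.exp(-mu)
    for k in range(1, kmax + 1):
        pmf[k] = pmf[k - 1] * mu / k        # Poisson pmf recursion in k
    w = np.full(grid_size, x[1] - x[0])     # trapezoid quadrature weights
    w[0] *= 0.5
    w[-1] *= 0.5
    den = pmf @ w                           # integral of p(k|x) over x
    num = pmf @ (x * w)                     # integral of x p(k|x) over x
    return num / np.maximum(den, 1e-300)

def simulate_poisson_channel(gain, mu_max, n_trials=200_000, seed=0):
    """Monte Carlo mean squared error of the conditional-mean decoder."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n_trials)
    k = rng.poisson(mu_max * gain(x))
    xhat = conditional_mean_decoder(gain, mu_max)
    k = np.minimum(k, xhat.size - 1)        # clip the (practically empty) tail
    return float(np.mean((xhat[k] - x) ** 2))
```

For a step gain with threshold 0.52 and a maximum mean spike count of three, the simulated error lies well below 1/12, the error of simply reporting the prior mean when no spike count is observed.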
Thereby, x stands for the intended analog output of the neuron that shall be signaled to subsequent neurons. The latter, however, can only observe a number of spikes k integrated within a time interval of length T. The statistical dependency between x and k is specified by the assumption of Poisson noise\n\np(k|μ(x)) = (μ(x)^k / k!) exp{−μ(x)},   (1)\n\nand the choice of the gain function f(x), which together with T determines the mean spike count μ(x) := T f(x). An important additional constraint is the limited output range of the neuronal firing rate, which can be included by the requirement of a bounded gain function (f_min ≤ f(x) ≤ f_max for all x). Since inhibition can reliably prevent a neuron from firing, we will here consider the case f_min = 0 only. Instead of specifying f_max, we impose a bound directly on the mean spike count (i.e. μ(x) ≤ μ̄), because f_max constitutes a meaningful constraint only in conjunction with a fixed time window length T.\n\nAs objective function we consider the minimum mean squared error (MMSE) with respect to Lebesgue measure for x ∈ [0, 1],\n\nχ²[μ(x)] = E[x²] − E[x̂²] = 1/3 − Σ_{k=0}^∞ (∫₀¹ x p(k|μ(x)) dx)² / (∫₀¹ p(k|μ(x)) dx),   (2)\n\nwhere x̂(k) := E[x|k] denotes the mean square estimator, which is the conditional expectation (see, e.g., 
[6]).\n\n1.1 Tunings and errors\n\nAs derived in [7] on the basis of Fisher information, the optimal gain function for a single neuron in the asymptotic limit T → ∞ has a parabolic shape:\n\nf_asymp(x) = f_max x².   (3)\n\nFor any finite μ̄, however, this gain function is not necessarily optimal, and in the limit T → 0 it is straightforward to show that the optimal tuning curve is a step function\n\nf_step(x|ϑ) = f_max Θ(x − ϑ),   (4)\n\nwhere Θ(z) denotes the Heaviside function that equals one if z > 0 and zero if z < 0. The optimal threshold ϑ(μ̄) of the step tuning curve depends on μ̄ and can be determined analytically,\n\nϑ(μ̄) = 1 − (3 − √(8 e^{−μ̄} + 1)) / (4 (1 − e^{−μ̄})),   (5)\n\nas well as the corresponding MMSE [8]:\n\nχ²[f_step] = (1/12) (1 − 3 ϑ²(μ̄) / ([(1 − ϑ(μ̄)) (1 − e^{−μ̄})]^{−1} − 1)).   (6)\n\nFigure 1: The upper panel shows a bifurcation plot for ϑ(μ̄) − w and ϑ(μ̄) + w of the optimal gain function in S1 as a function of μ̄, illustrating the phase transition from binary to continuous encoding. The dotted line separates the regions before and after the phase transition in all three panels. Left of this line (i.e. for μ̄ < μ̄_c) the step function given by Eqs. 4 and 5 is optimal. The middle panel shows the MMSE of this step function (dashed) and of the optimal gain function in S2 (solid), which becomes smaller than the former after the phase transition. 
The relative deviation between the minimal errors of S1 and S2 (i.e. (χ²_S1 − χ²_S2)/χ²_S2) is displayed in the lower panel and has a maximum below 0.035.\n\nThe binary shape for small μ̄ and the continuous parabolic shape for large μ̄ imply that there has to be a transition from discrete to analog encoding with increasing μ̄. Unfortunately, it is not possible to determine the optimal gain function within the set of all bounded functions B := {f | f: [0, 1] → [0, f_max]}, and hence one has to choose in advance a certain parameterized function space S ⊂ B that is feasible for the optimization. In [8], we investigated various such function spaces, and for μ̄ < 2.9 we did not find any gain function with an error smaller than the MMSE of the step function. Furthermore, we always observed a phase transition from binary to analog encoding at a critical μ̄_c that depends only slightly on the function space. As one can see in Fig. 1 (upper), μ̄_c is approximately three.\n\nIn this paper, we consider two function classes S1, S2, which both contain the binary gain function as well as the asymptotically optimal parabolic function as special cases. Furthermore, S1 is a proper subset of S2. Our interest in S1 results from the fact that we can analyze the phase transition in this subset analytically, while S2 is the most general parameterization for which we have determined the optimal encoding numerically. The latter has six free parameters a ≤ b ≤ c ∈ [0, 1], f_mid ∈ (0, f_max), α, β ∈ [0, ∞), and the parameterization of the gain functions is given by\n\nf_S2(x|a, b, c, f_mid, α, β) = 0 for 0 ≤ x < a; f_mid ((x − a)/(b − a))^α for a ≤ x < b; f_mid + (f_max − f_mid) ((x − b)/(c − b))^β for b ≤ x < c; f_max for c ≤ x ≤ 1.   (7)\n\nThe integrals entering Eq. 2 for this parameterization (Eqs. 8, 9) can be evaluated in closed form in terms of the Kronecker delta δ_{0,k} and the truncated Gamma function Γ_{u,v}(z) := ∫_u^v s^{z−1} e^{−s} ds. 
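The closed forms of Eqs. 5 and 6 are easy to check against a direct quadrature evaluation of Eq. 2. The sketch below is our own illustration rather than the authors' code; the helper names (theta_opt, mmse_step, mmse_numeric, make_step) and the grid choices are assumptions. It also reproduces the crossover between the step and the parabolic gain function.

```python
import math
import numpy as np

def theta_opt(mu):
    """Optimal step threshold, Eq. 5."""
    return 1.0 - (3.0 - math.sqrt(8.0 * math.exp(-mu) + 1.0)) / (4.0 * (1.0 - math.exp(-mu)))

def mmse_step(mu):
    """Closed-form MMSE of the optimal step tuning curve, Eq. 6."""
    th = theta_opt(mu)
    denom = 1.0 / ((1.0 - th) * (1.0 - math.exp(-mu))) - 1.0
    return (1.0 - 3.0 * th * th / denom) / 12.0

def mmse_numeric(gain, mu_max, grid_size=4001):
    """Evaluate Eq. 2 by trapezoidal quadrature for mu(x) = mu_max * gain(x)."""
    x = np.linspace(0.0, 1.0, grid_size)
    mu = mu_max * gain(x)
    kmax = int(mu_max + 10.0 * np.sqrt(mu_max) + 20.0)
    pmf = np.empty((kmax + 1, grid_size))
    pmf[0] = np.exp(-mu)
    for k in range(1, kmax + 1):
        pmf[k] = pmf[k - 1] * mu / k            # Poisson pmf recursion in k
    w = np.full(grid_size, x[1] - x[0])         # trapezoid quadrature weights
    w[0] *= 0.5
    w[-1] *= 0.5
    den = pmf @ w                               # integral of p(k|x) over x
    num = (pmf @ (x * w)) ** 2                  # squared integral of x p(k|x)
    return 1.0 / 3.0 - float(np.sum(num / np.maximum(den, 1e-300)))

parabola = lambda x: x ** 2                     # Eq. 3 (gain relative to f_max)

def make_step(th):                              # Eq. 4 (gain relative to f_max)
    return lambda x: (x > th).astype(float)
```

Under these assumptions, the quadrature value for the optimal step at a maximum mean spike count of three agrees with Eq. 6 to better than 10⁻³, the step beats the parabola at μ̄ = 1, and the parabola wins at μ̄ = 50, consistent with a transition near three.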
Numerical optimization leads to the minimal MMSE as a function of μ̄ as displayed in Fig. 1 (middle). The parameterization of the gain functions in S1 is given by\n\nf_S1(x|w, γ) = 0 for 0 ≤ x < ϑ(μ̄) − w; f_max ((x − ϑ(μ̄) + w)/(2w))^γ for ϑ(μ̄) − w ≤ x < ϑ(μ̄) + w; f_max for ϑ(μ̄) + w ≤ x ≤ 1,   (10)\n\nwith w ∈ [0, 1] and γ ∈ [0, ∞). The integrals entering Eq. 2 for the MMSE in case of the gain function f_S1 (Eqs. 11, 12) can again be written in closed form in terms of the Kronecker delta δ_{0,k} and truncated Gamma functions Γ_{0,μ̄}(k + 1/γ).\n\nThe minimal MMSE for these gain functions is only slightly worse than that for S2. The relative difference between both is plotted in Fig. 1 (lower), showing a maximum deviation of 3.2%. In particular, the relative deviation is extremely small around the phase transition. This comparison suggests that a restriction to S1, which is a necessary simplification for the following analytical investigation, does not change the qualitative results.\n\n2 A phase transition\n\nThe phase transition from binary to analog encoding corresponds to a structural change of the objective function χ²(w, γ). In particular, the optimality of binary encoding for μ̄ < μ̄_c implies that χ²(w, γ) has a minimum at w = 0. The existence of a phase transition implies that with increasing μ̄ this minimum changes into a local maximum at a certain critical point μ̄ = μ̄_c. Therefore, the critical point can be determined by a local expansion of\n\nχ²(w, γ, μ̄) − χ²(0, γ, μ̄) = Σ_{k=1}^∞ g_k(γ, μ̄) w^k / k!   (13)\n\naround w = 0, because the sign of its leading coefficient A_γ(μ̄) (i.e. the coefficient g_k with minimal k that does not vanish identically) determines whether χ²(w, γ, μ̄) has a local minimum or maximum at w = 0. 
Accordingly, the critical point is given as the solution of A_γ(μ̄) = 0. With quite a bit of effort one can prove that the first derivative of χ²(w, γ, μ̄) vanishes for all μ̄. The second derivative, however, is a decreasing function of μ̄ and hence constitutes the sought leading coefficient A_γ(μ̄) = g_2(γ, μ̄), a lengthy closed-form expression (Eq. 14) involving e^{μ̄}, powers of μ̄^{1/γ}, and the truncated Gamma function Γ_{0,μ̄}.\n\nFigure 2: The critical maximum mean spike count μ̄_c is shown as a function of γ (numerical evaluation at γ ∈ {0.5, 0.505, 0.51, ..., 3.5}). The minimum μ̄_c = 2.98291 ± 10⁻⁷ at γ = 1.9 determines the phase transition in S1.\n\nObviously, it is not possible to write the zeros of A_γ(μ̄) in closed form. The numerical evaluation of the critical point μ̄_c(γ) as a function of γ is displayed in Fig. 2. Note that we have treated γ as a fixed parameter, which means that we determine the critical point of the phase transition in all subsets S1(γ) of S1 that correspond to a fixed γ. It is straightforward to show that the critical point μ̄_c with respect to the entire class S1 is given by the minimum of μ̄_c(γ). We determined this value up to a precision of ±0.0001 to be μ̄_c = 2.9857.\n\n3 Conclusion\n\nOur study reveals that optimal encoding with respect to the minimum mean squared error is binary for maximum mean spike counts smaller than approximately three. Within the function class S1 we determined a second-order phase transition from binary to continuous encoding analytically. 
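The qualitative behavior at the transition can also be seen without the analytical expansion. The following sketch is our own (not the authors' code; the names mmse_ramp and optimal_ramp_width and the grid choices are assumptions): it evaluates Eq. 2 for the γ = 1 members of S1, i.e. a linear ramp of half-width w, and grid-searches for the best w.

```python
import numpy as np

def mmse_ramp(mu_max, th, w, grid_size=2001):
    """Eq. 2 by trapezoidal quadrature for the gamma = 1 member of S1: a linear
    ramp of half-width w centred at th; w = 0 reproduces the step function."""
    x = np.linspace(0.0, 1.0, grid_size)
    if w == 0.0:
        gain = (x > th).astype(float)
    else:
        gain = np.clip((x - (th - w)) / (2.0 * w), 0.0, 1.0)
    mu = mu_max * gain
    kmax = int(mu_max + 10.0 * np.sqrt(mu_max) + 20.0)
    pmf = np.empty((kmax + 1, grid_size))
    pmf[0] = np.exp(-mu)
    for k in range(1, kmax + 1):
        pmf[k] = pmf[k - 1] * mu / k            # Poisson pmf recursion in k
    qw = np.full(grid_size, x[1] - x[0])        # trapezoid quadrature weights
    qw[0] *= 0.5
    qw[-1] *= 0.5
    den = pmf @ qw
    num = (pmf @ (x * qw)) ** 2
    return 1.0 / 3.0 - float(np.sum(num / np.maximum(den, 1e-300)))

def optimal_ramp_width(mu_max):
    """Coarse grid search over centre th and half-width w; returns the best w."""
    best_err, best_w = np.inf, 0.0
    for w in np.linspace(0.0, 0.4, 9):
        for th in np.linspace(0.3, 0.7, 41):
            err = mmse_ramp(mu_max, th, w)
            if err < best_err:
                best_err, best_w = err, w
    return best_w
```

With these assumptions, at μ̄ = 2 (below the critical point) the step beats a wide ramp with the same centre, while at μ̄ = 8 (above it) the grid search settles on a positive half-width.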
With respect to mutual information, the advantage of binary encoding holds even up to a maximum mean spike count of about 3.5 (results not shown), and the optimal encoding remains discrete also for larger μ̄. In a related work [9], Softky compared the information capacity of the Poisson channel with the information rate of a (noiseless) binary pulse code. The rate of the latter turned out to exceed the capacity of the former by a factor of at least 72, demonstrating a clear superiority of binary coding over analog rate coding. Our rate-distortion analysis of the Poisson channel differs from that comparison in a twofold way: first, we do not change the noise model, and second, the MMSE is often more appropriate to account for the coding efficiency than the channel capacity [10]. In particular, the assumption of a real random variable to be encoded with minimal mean squared error loss appears to introduce a bias for analog coding rather than for binary coding. Nevertheless, assuming a high temporal precision (i.e. small integration times T), our results point in a similar direction, namely that binary coding seems to be a more reasonable choice even if one supposes that the only means of neuronal communication were the transmission of Poisson distributed spike counts.\n\nMethodologically, our analysis is similar to many theoretical studies of population coding if f(x) = μ(x)/T is interpreted not as the neuron's gain function, but as a tuning function with respect to a stimulus parameter x. Though conceptually different, some readers may therefore wish to know whether binary coding is still advantageous if many neurons, say N, together encode a single analog value. While the approach chosen in this paper is not feasible in case of large N, a partial answer can be given: for the efficiency of population coding, redundancy reduction is most important [7, 8, 11]. 
Smooth tuning curves, which have a dynamic range of about the same size as the signal range, always lead to a large amount of redundancy, so that the MMSE cannot decrease faster than N⁻¹. In contrast, the MMSE of binary tuning functions scales proportionally to N⁻² or even faster. This also holds true for tuning functions which are not perfectly binary, but have a dynamic range that is at least smaller than the signal range divided by N. Independently of μ̄, this implies that a small dynamic range is always advantageous in case of population coding.\n\nIn contrast, most experimental studies do not report binary or steep tuning functions, but show smooth tuning curves only. However, the shape of a tuning function always depends on the stimulus set used. Only recently, experimental studies under natural stimulus conditions provided evidence for the idea that neuronal encoding is essentially binary [12]. Particularly striking is this observation for the H1 neuron of the fly [13], for which the functional role is probably better understood than for most other neurons that have been characterized by tuning functions.\n\nWhile the noise level of the Poisson channel studied in this paper is rather large, the H1 neuron can respond very reliably under optimal stimulus conditions [13]. Another example of a low-noise binary code has been found in the auditory cortex [14]. If we drop the restriction to Poisson noise and instead impose a hard constraint on the maximum number of spikes, optimal encoding is always discrete, with μ(x) taking integer values only [15]. This is easy to grasp, because a non-integer mean count μ cannot serve to increase the entropy of the available symbol set (i.e. the candidate spike counts), but only increases the noise entropy instead. In other words, it is the simple fact that spike counts are discrete by nature which already severely limits the possibility of graded rate coding. 
Clearly, this is not so obvious in case of the Poisson channel if there is no hard constraint imposed on the maximum spike count.\n\nA remarkable aspect of the neuronal response of H1 shown in [13] is that it becomes the more binary, the less noisy the stimulus conditions are (the noise level is determined by the different light conditions at midday, half an hour before, and half an hour after sunset). This suggests an interesting hypothesis why choosing a binary code with very high temporal precision might be advantageous even if the signal of interest by itself does not change at that time scale: the sensory input may sometimes be too noisy, so that repeated, independent samples from the signal of interest may sometimes lead to neuronal firing and sometimes not. In other words, a binary code at the short time scale is useful independently of the correlation time of the signal to be encoded if uncertainties have to be taken into account, because any surplus amount of temporal precision is maximally used for uncertainty representation in a self-adjusting manner. Furthermore, this Monte Carlo type of uncertainty representation features several computational advantages [16]. Finally, it is a remarkable fact that this property is unique to a binary code, because the representation of uncertainty is necessary for many information processing tasks solved by the brain.\n\nAdditional support for the potential relevance of a binary neural code comes from intracellular recordings in vivo revealing that the subthreshold membrane potential of many cortical cells switches between up and down states [17] depending on the stimulus. Furthermore, the dynamics of bursting cells plays an important role for neuronal signal transmission [18] and may also be seen as evidence for binary rate coding. 
In light of these experimental facts, we conclude from our results that the idea of binary tuning constitutes an important hypothesis for neural coding.\n\nAcknowledgments\n\nThis work was supported by the Deutsche Forschungsgemeinschaft, SFB 517.\n\nReferences\n\n[1] E.D. Adrian. The impulses produced by sensory nerve endings: Part I. J. Physiol. (London), 61:49-72, 1926.\n\n[2] D.H. Perkel and T.H. Bullock. Neural coding: a report based on an NRP work session. Neurosci. Research Prog. Bull., 6:220-349, 1968.\n\n[3] W.R. Softky and C. Koch. The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs. J. Neurosci., 13:334-350, 1993.\n\n[4] C. Keysers, D. Xiao, P. Foldiak, and D. Perrett. The speed of sight. J. Cog. Neurosci., 13:90-101, 2001.\n\n[5] S. Thorpe, D. Fize, and C. Marlot. Speed of processing in the human visual system. Nature, 381:520-522, 1996.\n\n[6] E.L. Lehmann and G. Casella. Theory of Point Estimation. Springer, New York, 1999.\n\n[7] M. Bethge, D. Rotermund, and K. Pawelzik. Optimal short-term population coding: when Fisher information fails. Neural Comput., 14(10):2317-2351, 2002.\n\n[8] M. Bethge, D. Rotermund, and K. Pawelzik. Optimal neural rate coding leads to bimodal firing rate distributions. Network: Comput. Neural Syst., 2002. In press.\n\n[9] W.R. Softky. Fine analog coding minimizes information transmission. Neural Networks, 9:15-24, 1996.\n\n[10] D.H. Johnson. Point process models of single-neuron discharges. J. Comput. Neurosci., 3:275-299, 1996.\n\n[11] M. Bethge and K. Pawelzik. Population coding with unreliable spikes. Neurocomputing, 44-46:323-328, 2002.\n\n[12] P. Reinagel. How do visual neurons respond in the real world? Curr. Op. Neurobiol., 11:437-442, 2001.\n\n[13] G.D. Lewen, W. Bialek, and R.R. de Ruyter van Steveninck. Neural coding of natural stimuli. Network: Comput. Neural Syst., 12:317-329, 2001.\n\n[14] M.R. 
DeWeese and A.M. Zador. Binary coding in auditory cortex. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems, volume 15, 2002.\n\n[15] A. Gersho and R.M. Gray. Vector Quantization and Signal Compression. Kluwer, Boston, 1992.\n\n[16] P.O. Hoyer and A. Hyvarinen. Interpreting neural response variability as Monte Carlo sampling of the posterior. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems, volume 15, 2002.\n\n[17] J. Anderson, I. Lampl, I. Reichova, M. Carandini, and D. Ferster. Stimulus dependence of two-state fluctuations of membrane potential in cat visual cortex. Nature Neurosci., 3:617-621, 2000.\n\n[18] J.E. Lisman. Bursts as a unit of neural information processing: making unreliable synapses reliable. TINS, 20:38-43, 1997.", "award": [], "sourceid": 2201, "authors": [{"given_name": "Matthias", "family_name": "Bethge", "institution": null}, {"given_name": "David", "family_name": "Rotermund", "institution": null}, {"given_name": "Klaus", "family_name": "Pawelzik", "institution": null}]}