{"title": "Efficient Neural Codes under Metabolic Constraints", "book": "Advances in Neural Information Processing Systems", "page_first": 4619, "page_last": 4627, "abstract": "Neural codes are inevitably shaped by various kinds of biological constraints, \\emph{e.g.} noise and metabolic cost. Here we formulate a coding framework which explicitly deals with noise and the metabolic costs associated with the neural representation of information, and analytically derive the optimal neural code for monotonic response functions and arbitrary stimulus distributions. For a single neuron, the theory predicts a family of optimal response functions depending on the metabolic budget and noise characteristics. Interestingly, the well-known histogram equalization solution can be viewed as a special case when metabolic resources are unlimited. For a pair of neurons, our theory suggests that under more severe metabolic constraints, ON-OFF coding is an increasingly more efficient coding scheme compared to ON-ON or OFF-OFF. The advantage could be as large as one-fold, substantially larger than the previous estimation. Some of these predictions could be generalized to the case of large neural populations. In particular, these analytical results may provide a theoretical basis for the predominant segregation into ON- and OFF-cells in early visual processing areas. Overall, we provide a unified framework for optimal neural codes with monotonic tuning curves in the brain, and make predictions that can be directly tested with physiology experiments.", "full_text": "Efficient Neural Codes under Metabolic Constraints\n\nZhuo Wang \u2217\u2020\nDepartment of Mathematics\nUniversity of Pennsylvania\nwangzhuo@nyu.edu\n\nXue-Xin Wei \u2217\u2021\nDepartment of Psychology\nUniversity of Pennsylvania\nweixxpku@gmail.com\n\nAlan A. Stocker\nDepartment of Psychology\nUniversity of Pennsylvania\nastocker@sas.upenn.edu\n\nDaniel D. Lee\nDepartment of Electrical and System Engineering\nUniversity of Pennsylvania\nddlee@seas.upenn.edu\n\nAbstract\n\nNeural codes are inevitably shaped by various kinds of biological constraints, e.g. noise and metabolic cost. Here we formulate a coding framework which explicitly deals with noise and the metabolic costs associated with the neural representation of information, and analytically derive the optimal neural code for monotonic response functions and arbitrary stimulus distributions. For a single neuron, the theory predicts a family of optimal response functions depending on the metabolic budget and noise characteristics. Interestingly, the well-known histogram equalization solution can be viewed as a special case when metabolic resources are unlimited. For a pair of neurons, our theory suggests that under more severe metabolic constraints, ON-OFF coding is an increasingly more efficient coding scheme compared to ON-ON or OFF-OFF. The advantage could be as large as one-fold, substantially larger than the previous estimation. Some of these predictions could be generalized to the case of large neural populations. In particular, these analytical results may provide a theoretical basis for the predominant segregation into ON- and OFF-cells in early visual processing areas. Overall, we provide a unified framework for optimal neural codes with monotonic tuning curves in the brain, and make predictions that can be directly tested with physiology experiments.\n\n1 Introduction\n\nThe efficient coding hypothesis [1, 2] plays a fundamental role in understanding neural codes, particularly in early sensory processing. Going beyond the original idea of redundancy reduction by Horace Barlow [2], efficient coding has become a general conceptual framework for studying optimal neural coding [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14].
Efficient coding theory hypothesizes that the neural code is organized in a way such that maximal information is conveyed about the stimulus variable. Notably, any formulation of efficient coding necessarily relies on a set of constraints due to real world limitations imposed on neural systems. For example, neural noise, metabolic energy budgets, tuning curve characteristics and the size of the neural population can all affect the quality of the neural code.\n\n\u2217equal contribution\n\u2020current affiliation: Center for Neural Science, New York University\n\u2021current affiliation: Department of Statistics and Center for Theoretical Neuroscience, Columbia University\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\nMost previous studies have only considered a small subset of these constraints. For example, the original redundancy reduction argument proposed by Barlow focused on utilizing the dynamic range of the neurons efficiently [2, 15], but did not take the neural noise model or energy consumption into consideration. Some studies explicitly dealt with the metabolic costs of the system but did not consider the constraints imposed by the limited firing rates of neurons or their detailed tuning properties [16, 7, 17, 18]. As another prominent example, histogram equalization has been proposed as the mechanism for determining the optimal tuning curve of a single neuron with monotonic response characteristics [19]. However, this result only holds for a specific neural noise model and does not take metabolic costs into consideration either. In terms of neural populations, most previous studies have focused on bell-shaped tuning curves.
Optimal neural coding for neural populations with monotonic tuning curves has received much less attention [20, 21].\n\nWe develop a formulation of efficient coding that explicitly deals with multiple biologically relevant constraints, including neural noise, limited range of the neural output, and metabolic consumption. With this formulation, we can study neural codes based on monotonic response characteristics that have been frequently observed in biological neural systems. We are able to derive analytical solutions for a wide range of conditions in the small noise limit. We present results for neural populations of different sizes, including the cases of a single neuron, pairs of neurons, as well as a brief treatment for larger neural populations. The results are in general agreement with observed coding schemes for monotonic tuning curves. The results also provide various quantitative predictions which are readily testable with targeted physiology experiments.\n\n2 Optimal Code for a Single Neuron\n\n2.1 Models and Methods\n\nWe start with the simple case where a scalar stimulus s with prior p(s) is encoded by a single neuron. To model the neural response for a stimulus s, we first denote the mean output level as a deterministic function h(s). Here h(s) could denote the mean firing rate in the context of rate coding or just the mean membrane potential. In either case, the actual response r is noisy and can be modeled by a probabilistic model P(r|h(s)). Throughout the paper, we limit the neural codes to be monotonic functions h(s). The mutual information between the input stimulus s and the neural response r is denoted as MI(s, r).\n\nWe formulate the efficient coding problem as the maximization of the mutual information between the stimulus and the response, i.e., MI(s, r) [3].
To complete the formulation of this problem, it is crucial to choose a set of constraints which characterizes the limited resources available to the neural system. One important constraint is the finite range of the neural output [19]. Another constraint is on the mean metabolic cost [16, 7, 17, 18], which limits the mean activity level of neural output, averaged over the stimulus prior. Under these constraints, the efficient coding problem can mathematically be formulated as follows:\n\n$$\\begin{aligned} \\text{maximize} \\quad & \\mathrm{MI}(s, r) \\\\ \\text{subject to} \\quad & 0 \\le h(s) \\le r_{\\max}, \\quad h'(s) \\ge 0 && \\text{(range constraint)} \\\\ & \\mathbb{E}_s[K(h(s))] \\le K_{\\text{total}} && \\text{(metabolic constraint)} \\end{aligned}$$\n\nWe seek the optimal response function h(s) under various choices of the neural noise model P(r|h(s)) and certain metabolic cost function K(h(s)), as discussed below.\n\nNeural Noise Models: Neural noise can often be well characterized by a Poisson distribution at relatively short time scales [22]. Under the Poisson noise model, the number of spikes N_T over a duration of T is a Poisson random variable with mean h(s)T and variance h(s)T. In the long T limit, the mean response r = N_T/T approximately follows a Gaussian distribution\n\n$$r \\sim \\mathcal{N}(h(s),\\, h(s)/T) \\tag{1}$$\n\nNon-Poisson noise has also been observed physiologically. In these cases, the variance of the response N_T can be greater or smaller than the mean firing rate [22, 23, 24, 25]. We thus consider a more generic family of noise models parametrized by \u03b1\n\n$$r \\sim \\mathcal{N}(h(s),\\, h(s)^{\\alpha}/T) \\tag{2}$$\n\nThis generalized family of noise models naturally includes the additive Gaussian noise case (when \u03b1 = 0), which is useful for describing the stochasticity of the membrane potential of a neuron.\n\nMetabolic Cost: We model the metabolic cost K as a power-law function of the neural output\n\n$$K(h(s)) = h(s)^{\\beta} \\tag{3}$$\n\nwhere \u03b2 > 0 is a parameter that models how the energy cost scales as the neural output increases.
For a single neuron we will work with the general energy cost function, but when we generalize to the case of multiple neurons, we will assume \u03b2 = 1 for simplicity. Note that it does not require extra effort to solve the problem if the cost function takes the general form $\\tilde{K}(h(s)) = K_0 + K_1 h(s)^{\\beta}$, as reported in [26]. This is because of the linear nature of the expectation term in the metabolic constraint: for K_1 > 0, the constraint is equivalent to Es[h(s)^\u03b2] \u2264 (Ktotal \u2212 K_0)/K_1.\n\n2.2 Derivation of the Optimal h(s)\n\nThis efficient coding problem can be greatly simplified due to the fact that it is invariant under any re-parameterization of the stimulus variable s. We take advantage of this by mapping s to a uniform random variable u \u2208 [0, 1] via the cumulative distribution function u = F(s) [27]. If we choose g(u) = g(F(s)) = h(s), it suffices to solve the following new problem which optimizes g(u) for a re-parameterized input u with uniform prior\n\n$$\\begin{aligned} \\text{maximize} \\quad & \\mathrm{MI}(u, r) \\\\ \\text{subject to} \\quad & 0 \\le g(u) \\le r_{\\max}, \\quad g'(u) \\ge 0 \\\\ & \\mathbb{E}_u[K(g(u))] \\le K_{\\text{total}} \\end{aligned}$$\n\nOnce the optimal form of g\u2217(u) is obtained, the optimal h\u2217(s) is naturally given by g\u2217(F(s)). To solve this simplified problem, first we express the objective function in terms of g(u). In the small noise limit (large integration time T), the Fisher information I_F(u) of the neuron with noise model in Eq.
(2) is calculated and the mutual information can be approximated as (see [28, 14])\n\n$$I_F(u) = T \\, \\frac{g'(u)^2}{g(u)^{\\alpha}} + O(1) \\tag{4}$$\n\n$$\\mathrm{MI}(u, r) = H(U) + \\frac{1}{2} \\int p(u) \\log I_F(u) \\, du = \\frac{1}{2} \\int_0^1 \\log \\frac{g'(u)^2}{g(u)^{\\alpha}} \\, du + \\frac{1}{2} \\log T + O(1/T) \\tag{5}$$\n\nwhere H(U) = 0 is the entropy and $p(u) = \\mathbf{1}_{\\{0 \\le u \\le 1\\}}$ is the density of the uniform distribution. Furthermore, each constraint can be rewritten as an integral of g'(u) or g(u), respectively:\n\n$$g(1) - g(0) = \\int_0^1 g'(u) \\, du \\le r_{\\max} \\tag{6}$$\n\n$$\\mathbb{E}_u[K(g(u))] = \\int_0^1 g(u)^{\\beta} \\, du \\le K_{\\text{total}} \\tag{7}$$\n\nThis form of the problem (Eq. 5-7) can be solved analytically using the method of Lagrange multipliers, and the optimal response function must take the form\n\n$$g(u) = r_{\\max} \\cdot \\left[ \\frac{1}{a} \\, \\gamma_q^{-1}\\big(u \\, \\gamma_q(a)\\big) \\right]^{1/\\beta}, \\qquad h(s) = g(F(s)) \\tag{8}$$\n\n$$\\text{where} \\quad q \\overset{\\text{def}}{=} (1 - \\alpha/2)/\\beta, \\qquad \\gamma_q(x) \\overset{\\text{def}}{=} \\int_0^x z^{q-1} \\exp(-z) \\, dz. \\tag{9}$$\n\nThe function $\\gamma_q(x)$ is called the incomplete gamma function and $\\gamma_q^{-1}$ is its inverse. Due to space limitations we only sketch the derivation. Readers who are interested in the detailed proof are referred to the supplementary materials.\n\nLet us now turn to some intuitive conclusions behind this solution (also see Fig.1, in which we have assumed rmax = 1 for simplicity). From Eq. (8), it is clear that the optimal solution g(u) depends on the constant a, which is set so that the metabolic constraint is met with equality whenever it is binding (see the horizontal dashed lines in Fig.1a). Furthermore, the optimal solution h(s) depends on the specific input distribution p(s).
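\n\nAs a concrete illustration (a worked special case added here, not taken from the supplementary derivation, and assuming additive Gaussian noise \u03b1 = 0 and linear cost \u03b2 = 1, so that q = 1), the incomplete gamma function in Eq. (9) has a closed form and Eq. (8) can be written explicitly:\n\n$$\\gamma_1(x) = 1 - e^{-x}, \\qquad \\gamma_1^{-1}(y) = -\\log(1 - y), \\qquad g(u) = -\\frac{r_{\\max}}{a} \\log\\big(1 - u \\, (1 - e^{-a})\\big).$$\n\nThis expression satisfies g(0) = 0 and g(1) = rmax for any a > 0. Expanding the logarithm to first order in a gives g(u) \u2192 rmax \u00b7 u as a \u2192 0, i.e. a linear response to the uniform variable u, while larger values of a bend the curve towards smaller responses.\n\n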
Depending on the relative magnitude of rmax and Ktotal:\n\n\u2022 Range constraint dominates: This is the case when there is more than sufficient energy to achieve the optimal solution so that the metabolic constraint becomes completely redundant. Determined by \u03b1, \u03b2 and rmax, Kthre is the energy consumption of the optimal code with unconstrained metabolic budget. When the available metabolic budget exceeds this threshold, Ktotal \u2265 Kthre, the constant a is very close to zero and the optimal g(u) is proportional to a power function, g(u) = rmax \u00b7 u^{1/(q\u03b2)} = rmax \u00b7 u^{1/(1\u2212\u03b1/2)}, which reduces to rmax \u00b7 u^{1/q} for \u03b2 = 1. See red curves in Fig.1.\n\u2022 Both constraints: This is the general case when Ktotal \u2272 Kthre. The constant a is set to the minimum value for which the metabolic constraint is satisfied. See purple curves in Fig.1.\n\u2022 Metabolic constraint dominates: This happens when Ktotal \u226a Kthre. In this case a is often very large. See blue curves in Fig.1.\n\nFigure 1: Deriving optimal tuning curves g(u) and corresponding h(s) for different prior distributions and different noise models. Top row: constant Gaussian noise (\u03b1, \u03b2, q) = (0, 1, 1); Bottom row: Poisson noise (\u03b1, \u03b2, q) = (1, 1, 1/2). (a) A segment of the inverse incomplete gamma function is cropped out by dashed boxes. The higher the horizontal dashed lines (constant a), the lower the average metabolic cost, which corresponds to a more substantial metabolic constraint. We thus use \u201clow\u201d, \u201chigh\u201d and \u201cmax\u201d to label the energy costs under different metabolic constraints. (b) The optimal solution g(u) for a uniform variable u. (c) The corresponding optimal h(s) for Gaussian prior. (d) The corresponding optimal h(s) for Gamma distribution p(s) \u221d s^{q\u22121} exp(\u2212s). Specifically for this prior, the optimal tuning curve is exactly linear without maximum response constraint.
(e-h) Similar to (a-d), but for Poisson noise.\n\n2.3 Properties of the Optimal h(s)\n\nWe have predicted the optimal response function for arbitrary values of \u03b1 (which corresponds to the noise model) and \u03b2 (which quantifies the metabolic cost model). Here we specifically focus on a few of the most biologically relevant situations.\n\nWe begin with the simple additive Gaussian noise model, i.e. \u03b1 = 0. This model could provide a good characterization of the response mapping from the input stimulus to the membrane potential of a neuron [19]. With more than sufficient metabolic supply, the optimal solution falls back to the histogram equalization principle where each response magnitude is utilized to the same extent (red curve in Fig.1b and Fig.2a). With less metabolic budget, the optimal tuning curve bends downwards to satisfy this constraint and large responses will be penalized, resulting in more density at smaller response magnitudes (purple curve in Fig.2a). In the other extreme, when the available metabolic budget Ktotal is diminishing, the response magnitude converges to the max-entropy distribution under the metabolic constraint E[g(u)^\u03b2] = const (blue curve in Fig.2a).\n\nNext we discuss the case of Poisson spiking neurons. In the extreme case when the range constraint dominates, the model predicts a square tuning curve, g(u) \u221d u^2, for uniform input (red curve in Fig.1f), which is consistent with previous studies [29, 30]. We also found that the Poisson noise model leads to heavier penalization of large response magnitudes compared to Gaussian noise, suggesting an interaction between noise and metabolic cost in shaping the optimal neural response distribution.
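\n\nTo make the Poisson case concrete (again a worked special case added for illustration, assuming \u03b1 = 1 and \u03b2 = 1, so that q = 1/2), the incomplete gamma function in Eq. (9) reduces to the error function, and Eq. (8) becomes\n\n$$\\gamma_{1/2}(x) = \\sqrt{\\pi} \\, \\mathrm{erf}(\\sqrt{x}), \\qquad g(u) = \\frac{r_{\\max}}{a} \\Big[ \\mathrm{erf}^{-1}\\big(u \\, \\mathrm{erf}(\\sqrt{a})\\big) \\Big]^2.$$\n\nSince erf(x) \u2248 2x/\u221a\u03c0 for small x, this gives g(u) \u2192 rmax \u00b7 u^2 as a \u2192 0, which is the square tuning curve mentioned above, and g(1) = rmax holds for every a > 0.\n\n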
In the other extreme when Ktotal goes to 0, the response distribution converges to a gamma distribution with a heavy tail (see Fig.2). Our analytical result gives a simple yet quantitative explanation of the emergence of sparse coding [7] from an energy-efficiency perspective.\n\nFigure 2: Probability of generating a certain response g(u) based on the optimal tuning of a single neuron under (a) Gaussian noise model and (b) Poisson noise model. In the extreme case of Gaussian noise with effectively no metabolic constraint, the response is uniformly distributed over the whole range.\n\n3 Optimal Code for a Pair of Neurons\n\nWe next study the optimal coding in the case of two neurons with monotonic response functions. We denote the neural responses as r = (r1, r2). The efficient coding problem then becomes:\n\n$$\\begin{aligned} \\text{maximize} \\quad & \\mathrm{MI}(s, r) \\\\ \\text{subject to} \\quad & 0 \\le h_i(s) \\le r_{\\max}, \\quad i = 1, 2 && \\text{(range constraint)} \\\\ & \\mathbb{E}_s[K(h_1(s)) + K(h_2(s))] \\le 2 K_{\\text{total}} && \\text{(metabolic constraint)} \\end{aligned}$$\n\nAssuming the neural noise is independent across neurons, the total Fisher information of the two-neuron system is simply the sum of the Fisher information contributed by each neuron, I_F(s) = I_1(s) + I_2(s).\n\n3.1 Optimal response functions\n\nPrevious studies on neural coding with monotonic response functions have typically assumed that each hi(s) has sigmoidal shape. It is important to emphasize that we do not make any a priori assumptions on the detailed shape of the tuning curve other than being monotonic and smooth. We define each neuron\u2019s active region $A_i = A_i^+ \\cup A_i^-$, where $A_i^+ = \\{s \\mid h_i'(s) > 0\\}$ and $A_i^- = \\{s \\mid -h_i'(s) > 0\\}$. Due to the monotonicity of the tuning curve, either $A_i^+$ or $A_i^-$ has to be empty. We find the following results (proof in the supplementary materials):\n\n1. Different neurons should have non-overlapping active regions.\n2.
If the metabolic constraint is binding, ON-OFF coding is better than ON-ON coding or OFF-OFF coding. Otherwise all three coding schemes can achieve the same mutual information.\n3. For ON-OFF coding, it is better to have ON regions on the right side.\n4. For ON-ON coding (or OFF-OFF), each neuron should have roughly the same tuning curve hi(s) \u2248 hj(s) while still having disjoint active regions. Note that a conceptually similar coding scheme has been previously discussed by [29]. Within the ON-pool or OFF-pool, the optimal tuning curve is the same as the optimal solution from the single neuron case.\n\nIn Fig.3a-d, we illustrate how these conclusions can be used to determine the optimal pair of neurons, assuming additive Gaussian noise \u03b1 = 0 and linear metabolic cost \u03b2 = 1 (for other \u03b1 and \u03b2 the process is similar). Our analytical results allow us to predict the precise shape of the optimal response functions, which goes beyond previous work on ON-OFF coding schemes [13, 31].\n\n3.2 Comparison between ON-OFF and ON-ON codes\n\nWe aim to compare the coding performance of ON-OFF and ON-ON codes. In Fig.3e we show how the mutual information depends on the available metabolic budget. For both the ON-OFF and ON-ON schemes, the mutual information is monotonically increasing as a function of the energy available. We compare these two curves in two different ways. First, we notice that both mutual information curves saturate, at KON-ON = 0.5rmax and KON-OFF = 0.25rmax respectively (see the red tuning curves in Fig.3a-d). Note that this specific saturation limit is only valid for \u03b1 = 0 and \u03b2 = 1. For any other value of the mutual information, we find that the optimal ON-ON pair (or OFF-OFF pair) always costs twice as much energy as the optimal ON-OFF pair.
Second, one can compare the ON-ON and ON-OFF schemes by fixing the energy available. The optimal mutual information achieved by ON-ON neurons is always smaller than that achieved by ON-OFF neurons and the difference is plotted in Fig.3e. When the available energy is extremely limited, Ktotal \u226a rmax, this difference saturates at \u22121 in the logarithm space of MI (base 2). This shows that, in the worst scenario, the ON-ON code is only half as efficient as the ON-OFF code from the mutual information perspective. In other words, it would take twice the amount of time T for the ON-ON code to convey the same amount of mutual information as the ON-OFF code under the same noise level.\n\nThese analyses quantitatively characterize the advantage of ON-OFF over ON-ON and show how it varies when the relative importance of the metabolic constraint changes. The encoding efficiency of ON-OFF ranges from double that of ON-ON (with very limited metabolic budget) to equal to it (with unlimited metabolic budget). This wide range includes the previous conclusion reported by Gjorgjieva et al., where a mild advantage (\u223c15%) of the ON-OFF scheme is found under the short integration time limit [31]. It is well known that the split of ON and OFF pathways exists in the retina of many species [32, 33]. The substantial increase of efficiency under strong metabolic constraint we discovered supports the argument that metabolic constraint may be one of the main reasons for such pathway splitting in evolution.\n\nFigure 3: The optimal response functions for a pair of neurons assuming Gaussian noise. (a) The optimal response functions for a uniform input distribution assuming ON-OFF coding scheme. Solid red and dashed red curves represent the optimal response functions for a pair of neurons with no metabolic constraint (\u201cmax cost\u201d). Solid blue and dashed blue curves are the optimal response functions with substantial metabolic constraint (\u201clow cost\u201d). (b) Similar to panel a, but for input stimuli with heavy tail distribution. (c) The optimal response functions for a uniform input distribution assuming ON-ON coding scheme. Solid and dashed red curves are for no metabolic constraint. Notice that the two curves appear to be identical but are actually different at finer scales (see the inset panel). Solid and dashed blue curves are for substantial metabolic constraint. (d) Similar to panel c, but for input stimuli with heavy tail distribution. (e) A comparison between the ON-ON and ON-OFF schemes. The x-axis represents how substantial the metabolic constraint is \u2013 any value greater than the threshold 0.5 implies no metabolic constraint in effect. The y-axis represents the mutual information, relative to the maximal achievable mutual information without metabolic constraints (which is the same for ON-ON and ON-OFF schemes). The green dashed line represents the difference between the information transmitted by the two schemes. Negative difference indicates an advantage of ON-OFF over ON-ON.\n\nIn a recent study by Karklin and Simoncelli [13], it is observed numerically that an ON-OFF coding scheme can naturally emerge when a linear-nonlinear population of neurons is trained to maximize mutual information with image input and under metabolic constraint.
It is tempting to speculate about a generic connection between these numerical observations and our theoretical results, although our model is much more simplified in the sense that we do not directly model the higher dimensional stimulus (natural image) but just a one dimensional projection (local contrast). Intriguingly, we find that if the inputs follow a certain heavy tail distribution (Fig.3b), the optimal response functions are two rectified non-linear functions which split the encoding range. Such rectified non-linearity is consistent with both the non-linearity observed physiologically [34] and the numerical results in [13].\n\n4 Discussion\n\nIn this paper we presented a theoretical framework for studying optimal neural codes under biologically relevant constraints. Compared to previous works, we emphasize the importance of two types of constraint \u2013 the noise characteristics of the neural responses and the metabolic cost. Throughout the paper, we have focused on neural codes with smooth monotonic response functions. We demonstrated that, maybe surprisingly, analytical solutions exist for a wide family of noise characteristics and metabolic cost functions. These analytical results rely on the techniques of approximating mutual information using Fisher information. There are cases when such approximation would break down, in particular for short integration time or non-Gaussian noise. For a more detailed discussion on the validity of Fisher approximation, see [29, 14, 35].\n\nWe have focused on the cases of a single neuron and a pair of neurons. However, the framework can be generalized to the case of larger populations of neurons. For the case of N = 2k (k is large) neurons, we again find the corresponding optimization problem could be solved analytically by exploiting the Fisher information approximation of mutual information [28, 14].
Interestingly, we found the optimal codes should be divided into two pools of neurons of equal size k: one pool of neurons with monotonically increasing response functions (ON-pool), and the other with monotonically decreasing response functions (OFF-pool). For neurons within the same pool, the optimal response functions appear to be identical on the macro-scale but are quite different when zoomed in. In fact, the optimal code must have disjoint active regions for each neuron. This is similar to what has been illustrated in the inset panel of Fig.3c, where two seemingly identical tuning curves for ON-neurons are compared. We can also quantify the increase of the mutual information by using optimal coding schemes versus using all ON neurons (or all OFF). Interestingly, some of the key results presented in Fig.3e for a pair of neurons generalize to the 2k case. When N = 2k + 1, the optimal solution is similar to N = 2k for a large pool of neurons. However, when k is small, the difference caused by asymmetry between ON/OFF pools can substantially change the configuration of the optimal code.\n\nDue to the limited scope of the paper, we have ignored several important aspects when formulating the efficient coding problem. First, we have not modeled the spontaneous activity (baseline firing rate) of neurons. Second, we have not considered the noise correlations between the responses of neurons. Third, we have ignored the noise in the input to the neurons. We think that the first two factors are unlikely to change our main results. However, incorporating the input noise may significantly change the results. In particular, for the cases of multiple neurons, our current results predict that there is no overlap between the active regions of the response functions for ON and OFF neurons. However, it is possible that this prediction does not hold in the presence of the input noise.
In that case, it might be beneficial to have some redundancy by making the response functions partially overlap. Including these factors into the framework should facilitate a detailed and quantitative comparison to physiologically measured data in the future. As for the objective function, we have only considered the case of maximizing mutual information; it would be interesting to see whether the results can be generalized to other objective functions such as minimizing decoding error [36, 37]. Also, our theory is based on a one dimensional input. To fully explain the ON-OFF split in the visual pathway, it seems necessary to consider a more complete model with the images as the input. To this end, our current model lacks the spatial component, and it doesn\u2019t explain the difference between the number of ON and OFF neurons in retina [38]. Nonetheless, the insight from these analytical results based on the simple model may prove to be useful for a more complete understanding of the functional organization of the early visual pathway. Last but not least, we have assumed a stationary input distribution. However, in the natural environment the input distribution often fluctuates at different time scales; it remains to be investigated how to incorporate these dynamical aspects into a theory of efficient coding.\n\nReferences\n\n[1] Fred Attneave. Some informational aspects of visual perception. Psychological review, 61(3):183, 1954.\n\n[2] Horace B Barlow. Possible principles underlying the transformation of sensory messages. Sensory communication, pages 217\u2013234, 1961.\n\n[3] Ralph Linsker. Self-organization in a perceptual network. Computer, 21(3):105\u2013117, 1988.\n\n[4] Joseph J Atick and A Norman Redlich. Towards a theory of early visual processing. Neural Computation, 2(3):308\u2013320, 1990.\n\n[5] Joseph J Atick.
Could information theory provide an ecological theory of sensory processing? Network: Computation in neural systems, 3(2):213\u2013251, 1992.\n\n[6] F Rieke, DA Bodnar, and W Bialek. Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents. Proceedings of the Royal Society of London. Series B: Biological Sciences, 262(1365):259\u2013265, 1995.\n\n[7] Bruno Olshausen and David Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607\u2013609, 1996.\n\n[8] Anthony J Bell and Terrence J Sejnowski. The \u201cindependent components\u201d of natural scenes are edge filters. Vision research, 37(23):3327\u20133338, 1997.\n\n[9] Eero P Simoncelli and Bruno A Olshausen. Natural image statistics and neural representation. Annual review of neuroscience, 24(1):1193\u20131216, 2001.\n\n[10] Allan Gottschalk. Derivation of the visual contrast response function by maximizing information rate. Neural computation, 14(3):527\u2013542, 2002.\n\n[11] Nicol S Harper and David McAlpine. Optimal neural population coding of an auditory spatial cue. Nature, 430(7000):682\u2013686, 2004.\n\n[12] Mark D McDonnell and Nigel G Stocks. Maximally informative stimuli and tuning curves for sigmoidal rate-coding neurons and populations. Physical review letters, 101(5):058103, 2008.\n\n[13] Yan Karklin and Eero P Simoncelli. Efficient coding of natural images with a population of noisy linear-nonlinear neurons. Advances in neural information processing systems, 24:999, 2011.\n\n[14] Xue-Xin Wei and Alan A Stocker. Mutual information, Fisher information, and efficient coding. Neural computation, 2016.\n\n[15] Horace Barlow. Redundancy reduction revisited. Network: computation in neural systems, 12(3):241\u2013253, 2001.\n\n[16] William B Levy and Robert A Baxter. Energy efficient neural codes.
Neural computation, 8(3):531\u2013543, 1996.\n\n[17] Simon B Laughlin, Rob R de Ruyter van Steveninck, and John C Anderson. The metabolic cost of neural information. Nature neuroscience, 1(1):36\u201341, 1998.\n\n[18] Vijay Balasubramanian, Don Kimber, and Michael J Berry II. Metabolically efficient information processing. Neural Computation, 13(4):799\u2013815, 2001.\n\n[19] Simon B Laughlin. A simple coding procedure enhances a neuron\u2019s information capacity. Z. Naturforsch, 36(910-912):51, 1981.\n\n[20] Deep Ganguli and Eero P Simoncelli. Efficient sensory encoding and Bayesian inference with heterogeneous neural populations. Neural Computation, 26(10):2103\u20132134, 2014.\n\n[21] David B Kastner, Stephen A Baccus, and Tatyana O Sharpee. Critical and maximally informative encoding between neural populations in the retina. Proceedings of the National Academy of Sciences, 112(8):2533\u20132538, 2015.\n\n[22] George J Tomko and Donald R Crapper. Neuronal variability: non-stationary responses to identical visual stimuli. Brain research, 79(3):405\u2013418, 1974.\n\n[23] DJ Tolhurst, JA Movshon, and ID Thompson. The dependence of response amplitude and variance of cat visual cortical neurones on stimulus contrast. Experimental brain research, 41(3-4):414\u2013419, 1981.\n\n[24] Mark M Churchland et al. Stimulus onset quenches neural variability: a widespread cortical phenomenon. Nature neuroscience, 13(3):369\u2013378, 2010.\n\n[25] Moshe Gur and D Max Snodderly. High response reliability of neurons in primary visual cortex (V1) of alert, trained monkeys. Cerebral cortex, 16(6):888\u2013895, 2006.\n\n[26] David Attwell and Simon B Laughlin. An energy budget for signaling in the grey matter of the brain. Journal of Cerebral Blood Flow & Metabolism, 21(10):1133\u20131145, 2001.\n\n[27] Xue-Xin Wei and Alan A Stocker.
A Bayesian observer model constrained by efficient coding can explain \u2018anti-Bayesian\u2019 percepts. Nature Neuroscience, 2015.\n\n[28] Nicolas Brunel and Jean-Pierre Nadal. Mutual information, Fisher information, and population coding. Neural Computation, 10(7):1731\u20131757, 1998.\n\n[29] Matthias Bethge, David Rotermund, and Klaus Pawelzik. Optimal short-term population coding: when Fisher information fails. Neural Computation, 14(10):2317\u20132351, 2002.\n\n[30] Don H Johnson and Will Ray. Optimal stimulus coding by neural populations using rate codes. Journal of computational neuroscience, 16(2):129\u2013138, 2004.\n\n[31] Julijana Gjorgjieva, Haim Sompolinsky, and Markus Meister. Benefits of pathway splitting in sensory coding. The Journal of Neuroscience, 34(36):12127\u201312144, 2014.\n\n[32] Peter H Schiller. The on and off channels of the visual system. Trends in neurosciences, 15(3):86\u201392, 1992.\n\n[33] Heinz W\u00e4ssle. Parallel processing in the mammalian retina. Nature Reviews Neuroscience, 5(10):747\u2013757, 2004.\n\n[34] Matteo Carandini. Amplification of trial-to-trial response variability by neurons in visual cortex. PLoS Biol, 2(9):e264, 2004.\n\n[35] Zhuo Wang, Alan A Stocker, and Daniel D Lee. Efficient neural codes that minimize Lp reconstruction error. Neural Computation, 2016.\n\n[36] Tvd Twer and Donald IA MacLeod. Optimal nonlinear codes for the perception of natural colours. Network: Computation in Neural Systems, 12(3):395\u2013407, 2001.\n\n[37] Zhuo Wang, Alan A Stocker, and Daniel D Lee. Optimal neural tuning curves for arbitrary stimulus distributions: Discrimax, infomax and minimum Lp loss. In Advances in Neural Information Processing Systems NIPS, pages 2177\u20132185, 2012.\n\n[38] Charles P Ratliff, Bart G Borghuis, Yen-Hong Kao, Peter Sterling, and Vijay Balasubramanian. Retina is structured to process an excess of darkness in natural scenes.
Proceedings of the National Academy of Sciences, 107(40):17368\u201317373, 2010.", "award": [], "sourceid": 2304, "authors": [{"given_name": "Zhuo", "family_name": "Wang", "institution": "University of Pennsylvania"}, {"given_name": "Xue-Xin", "family_name": "Wei", "institution": "University of Pennsylvania"}, {"given_name": "Alan", "family_name": "Stocker", "institution": "University of Pennsylvania"}, {"given_name": "Daniel", "family_name": "Lee", "institution": "University of Pennsylvania"}]}