{"title": "Maximally Informative Dimensions: Analyzing Neural Responses to Natural Signals", "book": "Advances in Neural Information Processing Systems", "page_first": 277, "page_last": 284, "abstract": null, "full_text": "Maximally Informative Dimensions: Analyzing\n\nNeural Responses to Natural Signals\n\n Sloan\u2013Swartz Center for Theoretical Neurobiology, Department of Physiology\nUniversity of California at San Francisco, San Francisco, California 94143\u20130444\n\n\u0001 Center for Neural Science, New York University, New York, NY 10003\n\u0004 Department of Physics, Princeton University, Princeton, New Jersey 08544\n\nTatyana Sharpee , Nicole C. Rust\u0001 , and William Bialek\u0003\u0002\n\nsharpee@phy.ucsf.edu, rust@cns.nyu.edu, wbialek@princeton.edu\n\nWe propose a method that allows for a rigorous statistical analysis of\nneural responses to natural stimuli, which are non-Gaussian and exhibit\nstrong correlations. We have in mind a model in which neurons are se-\nlective for a small number of stimulus dimensions out of the high di-\nmensional stimulus space, but within this subspace the responses can\nbe arbitrarily nonlinear. Therefore we maximize the mutual information\nbetween the sequence of elicited neural responses and an ensemble of\nstimuli that has been projected on trial directions in the stimulus space.\nThe procedure can be done iteratively by increasing the number of direc-\ntions with respect to which information is maximized. Those directions\nthat allow the recovery of all of the information between spikes and the\nfull unprojected stimuli describe the relevant subspace.\nIf the dimen-\nsionality of the relevant subspace indeed is much smaller than that of the\noverall stimulus space, it may become experimentally feasible to map\nout the neuron\u2019s input-output function even under fully natural stimulus\nconditions. This contrasts with methods based on correlations functions\n(reverse correlation, spike-triggered covariance, ...) 
which all require simplified stimulus statistics if we are to use them rigorously.

1 Introduction

From olfaction to vision and audition, there is an increasing need for, and a growing number of, experiments [1]-[8] that study responses of sensory neurons to natural stimuli. Natural stimuli have specific statistical properties [9, 10], and therefore sample only a subspace of all possible spatial and temporal frequencies explored during stimulation with white noise. Observing the full dynamic range of neural responses may require using stimulus ensembles which approximate those occurring in nature, and it is an attractive hypothesis that the neural representation of these natural signals may be optimized in some way. Some neural responses are strongly nonlinear and adaptive, and may not be predicted from a combination of responses to simple stimuli. It has also been shown that the variability in neural response decreases substantially when dynamical, rather than static, stimuli are used [11, 12]. For all these reasons, it would be attractive to have a rigorous method of analyzing neural responses to complex, naturalistic inputs.

The stimuli analyzed by sensory neurons are intrinsically high-dimensional, with dimensions D ~ 10^2-10^3. For example, in the case of visual neurons, the input is specified as light intensity on a grid of at least 10 x 10 pixels. The dimensionality increases further if the time dependence is to be explored as well. Full exploration of such a large parameter space is beyond the constraints of experimental data collection. However, progress can be made provided we make certain assumptions about how the response has been generated.
In the simplest model, the probability of response can be described by one receptive field (RF) in the stimulus space [13]. The receptive field can be thought of as a special direction e_1 such that the neuron's response depends only on the projection of a given stimulus s onto e_1. This special direction is the one found by the reverse correlation method [13, 14]. In a more general case, the probability of the response depends on projections s_i = s\cdot e_i, i = 1, ..., K, of the stimulus s on a set of K vectors {e_1, ..., e_K}:

P(spike|\mathbf{s}) = P(spike)\, f(s_1, s_2, \ldots, s_K),   (1)

where P(spike|\mathbf{s}) is the probability of a spike given a stimulus s, P(spike) is the average firing rate, and f is the input-output function. In what follows we will call the subspace spanned by the set of vectors {e_i} the relevant subspace (RS). Even though the ideas developed below can be used to analyze input-output functions with respect to different neural responses, we settle on a single spike as the response of interest.

Eq. (1) in itself is not yet a simplification if the dimensionality K of the RS is equal to the dimensionality D of the stimulus space. In this paper we will use the idea of dimensionality reduction [15, 16] and assume that K << D. The input-output function f in Eq. (1) can be strongly nonlinear, but it is presumed to depend only on a small number of projections. This assumption appears to be less stringent than that of approximate linearity, which one makes when characterizing a neuron's response in terms of Wiener kernels. The most difficult part in reconstructing the input-output function is to find the RS.
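To make the low-dimensional model of Eq. (1) concrete, here is a minimal numerical sketch (illustrative only, not part of the original analysis; it uses NumPy, and the filters, the threshold nonlinearity, and all parameter values are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: D-dimensional stimuli, K = 2 relevant directions (all invented).
D, K = 100, 2
E = np.linalg.qr(rng.standard_normal((D, K)))[0].T  # rows are orthonormal e_1, e_2

def f(s1, s2, theta=1.0):
    # Example nonlinearity: fire when either projection exceeds a threshold.
    return float(s1 > theta or s2 > theta)

def p_spike_given_s(s, p_spike=0.05):
    # Eq. (1): P(spike|s) = P(spike) * f(s.e_1, ..., s.e_K);
    # the response depends on the stimulus only through the K projections.
    s1, s2 = E @ s
    return p_spike * f(s1, s2)

p = p_spike_given_s(rng.standard_normal(D))
```

The point of the structure is that the D-dimensional stimulus enters only through `E @ s`, so mapping `f` experimentally requires sampling a K-dimensional space, not a D-dimensional one.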
For K > 1, a description in terms of any linear combination of the vectors {e_i} is just as valid, since we did not make any assumptions as to a particular form of the nonlinear function f. We might however prefer one coordinate system over another if it, for example, leads to sparser probability distributions or more statistically independent variables.

Once the relevant subspace is known, the probability P(spike|\mathbf{s}) becomes a function of only a few parameters, and it becomes feasible to map this function experimentally, inverting the probability distributions according to Bayes' rule:

f(s_1, \ldots, s_K) = \frac{P(\{s_i\}\,|\,spike)}{P(\{s_i\})}.   (2)

If stimuli are correlated Gaussian noise, then the neural response can be characterized by the spike-triggered covariance method [15, 16]. It can be shown that the dimensionality of the RS is equal to the number of nonzero eigenvalues of the matrix given by the difference between the covariance matrix of all presented stimuli and that of the stimuli conditional on a spike. Moreover, the RS is spanned by the eigenvectors associated with the nonzero eigenvalues, multiplied by the inverse of the a priori covariance matrix. Compared to the reverse correlation method, we are no longer limited to finding only one of the relevant directions e_i. However, because of the necessity to probe a two-point correlation function, the spike-triggered covariance method requires better sampling of the distributions of inputs conditional on a spike.

In this paper we investigate whether it is possible to lift the requirement for stimuli to be Gaussian.
When using natural stimuli, which are certainly non-Gaussian, the RS cannot be found by the spike-triggered covariance method. Similarly, the reverse correlation method does not give the correct RF, even in the simplest case where the input-output function (1) depends only on one projection. However, the vectors that span the RS are clearly special directions in the stimulus space. This notion can be quantified by Shannon information, and an optimization problem can be formulated to find the RS. The current implementation of the dimensionality reduction idea is therefore complementary to the clustering of stimuli done in the information bottleneck method [17]; see also Ref. [18]. Non-information-based measures of similarity between the probability distributions P(\mathbf{s}) and P(\mathbf{s}|spike) have also been proposed [19]. We illustrate how the optimization scheme of maximizing information as a function of direction in the stimulus space works with natural stimuli for model orientation-sensitive cells with one and two relevant directions, much like the simple and complex cells found in primary visual cortex. It is also possible to estimate the average errors of the reconstruction. The advantage of this optimization scheme is that it does not rely on any specific statistical properties of the stimulus ensemble, and so can be used with natural stimuli.

2 Information as an objective function

When analyzing neural responses, we compare the a priori probability distribution of all presented stimuli with the probability distribution of stimuli which lead to a spike. For Gaussian signals, the probability distribution can be characterized by its second moment, the covariance matrix.
However, an ensemble of natural stimuli is not Gaussian, so that neither the second nor any other finite number of moments is sufficient to describe the probability distribution. In this situation, Shannon information provides a convenient way of comparing two probability distributions. The average information carried by the arrival time of one spike is given by [20]

I_{spike} = \int d\mathbf{s}\; P(\mathbf{s}|spike)\, \log_2 \frac{P(\mathbf{s}|spike)}{P(\mathbf{s})}.   (3)

The information per spike, as written in (3), is difficult to estimate experimentally, since it requires either sampling of the high-dimensional probability distribution P(\mathbf{s}|spike) or a model of how spikes were generated, i.e. knowledge of the low-dimensional RS. However, it is possible to calculate I_{spike} in a model-independent way if stimuli are presented multiple times, so that the probability distribution P(spike|\mathbf{s}) can be estimated. Then

I_{spike} = \left\langle \frac{P(spike|\mathbf{s})}{P(spike)}\, \log_2 \frac{P(spike|\mathbf{s})}{P(spike)} \right\rangle_{\mathbf{s}},   (4)

where the average is taken over all presented stimuli. Note that for a finite number of repetitions, the value obtained from (4) will on average be larger than the true I_{spike} [21]; the true value can be found by extrapolating to an infinite number of repetitions [22].
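As an illustration of the estimator in Eq. (4) (a sketch with synthetic data, not the authors' code; the number of stimuli, repetitions, and the per-stimulus spike probabilities are all invented), NumPy makes the computation a few lines:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical experiment: N_s distinct stimuli, each repeated N_rep times.
# The per-stimulus spike probability is known here, so the Eq. (4) estimate
# can be compared against the underlying model.
N_s, N_rep = 500, 200
p_true = rng.uniform(0.0, 0.2, size=N_s)      # P(spike|s) for each stimulus
counts = rng.binomial(N_rep, p_true)          # spike counts over all repetitions

p_hat = counts / N_rep                        # estimated P(spike|s)
p_bar = p_hat.mean()                          # estimated P(spike)

# Eq. (4): I_spike = < (P(spike|s)/P(spike)) log2(P(spike|s)/P(spike)) >_s
ratio = p_hat / p_bar
terms = np.zeros_like(ratio)
nz = ratio > 0
terms[nz] = ratio[nz] * np.log2(ratio[nz])
I_spike = terms.mean()
```

With finitely many repetitions `I_spike` carries the upward bias discussed in the text [21], which is why the extrapolation of [22] is needed on real data.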
The knowledge of the total information per spike will characterize the quality of the reconstruction of the neuron's input-output relation.

Having in mind a model in which spikes are generated according to a projection onto a low-dimensional subspace, we start by projecting all of the presented stimuli on a particular direction v in the stimulus space, and form the probability distributions P_v(x) and P_v(x|spike) of the projection x = \mathbf{s}\cdot v. The information

I(v) = \int dx\; P_v(x|spike)\, \log_2 \frac{P_v(x|spike)}{P_v(x)}   (5)

provides an invariant measure of how much the occurrence of a spike is determined by the projection on the direction v. It is a function only of direction in the stimulus space and does not change when the vector v is multiplied by a constant.
This can be seen by noting that for any probability distribution and any constant c, P_{cv}(x) = P_v(x/c)/c, so the ratio in (5) is unchanged. When evaluated along any vector, I(v) <= I_{spike}. The total information I_{spike} can be recovered along one particular direction only if v = e_1 and the RS is one-dimensional.

By analogy with (5), one can also calculate the information I(v_1, ..., v_n) along a set of several directions {v_1, ..., v_n}, based on the multi-point probability distributions of the projections x_i = \mathbf{s}\cdot v_i:

I(v_1, \ldots, v_n) = \int d^n x\; P(\{x_i\}|spike)\, \log_2 \frac{P(\{x_i\}|spike)}{P(\{x_i\})}.

If we are successful in finding all of the K directions e_i in the input-output relation (1), then the information evaluated along the found set will be equal to the total information I_{spike}. The information does not increase if more vectors outside the RS are included in the calculation. When we calculate information along a set of K vectors that are slightly off from the RS, the answer is, of course, smaller than I_{spike}, and is quadratic in the deviations \delta v. One can therefore find the RS by maximizing information with respect to K vectors simultaneously. On the other hand, the result of optimization with respect to a smaller number of vectors n < K may deviate from the RS if stimuli are correlated; the deviation is also proportional to a weighted average of P(spike|\mathbf{s}).
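A histogram-based estimate of Eq. (5) can be sketched as follows (an illustration with an invented toy cell whose spiking depends only on the projection onto e_1; binning choices are arbitrary). Because the direction is normalized inside the estimator, the scale invariance noted above holds exactly:

```python
import numpy as np

rng = np.random.default_rng(2)

def info_along(v, stimuli, spike, bins=20):
    # Histogram estimate of Eq. (5): information between spiking and the
    # projection x = s . v; v is normalized, so I depends only on direction.
    x = stimuli @ (v / np.linalg.norm(v))
    edges = np.linspace(x.min(), x.max() + 1e-9, bins + 1)
    p_all = np.histogram(x, edges)[0] / x.size
    p_spk = np.histogram(x[spike], edges)[0] / max(spike.sum(), 1)
    nz = (p_spk > 0) & (p_all > 0)
    return float(np.sum(p_spk[nz] * np.log2(p_spk[nz] / p_all[nz])))

# Toy model cell (assumed for illustration): spiking depends only on s . e_1.
D = 20
e1 = np.eye(D)[0]
S = rng.standard_normal((5000, D))
spike = S @ e1 + 0.3 * rng.standard_normal(5000) > 1.0

I_rel = info_along(e1, S, spike)            # along the relevant direction
I_irr = info_along(np.eye(D)[1], S, spike)  # along an irrelevant direction
```

`I_rel` comes out large while `I_irr` is near zero (up to the positive finite-sampling bias), which is the property the optimization below exploits.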
For uncorrelated stimuli, any vector or set of vectors that maximizes I(v) belongs to the RS. To find the RS, we first maximize I(v) with respect to a single direction, and compare this maximum with I_{spike}, which is estimated according to (4). If the difference exceeds that expected from finite sampling corrections, we increment the number of directions with respect to which information is simultaneously maximized.

The information I(v) as defined by (5) is a continuous function, whose gradient can be computed:

\nabla_v I = \int dx\; P_v(x) \left[ \langle \mathbf{s}|x, spike \rangle - \langle \mathbf{s}|x \rangle \right] \frac{d}{dx}\left[\frac{P_v(x|spike)}{P_v(x)}\right].   (6)

Since information does not change with the length of the vector, v \cdot \nabla_v I = 0 (which can also be seen from (6) directly), and unnecessary evaluations of information for multiples of v are avoided by maximizing along the gradient. As an optimization algorithm, we have used a combination of gradient ascent and simulated annealing: successive line maximizations were done along the direction of the gradient. During line maximizations, a point with a smaller value of information was accepted according to Boltzmann statistics, with probability \propto \exp\left[ (I(v_{i+1}) - I(v_i))/T \right]. The effective temperature T is reduced upon completion of each line maximization.

3 Discussion

We tested the scheme of looking for the most informative directions on model neurons that respond to stimuli derived from natural scenes. As stimuli we used patches of photos digitized to an 8-bit scale, in which no corrections were made for the camera's light intensity transformation function.
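The annealed maximization described above can be sketched as follows. This is a simplification, not the authors' implementation: plain random perturbations on the unit sphere stand in for the gradient-directed line maximizations, while the Boltzmann acceptance rule and the cooling schedule follow the text; all parameter values are invented:

```python
import numpy as np

rng = np.random.default_rng(3)

def info_along(v, S, spike, bins=15):
    # Histogram estimate of Eq. (5) along direction v (normalized internally).
    x = S @ (v / np.linalg.norm(v))
    edges = np.linspace(x.min(), x.max() + 1e-9, bins + 1)
    p_all = np.histogram(x, edges)[0] / x.size
    p_spk = np.histogram(x[spike], edges)[0] / max(spike.sum(), 1)
    nz = (p_spk > 0) & (p_all > 0)
    return float(np.sum(p_spk[nz] * np.log2(p_spk[nz] / p_all[nz])))

def anneal(S, spike, steps=800, T=0.1, cooling=0.99, step=0.3):
    # Random perturbations stand in for gradient line maximizations; downhill
    # moves are accepted with Boltzmann probability exp(dI / T).
    D = S.shape[1]
    v = rng.standard_normal(D)
    v /= np.linalg.norm(v)
    I = info_along(v, S, spike)
    v_best, I_best = v, I
    for _ in range(steps):
        w = v + step * rng.standard_normal(D)
        w /= np.linalg.norm(w)          # length is irrelevant, so stay on the sphere
        Iw = info_along(w, S, spike)
        if Iw > I or rng.random() < np.exp((Iw - I) / T):
            v, I = w, Iw
        if Iw > I_best:
            v_best, I_best = w, Iw
        T *= cooling                    # reduce the effective temperature
    return v_best, I_best

# Toy data (invented): spikes depend only on the projection onto e_1.
D = 4
e1 = np.eye(D)[0]
S = rng.standard_normal((4000, D))
spike = S @ e1 > 1.0
v_best, I_best = anneal(S, spike)
```

On this toy problem the search recovers a direction close to e_1; on real high-dimensional data the gradient of Eq. (6) is what makes the line maximizations affordable.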
Our goal is to demonstrate that even though the spatial correlations present in natural scenes are non-Gaussian, they can be successfully removed from the estimate of the vectors defining the RS.

3.1 Simple Cell

Our first example is taken to mimic properties of simple cells found in the primary visual cortex. A model phase- and orientation-sensitive cell has a single relevant direction e_1, shown in Fig. 1(a). A given frame s leads to a spike if the projection s_1 = \mathbf{s}\cdot e_1 reaches a threshold value \theta in the presence of noise:

\frac{P(spike|\mathbf{s})}{P(spike)} = f(s_1) = \langle H(s_1 - \theta + \xi) \rangle_\xi,   (7)

where the Gaussian random variable \xi of variance \sigma^2 models additive noise, and the function H(x) = 1 for x > 0, and zero otherwise. Together with the RF e_1, the parameter \theta for the threshold and the noise variance \sigma^2 determine the input-output function.

Figure 1: Analysis of a model simple cell with RF shown in (a). The spike-triggered average v_{sta} is shown in (b).
Panel (c) shows an attempt to remove correlations according to the reverse correlation method, C_{a\,priori}^{-1} v_{sta}; (d) the vector v found by maximizing information; (e) the probability of a spike P(spike|\mathbf{s}\cdot v) (crosses) compared to P(spike|s_1) used in generating spikes (solid line), with projections normalized by the maximum and minimum values of s_1 over the ensemble of presented stimuli; (f) convergence of the algorithm according to information I and projection v \cdot e_1 as a function of inverse effective temperature T^{-1}.

The spike-triggered average (STA), shown in Fig. 1(b), is broadened because of the spatial correlations present in natural stimuli. If stimuli were drawn from a Gaussian probability distribution, they could be decorrelated by multiplying v_{sta} by the inverse of the a priori covariance matrix, according to the reverse correlation method. The procedure is not valid for non-Gaussian stimuli and nonlinear input-output functions (1); the result of such a decorrelation is shown in Fig. 1(c), and it is clearly missing the structure of the model filter. However, it is possible to obtain a good estimate of the model filter by maximizing information directly, see panel (d). A typical progress of the simulated annealing algorithm with decreasing temperature T is shown in panel (f), where we plot both the information along the vector and its projection on e_1.
The final value of the projection depends on the size of the data set, see below. In the example shown in Fig. 1 there were approximately 50,000 spikes, with an average probability of spike of approximately 0.05 per frame. Having reconstructed the RF, one can proceed to sample the nonlinear input-output function. This is done by constructing histograms for P(\mathbf{s}\cdot v) and P(\mathbf{s}\cdot v|spike) of projections onto the vector v found by maximizing information, and taking their ratio according to Eq. (2). In Fig. 1(e) we compare P(spike|\mathbf{s}\cdot v) (crosses) with the probability P(spike|s_1) used in the model (solid line).

3.2 Estimated deviation from the optimal direction

When information is calculated with respect to a finite data set, the vector v which maximizes I will deviate from the true RF e_1. The deviation \delta v = v - e_1 arises because the probability distributions are estimated from experimental histograms and differ from the distributions found in the limit of infinite data size.
For a simple cell, the quality of reconstruction can be characterized by the projection v \cdot e_1, where both v and e_1 are normalized and \delta v is by definition orthogonal to e_1:

v \cdot e_1 = 1 - \tfrac{1}{2} \langle \delta v^2 \rangle.   (8)

When averaged over possible outcomes of N trials, the gradient of information is zero for the optimal direction. In order to evaluate \langle \delta v^2 \rangle we therefore need the variance of the gradient of I. By discretizing both the space of stimuli and the possible projections x, and assuming that the probability of generating a spike is independent for different bins, one can obtain that the expected error in the reconstruction of the optimal filter is inversely proportional to the number of spikes:

\langle \delta v^2 \rangle \approx \frac{\mathrm{Tr}'[A^{-1} B]}{2 N_{spike} \ln 2},   (9)

where A is the Hessian of information, B is the covariance matrix of its gradient (its structure is similar to that of a covariance matrix), and \mathrm{Tr}' means that the trace is taken in the subspace orthogonal to the model filter, since by definition \delta v \perp e_1. In Fig. 2 we plot the average projection of the normalized reconstructed vector v on the RF e_1, and show that it scales with the number of spikes.

Figure 2: The projection v \cdot e_1 of the vector that maximizes information on the RF e_1 is plotted as a function of the number of spikes, to show the linear scaling in 1/N_{spike} (solid line is a fit).

3.3 Complex Cell

A sequence of spikes from a model cell with two relevant directions was simulated by projecting each of the stimuli on vectors e_1 and e_2 that differ by \pi/2 in their spatial phase, taken to mimic properties of complex cells, see Fig. 3. A particular frame leads to a spike according to a logical OR, that is, if either s_1 = \mathbf{s}\cdot e_1, -s_1, s_2 = \mathbf{s}\cdot e_2, or -s_2 exceeds a threshold value \theta in the presence of noise. Similarly to (7),

\frac{P(spike|\mathbf{s})}{P(spike)} = \langle H(|s_1| - \theta + \xi_1) \;\mathrm{OR}\; H(|s_2| - \theta + \xi_2) \rangle_{\xi_1, \xi_2},   (10)

where \xi_1 and \xi_2 are independent Gaussian variables.
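A sketch of the OR model of Eq. (10), together with the one- versus two-direction information comparison that motivates the next paragraphs (illustrative only: white-noise stimuli instead of natural images, invented filters and parameters, and a simple multi-dimensional histogram estimator for the multi-point information):

```python
import numpy as np

rng = np.random.default_rng(5)

D, N = 10, 40000
e1, e2 = np.eye(D)[0], np.eye(D)[1]
S = rng.standard_normal((N, D))
theta, sigma = 1.5, 0.3

# Eq. (10): a logical OR over rectified projections, with independent noise.
s1, s2 = S @ e1, S @ e2
spike = (np.abs(s1) + sigma * rng.standard_normal(N) > theta) | \
        (np.abs(s2) + sigma * rng.standard_normal(N) > theta)

def info_nd(X, spike, bins=12):
    # Histogram estimate of the information carried by a set of projections:
    # Eq. (5) for a single column of X, its multi-point analogue for several.
    edges = [np.linspace(x.min(), x.max() + 1e-9, bins + 1) for x in X.T]
    p_all = np.histogramdd(X, edges)[0]
    p_spk = np.histogramdd(X[spike], edges)[0]
    p_all /= p_all.sum()
    p_spk /= p_spk.sum()
    nz = (p_spk > 0) & (p_all > 0)
    return float(np.sum(p_spk[nz] * np.log2(p_spk[nz] / p_all[nz])))

I_one = info_nd(s1[:, None], spike)                 # one direction alone
I_two = info_nd(np.column_stack([s1, s2]), spike)   # both relevant directions
```

A single relevant direction captures only part of the information here; only the pair approaches the total, mirroring the criterion used in the text for incrementing the number of directions.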
The sampling of this input-output function by our particular set of natural stimuli is shown in Fig. 3(c). Some combinations of values of s_1 and s_2, especially large ones, are not present in the ensemble.

We start by maximizing information with respect to one direction. Contrary to the analysis for a simple cell, one optimal direction recovers only about 60% of the total information per spike; this falls short of the total even though, for stimuli drawn from natural scenes, correlations give even a random vector a high probability of explaining an appreciable fraction of the total information per spike.
We therefore go on to maximize information with respect to two directions. An example of the reconstruction of the input-output function of a complex cell is given in Fig. 3. The vectors v_1 and v_2 that maximize I(v_1, v_2) are not orthogonal, and are also rotated with respect to e_1 and e_2. However, the quality of the reconstruction is independent of the particular choice of basis within the RS; the appropriate measure of similarity between the two planes is the dot product of their normals. Maximizing information with respect to two directions requires a significantly slower cooling rate, and consequently longer computational times. The expected error in the reconstruction, \delta v, follows a 1/N_{spike} behavior, similarly to (9), and is roughly twice that for a simple cell given the same number of spikes.

Figure 3: Analysis of a model complex cell with relevant directions e_1 and e_2 shown in (a) and (b).
Spikes are generated according to an "OR" input-output function f(s_1, s_2), as in Eq. (10), with threshold \theta and noise variance \sigma^2. Panel (c) shows how the input-output function is sampled by our ensemble of stimuli; dark pixels at large values of s_1 and s_2 correspond to cases where P(s_1, s_2) = 0. Below, panels (d) and (e) show the vectors v_1 and v_2 found by maximizing information I(v_1, v_2), together with (f) the corresponding input-output function with respect to the projections \mathbf{s}\cdot v_1 and \mathbf{s}\cdot v_2.

In conclusion, features of the stimulus that are most relevant for generating the response of a neuron can be found by maximizing information between the sequence of responses and the projection of stimuli on trial vectors within the stimulus space. Calculated in this manner, information becomes a function of direction in the stimulus space. Those directions that maximize the information and account for the total information per response of interest span the relevant subspace. This analysis allows the reconstruction of the relevant subspace without assuming a particular form of the input-output function, which can be strongly nonlinear within the relevant subspace and is to be estimated from experimental histograms. Most importantly, this method can be used with any stimulus ensemble, even those that are strongly non-Gaussian, as in the case of natural images.

Acknowledgments

We thank K. D. Miller for many helpful discussions. Work at UCSF was supported in part by the Sloan and Swartz Foundations and by a training grant from the NIH. Our collaboration began at the Marine Biological Laboratory in a course supported by grants from NIMH and the Howard Hughes Medical Institute.

References
[1] F. Rieke, D. A. Bodnar, and W. Bialek. Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents. Proc. R. Soc. Lond. B, 262:259–265, 1995.

[2] W. E. Vinje and J. L. Gallant. Sparse coding and decorrelation in primary visual cortex during natural vision. Science, 287:1273–1276, 2000.

[3] F. E. Theunissen, K. Sen, and A. J. Doupe. Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20:2315–2331, 2000.

[4] G. D. Lewen, W. Bialek, and R. R. de Ruyter van Steveninck. Neural coding of naturalistic motion stimuli. Network: Comput. Neural Syst., 12:317–329, 2001.

[5] N. J. Vickers, T. A. Christensen, T. Baker, and J. G. Hildebrand. Odour-plume dynamics influence the brain's olfactory code. Nature, 410:466–470, 2001.

[6] K. Sen, F. E. Theunissen, and A. J. Doupe. Feature analysis of natural sounds in the songbird auditory forebrain. J. Neurophysiol., 86:1445–1458, 2001.

[7] D. L. Ringach, M. J. Hawken, and R. Shapley. Receptive field structure of neurons in monkey visual cortex revealed by stimulation with natural image sequences.
Journal of Vision, 2:12\u201324,\n2002.\n\n[8] W. E. Vinje and J. L. Gallant. Natural stimulation of the nonclassical receptive \ufb01eld increases\n\ninformation transmission ef\ufb01ciency in V1. J. Neurosci., 22:2904\u20132915, 2002.\n\n[9] D. L. Ruderman and W. Bialek. Statistics of natural images: scaling in the woods. Phys. Rev.\n\nLett., 73:814\u2013817, 1994.\n\n[10] D. J. Field. Relations between the statistics of natural images and the response properties of\n\ncortical cells. J. Opt. Soc. Am. A, 4:2379\u20132394, 1987.\n\n[11] P. Kara, P. Reinagel, and R. C. Reid. Low response variability in simultaneously recorded\n\nretinal, thalamic, and cortical neurons. Neuron, 27:635\u2013646, 2000.\n\n[12] R. R. de Ruyter van Steveninck, G. D. Lewen, S. P. Strong, R. Koberle, and W. Bialek. Repro-\n\nducibility and variability in neural spike trains. Science, 275:1805\u20131808, 1997.\n\n[13] F. Rieke, D. Warland, R. R. de Ruyter van Steveninck, and W. Bialek. Spikes: Exploring the\n\nneural code. MIT Press, Cambridge, 1997.\n\n[14] E. de Boer and P. Kuyper. Triggered correlation. IEEE Trans. Biomed. Eng., 15:169\u2013179, 1968.\n[15] N. Brenner, W. Bialek, and R. R. de Ruyter van Steveninck. Adaptive rescaling maximizes\n\ninformation transmission. Neuron, 26:695\u2013702, 2000.\n\n[16] R. R. de Ruyter van Steveninck and W. Bialek. Real-time performance of a movement-sensitive\nneuron in the blow\ufb02y visual system: coding and information transfer in short spike sequences.\nProc. R. Soc. Lond. B, 234:379\u2013414, 1988.\n\n[17] N. Tishby, F. C. Pereira, and W. Bialek. The information bottleneck method. In Proceedings of\nthe 37th Allerton Conference on Communication, Control and Computing, edited by B. Hajek\n& R. S. Sreenivas. University of Illinois, 368\u2013377, 1999.\n\n[18] A. G. Dimitrov and J. P. Miller. Neural coding and decoding: communication channels and\n\nquantization. Network: Comput. Neural Syst., 12:441\u2013472, 2001.\n\n[19] L. Paninski. 
Convergence properties of some spike-triggered analysis techniques. In Advances in Neural Information Processing 15, edited by S. Becker, S. Thrun, and K. Obermayer, 2003.

[20] N. Brenner, S. P. Strong, R. Koberle, W. Bialek, and R. R. de Ruyter van Steveninck. Synergy in a neural code. Neural Comp., 12:1531–1552, 2000.

[21] A. Treves and S. Panzeri. The upward bias in measures of information derived from limited data samples. Neural Comp., 7:399, 1995.

[22] S. P. Strong, R. Koberle, R. R. de Ruyter van Steveninck, and W. Bialek. Entropy and information in neural spike trains. Phys. Rev. Lett., 80:197–200, 1998.
", "award": [], "sourceid": 2312, "authors": [{"given_name": "Tatyana", "family_name": "Sharpee", "institution": null}, {"given_name": "Nicole", "family_name": "Rust", "institution": null}, {"given_name": "William", "family_name": "Bialek", "institution": null}]}