Part of Advances in Neural Information Processing Systems 17 (NIPS 2004)
Marcelo Montemurro, Stefano Panzeri
A typical neuron in visual cortex receives most inputs from other cortical neurons with a roughly similar stimulus preference. Does this arrangement of inputs allow efficient readout of sensory information by the target cortical neuron? We address this issue by using simple modelling of neuronal population activity and information theoretic tools. We find that efficient synaptic information transmission requires that the tuning curve of the afferent neurons is approximately as wide as the spread of stimulus preferences of the afferent neurons reaching the target neuron. By meta-analysis of neurophysiological data we found that this is the case for cortico-cortical inputs to neurons in visual cortex. We suggest that the organization of V1 cortico-cortical synaptic inputs allows optimal information transmission.
A typical neuron in visual cortex receives most of its inputs from other visual cortical neurons. The majority of cortico-cortical inputs arise from afferent cortical neurons whose stimulus preference is similar to that of the target neuron [1, 2, 3]. For example, orientation selective neurons in superficial layers of ferret visual cortex receive more than 50% of their cortico-cortical excitatory inputs from neurons whose orientation preference is less than 30° apart. However, this input structure is rather broad in terms of stimulus specificity: cortico-cortical connections between neurons tuned to dissimilar stimulus orientations also exist. The structure and spread of the stimulus specificity of cortico-cortical connections has received a lot of attention because of its importance for understanding the mechanisms of generation of orientation tuning (see  for a review). However, little is known about whether this structure of inputs allows efficient transmission of sensory information across cortico-cortical synapses.
It is likely that the efficiency of information transmission across cortico-cortical synapses also depends on the width of the tuning curves of the afferent cortical neurons. In fact, theoretical work on population coding has shown that the width of the tuning curves has an important influence on the quality and the nature of the information encoded in cortical populations [5, 6, 7, 8]. Another factor that may influence the efficiency of cortico-cortical synaptic information transmission is the biophysical capability of the target neuron. To conserve all information during synaptic transmission, the target neuron must conserve the "label" of the spikes arriving from multiple input neurons at different places on its dendritic tree. Because of biophysical limitations, a target neuron that can, for example, only sum inputs at the soma may lose a large part of the information present in the afferent activity. The optimal arrangement of cortico-cortical synapses may therefore also depend on the capability of postsynaptic neurons to process spikes from different neurons separately.
In this paper, we address the problem of whether cortico-cortical synaptic systems encode information efficiently. We introduce a simple model of neuronal information processing that takes into account both the selective distribution of stimulus preferences typical of cortico-cortical connections and the potential biophysical limitations of cortical neurons. We use this model and information theoretic tools to investigate whether there is an optimal trade-off between the spread of the distribution of stimulus preferences across the afferent neurons and the tuning width of the afferent neurons themselves. We find that efficient synaptic information transmission requires that the tuning curve of the afferent neurons is approximately as wide as the spread of stimulus preferences of the afferent fibres reaching the target neuron. By reviewing anatomical and physiological data, we argue that this optimal trade-off is approximately reached in visual cortex. These results suggest that neurons in visual cortex are wired to decode information optimally from a stimulus-specific distribution of synaptic inputs.
2 Model of the activity of the afferent neuronal population
We consider a simple model for the activity of the afferent neuronal population based on the known tuning properties and spatial and synaptic organisation of sensory areas.
2.1 Stimulus tuning of individual afferent neurons
We assume that the population is made of a large number N of neurons (for a real cortical neuron, the number N of afferents is in the order of a few thousand). The response of each neuron $r_k$ ($k = 1, \ldots, N$) is quantified as the number of spikes fired in a salient post-stimulus time window of length $\tau$. Thus, the overall neuronal population response is represented as a spike count vector $\mathbf{r} = (r_1, \ldots, r_N)$.
We assume that the neurons are tuned to a small number D of relevant stimulus parameters [11, 12], such as orientation, speed or direction of motion of a visual object. The stimulus variable will thus be described as a vector $\mathbf{s} = (s_1, \ldots, s_D)$ of dimension D. The number of stimulus features that are encoded by the neuron will be left as a free parameter to be varied within the range 1-5 for two reasons. First, although there is evidence that the number of stimulus features encoded by a single neuron is limited [11, 12], more research is still needed to determine exactly how many stimulus parameters are encoded in different areas. Second, a previous related study has shown that, when considering large neuronal populations with a uniform distribution of stimulus preferences (such as a hypercolumn in V1 containing all stimulus orientations), the tuning width of individual neurons that is optimal for population coding depends crucially on the number of stimulus features being encoded. Thus, it is interesting to investigate how the optimal arrangement of cortico-cortical synaptic systems depends on the number of stimulus features being encoded.
The neuronal tuning function of the k-th neuron ($k = 1, \ldots, N$), which quantifies the mean spike count of the k-th neuron to the presented stimulus, is modelled as a Gaussian function, characterised by the following parameters: preferred stimulus $\hat{\mathbf{s}}^{(k)}$, tuning width $\sigma_f$, and response modulation $m$:

$$f^{(k)}(\mathbf{s}) = m \exp\left(-\frac{(\mathbf{s}-\hat{\mathbf{s}}^{(k)})^2}{2\sigma_f^2}\right) \tag{1}$$
The Gaussian tuning curve is a good description of the tuning properties of, for example, V1 or MT neurons to variables such as stimulus orientation or motion direction [13, 14, 15], and is hence widely used in models of sensory coding [16, 17]. Large values of $\sigma_f$ indicate coarse coding, whereas small values of $\sigma_f$ indicate sharp tuning.
Spike count responses of each neuron on each trial are assumed to follow a Poisson distribution whose mean is given by the above neuronal tuning function (Eq. 1). The Poisson model is widely used because it is the simplest model of neuronal firing that captures a salient property of neuronal firing, namely that the variance of spike counts is proportional to their mean. The Poisson model neglects all correlations between spikes. This assumption is certainly a simplification but it is sufficient to account for the majority of the information transmitted by real cortical neurons [18, 19, 20] and, as we shall see later, it is mathematically convenient because it makes our model tractable.
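As a concrete illustration of the encoding model of this subsection, the following minimal sketch draws one trial of Poisson spike counts from Gaussian tuning curves (Eq. 1). The function name `tuning_curve`, the population size, the uniform placeholder preferences and the numerical values are assumptions chosen only for illustration; they are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

def tuning_curve(s, s_pref, m, sigma_f):
    """Mean spike count of each neuron (Eq. 1) for a single stimulus s.

    s      : array of shape (D,), the presented stimulus
    s_pref : array of shape (N, D), preferred stimulus of each neuron
    """
    sq_dist = np.sum((s_pref - s) ** 2, axis=1)
    return m * np.exp(-sq_dist / (2.0 * sigma_f ** 2))

# Illustrative values only (assumptions, not the paper's parameters).
N, D = 1000, 2
m, sigma_f = 0.5, 1.0
s_pref = rng.uniform(-2.0, 2.0, size=(N, D))  # placeholder preferences
s = np.zeros(D)

mean_counts = tuning_curve(s, s_pref, m, sigma_f)
r = rng.poisson(mean_counts)  # one trial: independent Poisson counts

# Poisson check on a single neuron: the count variance matches its mean.
trials = rng.poisson(m, size=200_000)
fano = trials.var() / trials.mean()
print(f"pooled count on this trial: {r.sum()}, Fano factor ~ {fano:.3f}")
```

The vector `r` plays the role of the population response, and the Fano factor close to one reflects the variance-to-mean property of the Poisson model mentioned above.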
2.2 Distribution of stimulus preferences among the afferent population
Neurons in sensory cortex receive a large number of inputs from other neurons with a variety of stimulus preferences. However, the majority of their inputs come from neurons with roughly similar stimulus preference [1, 2, 3]. To characterise correctly this type of spread of stimulus preference among the afferent population, we assume (unlike in previous studies) that the probability distribution of the preferred stimulus among afferent neurons follows a Gaussian distribution:

$$P(\hat{\mathbf{s}}) = \frac{1}{(2\pi)^{D/2}\sigma_p^D} \exp\left(-\frac{(\hat{\mathbf{s}}-\hat{\mathbf{s}}_0)^2}{2\sigma_p^2}\right) \tag{2}$$
In Eq. (2) the parameter $\hat{\mathbf{s}}_0$ represents the center of the distribution, and is thus the most represented preferred stimulus in the population. (We set, without loss of generality, $\hat{\mathbf{s}}_0 = 0$.) The parameter $\sigma_p$ controls the spread of stimulus preferences of the afferent neuronal population: a small value of $\sigma_p$ indicates that a large fraction of the population have similar stimulus preferences, and a large value of $\sigma_p$ indicates that all stimuli are represented similarly. A Gaussian distribution of stimulus preferences of the afferent population fits well empirical data on the distribution of preferred orientations of synaptic inputs of neurons in both deep and superficial layers of ferret primary visual cortex.
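The role of the spread parameter in Eq. (2) can be made concrete by sampling preferred stimuli. The sketch below is illustrative only: the helper `sample_preferences`, the sample size and the value of 20 degrees are assumptions, and the check simply confirms that about 68% of one-dimensional preferences fall within one spread of the centre.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_preferences(n_neurons, dim, sigma_p, center=None):
    """Draw preferred stimuli from the isotropic Gaussian of Eq. (2)."""
    center = np.zeros(dim) if center is None else np.asarray(center, dtype=float)
    return center + sigma_p * rng.standard_normal((n_neurons, dim))

sigma_p = 20.0  # degrees; an illustrative value, not a fitted one
prefs = sample_preferences(200_000, 1, sigma_p)

# Fraction of afferents whose preference lies within one sigma_p of the centre.
frac_within = np.mean(np.abs(prefs[:, 0]) < sigma_p)
print(f"fraction within one sigma_p: {frac_within:.3f}")
```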
3 Width of tuning and spread of stimulus preferences in visual cortex
To estimate the width of tuning $\sigma_f$ and the spread of stimulus preferences $\sigma_p$ of cortico-cortical afferent populations in visual cortex, we critically reviewed published anatomical and physiological data. We concentrated on excitatory synaptic inputs, which form the majority of inputs to a cortical pyramidal neuron. We computed $\sigma_p$ by fitting (by a least squares method) Gaussian distributions to the published histograms of synaptic connections as a function of the stimulus preference of the input neuron. Similarly, we determined $\sigma_f$ by fitting Gaussian tuning curves to spike count histograms.
When considering a target neuron in ferret primary visual cortex and using orientation as the stimulus parameter, the spread of stimulus preferences $\sigma_p$ of its inputs is 20° for layer 5/6 neurons, and 16° to 23° for layer 2/3 neurons. The orientation tuning width $\sigma_f$ of the cortical inputs to the V1 target neuron is that of other V1 neurons that project to it. This $\sigma_f$ is 17° for layer 4 neurons, and it is similar for neurons in deep and superficial layers. When considering a target neuron in layer 4 of cat visual cortex and orientation tuning, the spread of stimulus preferences $\sigma_p$ is 20° and $\sigma_f$ is 17°. When considering a target neuron in ferret visual cortex and motion direction tuning, the spread of tuning of its inputs $\sigma_p$ is 30°. The motion direction tuning width of macaque neurons is 28°, and this width is similar across species (see ).
The most notable finding of our meta-analysis of published data is that $\sigma_p$ and $\sigma_f$ appear to be approximately of the same size and their ratio $\sigma_f/\sigma_p$ is distributed around 1, in the range 0.7 to 1.1 for the above data. We will use our model to understand whether this range of $\sigma_f/\sigma_p$ corresponds to an optimal way to transmit information across a synaptic system.
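The quoted range can be checked with simple arithmetic on the values listed in this section. The snippet below is only a bookkeeping sketch: the dictionary labels are ours, and the motion-direction entry pairs the macaque tuning width with the ferret spread, as the text does.

```python
# (sigma_f, sigma_p) pairs in degrees, as quoted in this section.
cases = {
    "ferret V1 orientation, layer 5/6": (17.0, 20.0),
    "ferret V1 orientation, layer 2/3 (sigma_p = 16)": (17.0, 16.0),
    "ferret V1 orientation, layer 2/3 (sigma_p = 23)": (17.0, 23.0),
    "cat V1 orientation, layer 4": (17.0, 20.0),
    "motion direction": (28.0, 30.0),
}

# Ratio of tuning width to spread of stimulus preferences for each case.
ratios = {name: sigma_f / sigma_p for name, (sigma_f, sigma_p) in cases.items()}

for name, ratio in ratios.items():
    print(f"{name}: sigma_f / sigma_p = {ratio:.2f}")
print(f"range: {min(ratios.values()):.2f} to {max(ratios.values()):.2f}")
```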
4 Information theoretic quantification of population decoding
To characterise how a target neuronal system can decode the information about sensory stimuli contained in the activity of its afferent neuronal population, we use mutual information. The mutual information between a set of stimuli and the neuronal responses quantifies how well any decoder can discriminate among stimuli by observing the neuronal responses. This measure has the advantage of being independent of the decoding mechanism used, and thus puts precise constraints on the information that can be decoded by any biological system operating on the afferent activity.
Previous studies on the information content of an afferent neuronal population [7, 8] have assumed that the target neuronal decoding system can extract all the information during synaptic transmission. To do so, the target neuron must conserve the "label" of the spikes arriving from multiple neurons at different sites on its dendritic tree. Given the potential biophysical difficulty in processing each spike separately, a simple alternative to spike labelling has been proposed: spike pooling [10, 24]. In this scheme, the target neuron simply sums up the afferent activity. To characterize how the decoding of afferent information would work in both cases, we compute both the information that can be decoded by a system that processes spikes from different neurons separately (the "labeled-line information") and the information available to a decoder that sums all incoming spikes (the "pooled information") [9, 24]. In the next two subsections we define these quantities and explain how we compute them in our model.
4.1 The information available to the labeled-line decoder
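The mutual information between the set of the stimuli and the labeled-line neuronal population activity is defined as follows [9, 24]:

$$I_{LL}(S, R) = \int d\mathbf{s}\, P(\mathbf{s}) \sum_{\mathbf{r}} P(\mathbf{r}|\mathbf{s}) \log \frac{P(\mathbf{r}|\mathbf{s})}{P(\mathbf{r})} \tag{3}$$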
where $P(\mathbf{s})$ is the probability of stimulus occurrence (here taken for simplicity as a uniform distribution over the hypersphere of D dimensions and "radius" $\sigma_s$), $P(\mathbf{r}|\mathbf{s})$ is the probability of observing a neuronal population response $\mathbf{r}$ conditional on the occurrence of stimulus $\mathbf{s}$, and $P(\mathbf{r}) = \int d\mathbf{s}\, P(\mathbf{s}) P(\mathbf{r}|\mathbf{s})$. Since the response vector $\mathbf{r}$ keeps separate the spike counts of each neuron, the amount of information in Eq. (3) is only accessible to a decoder that can keep the label of which neuron fired which spike [9, 24]. The probability $P(\mathbf{r}|\mathbf{s})$ is computed according to the Poisson distribution, which is entirely determined by the knowledge of the tuning curves. The labeled-line mutual information is difficult to compute for large populations, because it requires the knowledge of the probability of the large-dimensional response vector $\mathbf{r}$. However, since in our model we assume that we have a very large number of independent neurons in the population and that the total activity of the system is of the order of its size, we can use the following simpler (but still exact) expression [16, 25]:

$$I_{LL}(S, R) = H(S) - \frac{D}{2} \ln(2\pi e) + \frac{1}{2} \int d\mathbf{s}\, P(\mathbf{s}) \ln\left(|J(\mathbf{s})|\right) \tag{4}$$

where $H(S)$ is the entropy of the prior stimulus presentation distribution $P(\mathbf{s})$, $J(\mathbf{s})$ is the Fisher information matrix and $|\ldots|$ stands for the determinant. The Fisher information matrix is a $D \times D$ matrix whose elements $i, j$ are defined as follows:

$$J_{i,j}(\mathbf{s}) = -\sum_{\mathbf{r}} P(\mathbf{r}|\mathbf{s}) \frac{\partial^2 \log P(\mathbf{r}|\mathbf{s})}{\partial s_i\, \partial s_j} \tag{5}$$

Fisher information is a useful measure of the accuracy with which a particular stimulus can be reconstructed from a single trial observation of neuronal population activity. However, in this paper it is used only as a step to obtain a computationally tractable expression for the labeled-line mutual information. The Fisher information matrix can be computed by taking into account that, for a population of Poisson neurons, it is just the sum of the Fisher information for individual neurons, and the latter has a simple expression in terms of tuning curves. Since the neuronal population size N is large, the sum over the Fisher information of individual neurons can be replaced by an integral over the stimulus preferences of the neurons in the population, weighted by their probability density $P(\hat{\mathbf{s}})$. After performing the integral over the distribution of preferred stimuli, we arrived at the following result for the elements of the Fisher information matrix:
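$$J_{i,j}(\mathbf{s}) = \frac{N m}{\sigma_p^2}\, \frac{\sigma^{D-2}}{(1+\sigma^2)^{\frac{D}{2}+2}} \left[\delta_{i,j} + \sigma^2\left(\delta_{i,j} + \xi_i \xi_j\right)\right] \exp\left(-\frac{\xi^2}{2(1+\sigma^2)}\right) \tag{6}$$

where we have introduced the following short-hand notation: $\sigma \equiv \sigma_f/\sigma_p$ and $\boldsymbol{\xi} \equiv \mathbf{s}/\sigma_p$; $\delta_{i,j}$ stands for the Kronecker delta. From Eq. (6) it is possible to compute explicitly the determinant $|J(\mathbf{s})|$ as the product of the eigenvalues $\lambda_i$ of the Fisher information matrix, which has the following form:

$$|J(\mathbf{s})| = \prod_{i=1}^{D} \lambda_i = \alpha(\xi)^D\, (1+\sigma^2)^{D-1} \left[1 + \sigma^2(1+\xi^2)\right] \tag{7}$$

where $\alpha(\xi)$ is the scalar prefactor of Eq. (6):

$$\alpha(\xi) = \frac{N m}{\sigma_p^2}\, \frac{\sigma^{D-2}}{(1+\sigma^2)^{\frac{D}{2}+2}} \exp\left(-\frac{\xi^2}{2(1+\sigma^2)}\right) \tag{8}$$

Inserting Eq. (7) into Eq. (4), one obtains a tractable but still exact expression for the mutual information, which has the advantage over Eq. (3) of requiring only an integral over a D-dimensional stimulus rather than a sum over the responses of an infinite population.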
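For readers who wish to check Eq. (6), the intermediate steps can be sketched as follows. This is a reconstruction under the assumptions stated in the text (independent Poisson neurons whose mean count is the tuning function of Eq. 1, and preferences distributed as in Eq. 2 with the center set to zero); the symbols $Z$ and $A$ and the bracket notation for the Gaussian average are introduced here only for this sketch.

```latex
\begin{align*}
% Fisher information of independent Poisson neurons with mean counts f^{(k)}(s)
J_{i,j}(\mathbf{s})
  &= \sum_{k=1}^{N} \frac{\partial_{s_i} f^{(k)}\, \partial_{s_j} f^{(k)}}{f^{(k)}}
   = \frac{1}{\sigma_f^4} \sum_{k=1}^{N} f^{(k)}(\mathbf{s})\,
     (s_i - \hat{s}^{(k)}_i)(s_j - \hat{s}^{(k)}_j) \\
% Large N: the sum over neurons becomes N times an integral over P(\hat{s})
  &\approx \frac{N m}{\sigma_f^4}\, Z(\mathbf{s})\,
     \big\langle (s_i - \hat{s}_i)(s_j - \hat{s}_j) \big\rangle \\
% Product of the two Gaussians in \hat{s}: overall weight ...
Z(\mathbf{s})
  &= \int d\hat{\mathbf{s}}\; P(\hat{\mathbf{s}})\,
     e^{-(\mathbf{s}-\hat{\mathbf{s}})^2 / 2\sigma_f^2}
   = \left(\frac{\sigma^2}{1+\sigma^2}\right)^{D/2}
     e^{-\xi^2 / 2(1+\sigma^2)} \\
% ... and second moment under the resulting Gaussian in \hat{s}
\big\langle (s_i - \hat{s}_i)(s_j - \hat{s}_j) \big\rangle
  &= \sigma_p^2 \left[ \frac{\sigma^2}{1+\sigma^2}\, \delta_{i,j}
     + \frac{\sigma^4}{(1+\sigma^2)^2}\, \xi_i \xi_j \right] \\
% Collecting powers of sigma and (1 + sigma^2)
J_{i,j}(\mathbf{s})
  &= A \left[ (1+\sigma^2)\, \delta_{i,j} + \sigma^2 \xi_i \xi_j \right],
  \qquad
  A = \frac{N m}{\sigma_p^2}\,
      \frac{\sigma^{D-2}}{(1+\sigma^2)^{D/2+2}}\,
      e^{-\xi^2 / 2(1+\sigma^2)}
\end{align*}
```

The bracket in the last line equals $\delta_{i,j} + \sigma^2(\delta_{i,j} + \xi_i\xi_j)$. It has the eigenvalue $1+\sigma^2$ with multiplicity $D-1$ (directions orthogonal to $\boldsymbol{\xi}$) and the eigenvalue $1+\sigma^2(1+\xi^2)$ along $\boldsymbol{\xi}$, so that $|J(\mathbf{s})| = A^D (1+\sigma^2)^{D-1}\left[1+\sigma^2(1+\xi^2)\right]$.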
We have studied numerically the dependence of the labeled-line information on the parameters $\sigma_f$ and $\sigma_p$ as a function of the number of encoded stimulus features D (see footnote 1). We investigated this by fixing $\sigma_p$ and then varying the ratio $\sigma_f/\sigma_p$ over a wide range. Results (obtained for $\sigma_p = 1$ but representative of a wide $\sigma_f$ range) are reported in Fig. 1. We found that, unlike the case of a uniform distribution of stimulus preferences, there is a finite value of the width of tuning $\sigma_f$ that maximizes the information for all $D \geq 2$. Interestingly, for $D \geq 2$ the range $0.7 \leq \sigma_f/\sigma_p \leq 1.1$ found in visual cortex either contains the maximum or corresponds to near optimal values of information transmission. For D = 1, information is maximal for very narrow tuning curves. However, also in this case the information values are still efficient in the cortical range $\sigma_f/\sigma_p \approx 1$, in that the tail of the D = 1 information curve is avoided in that region. Thus, the range of values of $\sigma_f$ and $\sigma_p$ found in visual cortex allows efficient synaptic information transmission over a wide range of number of stimulus features encoded by the neuron.
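A scan of this kind can be sketched numerically as follows. This is a minimal illustration and not the authors' code: it evaluates the log-determinant directly from the structure of the Fisher matrix in Eq. (6), averages it over stimuli uniform in a D-dimensional ball by radial quadrature, and drops the terms of Eq. (4) that do not depend on the tuning width. The function names, the grid sizes, the value of the product N m and the choice D = 3 are assumptions made for the example; the radius 2 and the spread 1 follow the values quoted for Fig. 1.

```python
import numpy as np

def mean_log_det_fisher(sigma, D, radius=2.0, sigma_p=1.0, n_m=1.0, n_grid=4000):
    """Average of ln|J(s)| over stimuli uniform in a D-ball, from Eq. (6).

    sigma is the ratio sigma_f / sigma_p; n_m is the product N * m.
    """
    # A uniform point in a D-ball has |s| = radius * q**(1/D), q uniform on [0, 1].
    q = (np.arange(n_grid) + 0.5) / n_grid           # midpoint rule on [0, 1]
    xi2 = (radius * q ** (1.0 / D) / sigma_p) ** 2   # xi^2 = |s|^2 / sigma_p^2
    a = 1.0 + sigma ** 2
    # Eq. (6) has the form J = A * [(1 + sigma^2) * I + sigma^2 * xi xi^T].
    log_A = (np.log(n_m / sigma_p ** 2) + (D - 2) * np.log(sigma)
             - (D / 2 + 2) * np.log(a) - xi2 / (2.0 * a))
    # Bracket eigenvalues: (1 + sigma^2), D - 1 times; 1 + sigma^2 (1 + xi^2), once.
    log_det = D * log_A + (D - 1) * np.log(a) + np.log(1.0 + sigma ** 2 * (1.0 + xi2))
    return log_det.mean()

def labeled_line_info_shape(sigma, D, **kwargs):
    """Part of Eq. (4) that depends on sigma: 0.5 * <ln|J|> (constants dropped)."""
    return 0.5 * mean_log_det_fisher(sigma, D, **kwargs)

# Scan the ratio sigma_f / sigma_p for D = 3 and locate the maximum.
sigmas = np.linspace(0.05, 8.0, 1600)
curve = np.array([labeled_line_info_shape(s, D=3) for s in sigmas])
sigma_star = sigmas[np.argmax(curve)]
print(f"D = 3: information is maximal near sigma_f / sigma_p = {sigma_star:.2f}")
```

Because the dropped terms are constant in the tuning width, the location of the maximum is unaffected, although the absolute information values are not reproduced by this sketch.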
Footnote 1: We found (data not shown) that other parameters, such as $m$ and $\tau$, had a weak or null effect on the optimal configuration; see  for a D = 1 example in a different context.
[Figure 1 plot: $I_{LL}(S,R)$ on the y-axis against $\sigma_f/\sigma_p$ on the x-axis (0 to 8); curves labelled from D = 1 to D = 5.]

Figure 1: Mutual labeled-line information as a function of the ratio of tuning curve width and stimulus preference spread $\sigma_f/\sigma_p$. The curves for each stimulus dimensionality D were shifted by a constant factor to separate them for visual inspection (lower curves correspond to higher values of D). The y-axis is thus in arbitrary units. The position of the maximal information for each stimulus dimension falls either inside the range of values of $\sigma_f/\sigma_p$ found in visual cortex, or very close to it (see text). Parameters are as follows: $\sigma_s = 2$, $r_{\max} = 50$ Hz, $\tau = 10$ ms.
4.2 The information available to the pooling decoder
We now consider the case in which the target neuron cannot process separately spikes from different neurons (for example, a neuron that just sums up post-synaptic potentials of approximately equal weight at the soma). In this case the label of the neuron that fired each spike is lost by the target neuron, and it can only operate on the pooled neuronal signal, in which the identity of each spike is lost. Pooling mechanisms have been proposed as simple information processing strategies for the nervous system. We now study how pooling changes the requirements for efficient decoding by the target neuron.
Mathematically speaking, pooling maps the vector $\mathbf{r}$ onto a scalar $\rho$ equal to the sum of the individual activities: $\rho = \sum_k r_k$. Thus, the mutual information that can be extracted by any decoder that only pools its inputs is given by the following expression:

$$I_P(S, R) = \int d\mathbf{s}\, P(\mathbf{s}) \sum_{\rho} P(\rho|\mathbf{s}) \log \frac{P(\rho|\mathbf{s})}{P(\rho)} \tag{9}$$
where $P(\rho|\mathbf{s})$ and $P(\rho)$ are the stimulus-conditional and stimulus-unconditional probabilities of observing a pooled population response $\rho$ on a single trial. The probability $P(\rho|\mathbf{s})$ can be computed by noting that a sum of Poisson-distributed responses is still a Poisson-distributed response whose tuning curve to stimuli is just the sum of the individual tuning curves. The pooled mutual information is thus a function of a single Poisson-distributed response variable and can be computed easily also for large populations.
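The computation described above can be sketched as follows. This is an illustrative reconstruction rather than the authors' code: it assumes that, for large N, the summed tuning curve can be replaced by N m times the Gaussian average of Eq. (1) over the preference distribution of Eq. (2), which gives the closed form used for `lam`. The function name, the value N m = 500, the grid sizes and the choice D = 1 are assumptions for the example; information is measured in nats.

```python
import numpy as np

def pooled_info(sigma, D=1, radius=2.0, sigma_p=1.0, n_m=500.0, n_s=200):
    """Pooled mutual information of Eq. (9), in nats, for sigma = sigma_f / sigma_p."""
    # Stimuli uniform in a D-ball: |s| = radius * q**(1/D), q on a midpoint grid.
    q = (np.arange(n_s) + 0.5) / n_s
    xi2 = (radius * q ** (1.0 / D) / sigma_p) ** 2
    a = 1.0 + sigma ** 2
    # Assumed large-N mean of the pooled count (sum of all tuning curves).
    lam = n_m * (sigma ** 2 / a) ** (D / 2) * np.exp(-xi2 / (2.0 * a))
    # Poisson probabilities P(rho | s) on a truncated range of counts.
    k_max = int(lam.max() + 10.0 * np.sqrt(lam.max()) + 10)
    k = np.arange(k_max + 1)
    log_fact = np.concatenate(([0.0], np.cumsum(np.log(np.arange(1, k_max + 1)))))
    log_p = k[None, :] * np.log(lam[:, None]) - lam[:, None] - log_fact[None, :]
    p_cond = np.exp(log_p)
    p_marg = p_cond.mean(axis=0)  # P(rho), averaging over equally weighted stimuli
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p_cond > 0, p_cond * (log_p - np.log(p_marg)), 0.0)
    return terms.sum(axis=1).mean()

# Scan the ratio sigma_f / sigma_p for a one-dimensional stimulus.
sigmas = np.linspace(0.1, 4.0, 40)
curve = np.array([pooled_info(s, D=1) for s in sigmas])
best = int(np.argmax(curve))
print(f"D = 1: pooled information peaks near sigma_f / sigma_p = {sigmas[best]:.2f}")
```

In this sketch the pooled count carries little information both for very narrow tuning (few spikes) and for very broad tuning (a nearly flat summed tuning curve), so the maximum lies at an intermediate width.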
The dependence of the pooled information on the parameters $\sigma_f$ and $\sigma_p$ as a function of the number of encoded stimulus features D is reported in Fig. 2. There is one important difference with respect to the labeled-line case: for the pooled information, there is a finite optimal value for information transmission also when the neurons are tuned to a one-dimensional stimulus feature. For all cases of stimulus dimensionality considered, efficient information transmission through the pooled neuronal decoder can still be reached in the visual cortical range $0.7 \leq \sigma_f/\sigma_p \leq 1.1$. This finding shows that the range of values of $\sigma_f$ and $\sigma_p$ found in visual cortex allows efficient synaptic information transmission even if the target neuron does not rely on complex dendritic processing to conserve the label of the neuron that fired the spike.

[Figure 2 plot: $I_P(S,R)$ on the y-axis against $\sigma_f/\sigma_p$ on the x-axis (0 to 4); curve labels D = 1 and D = 3.]

Figure 2: Pooled mutual information as a function of the ratio of tuning curve width and stimulus preference spread $\sigma_f/\sigma_p$. The maxima are inside the range of experimental values of $\sigma_f/\sigma_p$ found in the visual cortex, or very close to it (see text). As for Fig. 1, the curves for each stimulus dimensionality D were shifted by a constant factor to separate them for visual inspection (lower curves correspond to higher values of D). The y-axis is thus in arbitrary units. Parameters are as follows: $\sigma_s = 2$, $r_{\max} = 50$ Hz, $\tau = 10$ ms.