{"title": "Quantifying how much sensory information in a neural code is relevant for behavior", "book": "Advances in Neural Information Processing Systems", "page_first": 3686, "page_last": 3696, "abstract": "Determining how much of the sensory information carried by a neural code contributes to behavioral performance is key to understand sensory function and neural information flow. However, there are as yet no analytical tools to compute this information that lies at the intersection between sensory coding and behavioral readout. Here we develop a novel measure, termed the information-theoretic intersection information $\\III(S;R;C)$, that quantifies how much of the sensory information carried by a neural response $R$ is used for behavior during perceptual discrimination tasks. Building on the Partial Information Decomposition framework, we define $\\III(S;R;C)$ as the part of the mutual information between the stimulus $S$ and the response $R$ that also informs the consequent behavioral choice $C$. We compute $\\III(S;R;C)$ in the analysis of two experimental cortical datasets, to show how this measure can be used to compare quantitatively the contributions of spike timing and spike rates to task performance, and to identify brain areas or neural populations that specifically transform sensory information into choice.", "full_text": "Quantifying how much sensory information in a\n\nneural code is relevant for behavior\n\nGiuseppe Pica1,2\n\ngiuseppe.pica@iit.it\n\nEugenio Piasini1\n\neugenio.piasini@iit.it\n\nHouman Safaai1,3\n\nhouman_safaai@hms.harvard.edu\n\nCaroline A. Runyan3,4\n\nrunyan@pitt.edu\n\nMathew E. Diamond5\ndiamond@sissa.it\n\nTommaso Fellin2,6\n\ntommaso.fellin@iit.it\n\nChristoph Kayser7,8\n\nchristoph.kayser@uni-bielefeld.de\n\nChristopher D. 
Harvey3\n\nChristopher_Harvey@hms.harvard.edu\n\nStefano Panzeri1,2\n\nstefano.panzeri@iit.it\n\n1 Neural Computation Laboratory, Center for Neuroscience and Cognitive Systems@UniTn,\n\nIstituto Italiano di Tecnologia, Rovereto (TN) 38068, Italy\n\n2 Neural Coding Laboratory, Center for Neuroscience and Cognitive Systems@UniTn,\n\nIstituto Italiano di Tecnologia, Rovereto (TN) 38068, Italy\n\n3 Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA\n\n4 Department of Neuroscience, University of Pittsburgh,\nCenter for the Neural Basis of Cognition, Pittsburgh, USA\n\n5 Tactile Perception and Learning Laboratory,\n\nInternational School for Advanced Studies (SISSA), Trieste, Italy\n\n6 Optical Approaches to Brain Function Laboratory,\nIstituto Italiano di Tecnologia, Genova 16163, Italy\n\n7 Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK\n\n8 Department of Cognitive Neuroscience, Faculty of Biology,\n\nBielefeld University, Universit\u00e4tsstr. 25, 33615 Bielefeld, Germany\n\nAbstract\n\nDetermining how much of the sensory information carried by a neural code con-\ntributes to behavioral performance is key to understand sensory function and neural\ninformation \ufb02ow. However, there are as yet no analytical tools to compute this infor-\nmation that lies at the intersection between sensory coding and behavioral readout.\nHere we develop a novel measure, termed the information-theoretic intersection\ninformation III(S; R; C), that quanti\ufb01es how much of the sensory information\ncarried by a neural response R is used for behavior during perceptual discrimi-\nnation tasks. Building on the Partial Information Decomposition framework, we\nde\ufb01ne III(S; R; C) as the part of the mutual information between the stimulus S\nand the response R that also informs the consequent behavioral choice C. 
We\ncompute III(S; R; C) in the analysis of two experimental cortical datasets, to show\nhow this measure can be used to compare quantitatively the contributions of spike\ntiming and spike rates to task performance, and to identify brain areas or neural\npopulations that speci\ufb01cally transform sensory information into choice.\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\n\f1\n\nIntroduction\n\nPerceptual discrimination is a brain computation that is key to survival, and that requires both\nencoding accurately sensory stimuli and generating appropriate behavioral choices (Fig.1). Previous\nstudies have mostly focused separately either on the former stage, called sensory coding, by analyzing\nhow neural activity encodes information about the external stimuli [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], or on\nthe latter stage, called behavioral readout, by analyzing the relationships between neural activity and\nchoices in the absence of sensory signal or at \ufb01xed sensory stimulus (to eliminate spurious choice\nvariations of neural response due to stimulus-related selectivity) [11, 12, 13]. The separation between\nstudies of sensory coding and readout has led to a lack of consensus on what is the neural code, which\nhere we take as the key set of neural activity features for perceptual discrimination. Most studies have\nin fact de\ufb01ned the neural code as the set of features carrying the most sensory information [1, 2, 8],\nbut this focus has left unclear whether the brain uses the information in such features to perform\nperception [14, 15, 16].\nRecently, Ref. 
[17] proposed to determine if neural sensory representations are behaviorally relevant\nby evaluating the association, in single trials, between the information about the sensory stimuli\nS carried by the neural activity R and the behavioral choices C performed by the animal, or, in\nother words, to evaluate the intersection between sensory coding and behavioral readout. More\nprecisely, Ref. [17] suggested that the hallmark of a neural feature R being relevant for perceptual\ndiscrimination is that the subject will perform correctly more often when the neural feature R\nprovides accurate sensory information. Ref.[17] proposed to quantify this intuition by \ufb01rst decoding\nsensory stimuli from single-trial neural responses and then computing the increase in behavioral\nperformance when such decoding is correct. This intersection framework provides several advantages\nwith respect to earlier approaches based on computing the correlations between trial-averaged\npsychometric performance and trial-averaged neurometric performance [13, 14, 18], because it\nquanti\ufb01es associations between sensory information coding and choices within the same trial, instead\nof considering the similarity of trial-averaged neural stimulus coding and trial-averaged behavioral\nperformance. However, the intersection information measure proposed in Ref.[17] relies strongly on\nthe speci\ufb01c choice of a stimulus decoding algorithm, that might not match the unknown decoding\nalgorithms of the brain. Further, decoding only the most likely stimulus from neural responses throws\naway part of the full structure in the measured statistical relationships between S, R and C [3].\nTo overcome these limitations, here we convert the conceptual notions described in [17] into a novel\nand rigorous de\ufb01nition of information-theoretic intersection information between sensory coding and\nbehavioral readout III(S; R; C). 
We construct the information-theoretic intersection III(S; R; C) by\nbuilding on recent extensions of classical information theory, called Partial Information Decomposi-\ntions (PID), that are suited to the analysis of trivariate systems [19, 20, 21]. We show that III(S; R; C)\nis endowed with a set of formal properties that a measure of intersection information should satisfy.\nFinally, we use III(S; R; C) to analyze both simulated and real cortical activity. These applications\nshow how III(S; R; C) can be used to quantitatively rede\ufb01ne the neural code as the set of neural\nfeatures that carry sensory information which is also used for task performance, and to identify brain\nareas where sensory information is read out for behavior.\n\n2 An information-theoretic de\ufb01nition of intersection information\n\nThroughout this paper, we assume that we are analyzing neural activity recorded during a perceptual\ndiscrimination task (Fig.1). Over the course of an experimental trial, a stimulus s \u2208 {s1, ..., sNs}\nis presented to the animal while simultaneously some neural features r (we assume that r either\ntakes discrete values or is discretized into a certain number of bins) and the behavioral choice\nc \u2208 {c1, ..., cNc} are recorded. We assume that the joint probability distribution p(s, r, c) has been\nempirically estimated by sampling these variables simultaneously over repeated trials. After the\nanimal learns to perform the task, there will be a statistical association between the presented stimulus\nS and the behavioral choice C, and the Shannon information I(S : C) between stimulus and choice\nwill therefore be positive.\nHow do we quantify the intersection information between the sensory coding s \u2192 r and the\nconsequent behavioral readout r \u2192 c that involves the recorded neural activity features r in the\nsame trial? 
Clearly, the concept of intersection information must require the analysis of the full\ntrivariate probability distribution p(s, r, c) during perceptual discriminations. The well-established,\nclassical tools of information theory [22] provide a framework for assessing statistical associations\nbetween two variables only. Indeed, Shannon\u2019s mutual information allows us to quantify the sensory\ninformation I(S : R) that the recorded neural features carry about the presented stimuli [3] and,\nseparately, the choice information I(R : C) that the recorded neural features carry about the behavior.\nTo assess intersection information in single trials, we need to extend the classic information-theoretic\ntools to the trivariate analysis of S, R, C.\n\nFigure 1: Schematics of the information flow in a perceptual discrimination task: sensory information\nI(S : R) (light blue block) is encoded in the neural activity R. This activity informs the\nbehavioral choice C and so carries information about it (I(R : C), green block). III(S; R; C) is\nboth a part of I(S : R) and of I(R : C), and corresponds to the sensory information used for\nbehavior.\n\nMore specifically, we argue that an information-theoretic measure of intersection information should\nquantify the part of the sensory information which also informs the choice. To quantify this concept,\nwe start from the tools of the Partial Information Decomposition (PID) framework. This framework\ndecomposes the mutual information that two stochastic variables (the sources) carry about a third\nvariable (the target) into four nonnegative information components. These components characterize\ndistinct information sharing modes among the sources and the target on a finer scale than Shannon\ninformation quantities [19, 20, 23, 24].\nIn our analysis of the statistical dependencies of S, R, C, we start from the mutual information\nI(C : (S, R)) that S and R carry about C. 
Direct application of the PID framework then leads to the\nfollowing nonnegative decomposition:\n\nI(C : (S, R)) = SI(C : {S; R}) + CI(C : {S; R}) + UI(C : {S \\ R}) + UI(C : {R \\ S}),    (1)\n\nwhere SI, CI and UI are respectively shared (or redundant), complementary (or synergistic) and\nunique information quantities as defined in [20]. In more detail,\n\n\u2022 SI(C : {S; R}) is the information about the choice that we can extract from either S or\nR, i.e. the redundant information about C shared between S and R.\n\u2022 UI(C : {S \\ R}) is the information about the choice that we can only extract from the\nstimulus but not from the recorded neural response. It thus includes stimulus information\nrelevant to the behavioral choice that is not represented in R.\n\u2022 UI(C : {R \\ S}) is the information about the choice that we can only extract from the\nneural response but not from the stimulus. It thus includes choice information in R that\narises from stimulus-independent variables, such as level of attention or behavioral bias.\n\u2022 CI(C : {S; R}) is the information about choice that can be only gathered if both S and R\nare simultaneously observed with C, but that is not available when only one of S and\nR is simultaneously observed with C. More precisely, it is that part of I(C : (S, R)) which\ndoes not overlap with I(S : C) nor with I(R : C) [19].\n\nSeveral mathematical definitions for the PID terms described above have been proposed in the\nliterature [19, 20, 23, 24]. In this paper, we employ that of Bertschinger et al. [20], which is\nwidely used for tripartite systems [25, 26]. Accordingly, we consider the space \u2206p of all probability\ndistributions q(s, r, c) with the same pairwise marginal distributions q(s, c) = p(s, c) and q(r, c) =\np(r, c) as the original distribution p(s, r, c). 
The redundant information SI(C : {S; R}) is then\ndefined as the solution of the following convex optimization problem on the space \u2206p [20]:\n\nSI(C : {S; R}) \u2261 max_{q \u2208 \u2206p} CoIq(S; R; C),    (2)\n\nwhere CoIq(S; R; C) \u2261 Iq(S : R) \u2212 Iq(S : R|C) is the co-information corresponding to the\nprobability distribution q(s, r, c). All other PID terms are then directly determined by the value of\nSI(C : {S; R}) [19].\nHowever, none of the existing PID information components described above yet fits the notion of\nintersection information, as none of them quantifies the part of sensory information I(S : R) carried\nby neural activity R that also informs the choice C. The PID quantity that seems to be closest to this\nnotion is the redundant information that S and R share about C, SI(C : {S; R}). However, previous\nworks pointed out the subtle possibility that even two statistically independent variables (here, S and\nR) can share information about a third variable (here, C) [23, 27]. This possibility rules out using\nSI(C : {S; R}) as a measure of intersection information, since we expect that a neural response\nR which does not encode stimulus information (i.e., such that S \u22a5\u22a5 R) cannot carry intersection\ninformation.\nWe thus reason that the notion of intersection information should be quantified as the part of the\nredundant information that S and R share about C that is also a part of the sensory information\nI(S : R). This kind of information is even finer than the existing information components of the\nPID framework described above, and we recently found that comparing information components of\nthe three different Partial Information Decompositions of the same probability distribution p(s, r, c)\nleads to the identification of finer information quantities [21]. 
We take advantage of this insight to\nquantify the intersection information by introducing the following new definition:\n\nIII(S; R; C) = min{SI(C : {S; R}), SI(S : {R; C})}.    (3)\n\nThis definition allows us to further decompose the redundancy SI(C : {S; R}) into two nonnegative\ninformation components, as\n\nSI(C : {S; R}) = III(S; R; C) + X(R),    (4)\n\nwhere X(R) \u2261 SI(C : {S; R}) \u2212 III(S; R; C) \u2265 0. This finer decomposition is useful because,\nunlike SI(C : {S; R}), III(S; R; C) has the property that S \u22a5\u22a5 R \u21d2 III(S; R; C) = 0 (see Supp.\nInfo Sec.1). This is a first basic property that we expect from a meaningful definition of intersection\ninformation. Moreover, III(S; R; C) satisfies a number of additional important properties (see proofs\nin Supp. Info Sec. 1) that a measure of intersection information should satisfy:\n\n1. III(S; R; C) \u2264 I(S : R): intersection information should be a part of the sensory information\nextractable from the recorded response R \u2013 namely, the part which is relevant for the\nchoice;\n2. III(S; R; C) \u2264 I(R : C): intersection information should be a part of the choice information\nextractable from the recorded response R \u2013 namely, the part which is related to the stimulus;\n3. III(S; R; C) \u2264 I(S : C): intersection information should be a part of the information\nbetween stimulus and choice \u2013 namely, the part which can be extracted from R;\n4. 
III(S; {R1, R2}; C) \u2265 III(S; R1; C), III(S; R2; C), as the task-relevant information that\ncan be extracted from any recorded neural features should not be smaller than the task-relevant\ninformation that can be extracted from any subset of those features.\n\nThe measure III(S; R; C) thus translates all the conceptual features of intersection information into a\nwell-defined analytical tool: Eq.3 defines how III(S; R; C) can be computed numerically from real\ndata once the distribution p(s, r, c) is estimated empirically. In practice, the estimated p(s, r, c) defines\nthe space \u2206p where the problem defined in Eq.2 should be solved. We developed a gradient-descent\noptimization algorithm to solve these problems numerically with a Matlab package that is freely\navailable for download and reuse through Zenodo and GitHub https://doi.org/10.5281/zenodo.850362\n(see Supp. Info Sec. 2). Computing III(S; R; C) allows the experimenter to estimate that portion\nof the sensory information in a neural code R that is read out for behaviour during a perceptual\ndiscrimination task, and thus to quantitatively evaluate hypotheses about neural coding from empirical\ndata.\n\nFigure 2: Some example cases where III(S; R1; C) = 0 for a neural code R1. Each panel\ncontains a probabilistic graphical model representation of p(s, r, c), augmented by a color code\nillustrating the nature of the information carried by statistical relationships between variables.\nRed: information about the stimulus; blue: information about anything else (internal noise,\ndistractors, and so on). III(Ri) > 0 only if the arrows linking Ri with S and C have the same\ncolor. a: I(S : R2) > I(S : R1) = 0. I(C : R2) = I(C : R1). III(R2) > III(S; R1; C) = 0. b:\nI(S : R2) = I(S : R1). 
I(C : R2) > I(C : R1) = 0. III(R2) > III(S; R1; C) = 0. c: I(S : R1) > 0,\nI(C : R1) > 0, I(S : C) = 0. d: I(S : R1) > 0, I(C : R1) > 0, I(S : C) > 0, III(S; R1; C) = 0.\n\n2.1 Ruling out neural codes for task performance\n\nA \ufb01rst important use of III(S; R; C) is that it permits to rule out recorded neural features as candidate\nneural codes. In fact, the neural features R for which III(S; R; C) = 0 cannot contribute to task\nperformance. It is interesting, both conceptually and to interpret empirical results, to characterize\nsome scenarios where III(S; R1; C) = 0 for a recorded neural feature R1. III(S; R1; C) = 0 may\ncorrespond, among others, to one of the four scenarios illustrated in Fig.2:\n\n\u2022 R1 drives behavior but it is not informative about the stimulus, i.e. I(R1 : S) = 0 (Fig.2a);\n\u2022 R1 encodes information about S but it does not in\ufb02uence behavior, i.e. I(R1 : C) = 0\n\n(Fig.2b);\n\n\u2022 R1 is informative about both S and C but I(S : C) = 0 (Fig.2c, see also Supp. Info Sec.2);\n\u2022 I(S : R1) > 0, I(R1 : C) > 0, I(S : C) > 0, but the sensory information I(S : R1) is not\nread out to drive the stimulus-relevant behavior and, at the same time, the way R1 affects\nthe behaviour is not related to the stimulus (Fig.2d, see also Supp. Info Sec.2).\n\n3 Testing our measure of intersection information with simulated data\n\nTo better illustrate the properties of our measure of information-theoretic intersection information\nIII(S; R; C), we simulated a very simple neural scheme that may underlie a perceptual discrimination\ntask. As illustrated in Fig.3a, in every simulated trial we randomly drew a stimulus s \u2208 {s1, s2}\nwhich was then linearly converted to a continuous variable that represents the neural activity in the\nsimulated sensory cortex. 
This stimulus-response conversion was affected by an additive Gaussian\nnoise term (which we term \u201csensory noise\u201d) whose amplitude was varied parametrically by changing\nthe value of its standard deviation \u03c3S. The simulated sensory-cortex activity was then separately\nconverted, with two distinct linear transformations, to two continuous variables that simulated two\nhigher-level brain regions. These two variables are termed \u201cparietal cortex\u201d (R) and \u201cbypass pathway\u201d\n(R\u2032), respectively. We then combined R and R\u2032 with parametrically tunable weights (we indicate the\nratio between the R-weight and the R\u2032-weight with \u03b1, see Supp. Info Sec.4) and added Gaussian\nnoise (termed \u201cchoice noise\u201d), whose standard deviation \u03c3C was varied parametrically, to eventually\nproduce another continuous variable that was fed to a linear discriminant. We took as the simulated\nbehavioral choice the binary output of this final linear discriminant, which in our model was meant to\nrepresent the readout mechanism in high-level brain regions that inform the motor output.\nWe ran simulations of this model by varying parametrically the sensory noise \u03c3S, the choice noise\n\u03c3C, and the parietal to bypass ratio \u03b1, to investigate how III(S; R; C) depended on these parameters.\n\nFigure 3: a) Schematics of the simulated model used to test our framework. In each trial, a binary\nstimulus is linearly converted into a \u201csensory-cortex activity\u201d after the addition of \u201csensory noise\u201d.\nThis signal is then separately converted to two higher-level activities, namely a \u201cparietal-cortex\nactivity\u201d R and a \u201cbypass-pathway activity\u201d R\u2032. 
R and R\u2032 are then combined with parametrically\ntunable weights and, after the addition of \u201cchoice noise\u201d, this signal is fed to a linear discriminant.\nThe output of the discriminant, that is the decoded stimulus \u015d, drives the binary choice c. We\ncomputed the intersection information of R to extract the part of the stimulus information encoded\nin the \u201cparietal cortex\u201d that contributes to the final choice. b-d) Intersection Information for the\nsimulations represented in a). Mean \u00b1 sem of III(S; R; C) across 100 experimental sessions, each\nrelying on 100 simulated trials, as a function of three independently varied simulation parameters.\nb) Intersection Information decreases when the stimulus representation in the parietal cortex R is\nmore noisy (higher sensory noise \u03c3S). c) Intersection Information decreases when the beneficial\ncontribution of the stimulus information carried by parietal cortex R to the final choice is reduced\nby increasing choice noise \u03c3C. d) Intersection Information increases when the parietal cortex R\ncontributes more strongly to the final choice by increasing the parietal to bypass ratio \u03b1.\n\nIn each simulated session, we estimated the joint probability psession(s, r, c) of the stimulus S, the\nresponse in parietal cortex R, and the choice C, from 100 simulated trials. We computed, separately\nfor each simulated session, an intersection information III(S; R; C) value from the estimated\npsession(s, r, c). Here, and in all the analyses presented throughout the paper, we used a quadratic\nextrapolation procedure to correct for the limited sampling bias of information [28]. In Fig.3b-d\nwe show mean \u00b1 s.e.m. of III(S; R; C) values across 100 independent experimental sessions, as a\nfunction of each of the three simulation parameters.\nWe found that III(S; R; C) decreases with increasing \u03c3S (Fig.3b). 
This result was explained by the\nfact that increasing \u03c3S reduces the amount of stimulus information that is passed to the simulated\nparietal activity R, and thus also reduces the portion of such information that can inform choice and\ncan be used to perform the task appropriately. We found that III(S; R; C) decreases with increasing\n\u03c3C (Fig.3c), consistently with the intuition that for higher values of \u03c3C the choice depends more\nweakly on the activity of the simulated parietal activity R, which in turn also reduces how accurately\nthe choice re\ufb02ects the stimulus in each trial. We also found that III(S; R; C) increases with increasing\n\u03b1 (Fig.3d), because when \u03b1 is larger the portion of stimulus information carried by the simulated\nparietal activity R that bene\ufb01ts the behavioral performance is larger.\n\n6\n\n\f4 Using our measure to rank candidate neural codes for task performance:\nstudying the role of spike timing for somatosensory texture discrimination\n\nThe neural code was traditionally de\ufb01ned in previous studies as the set of features of neural activity\nthat carry all or most sensory information. In this section, we show how III(S; R; C) can be used\nto quantitatively rede\ufb01ne the neural code as the set of features that contributes the most sensory\ninformation for task performance. The experimenter can thus use III(S; R; C) to rank a set of\ncandidate neural features {R1, ..., RN} according to the numerical ordering III(S; Ri1 ; C) \u2264 ... \u2264\nIII(S; RiN ; C). An advantage of the information-theoretic nature of III(S; R; C) is that it quanti\ufb01es\nintersection information on the meaningful scale of bits, and thus enables a quantitative comparison of\ndifferent candidate neural codes. If for example III(S; R1; C) = 2III(S; R2; C) we can quantitatively\ninterpret that the code R1 provides twice as much information for task performance as R2. 
This\ninterpretation is not as meaningful, for example, when comparing different values of fraction-correct\nmeasures [17].\nTo illustrate the power of III(S; R; C) for evaluating and ranking candidate neural codes, we apply it\nto real data to investigate a fundamental question: is the sensory information encoded in millisecond-\nscale spike times used by the brain to perform perceptual discrimination? Although many studies\nhave shown that millisecond-scale spike times of cortical neurons encode sensory information not\ncarried by rates, whether or not this information is used has remained controversial [16, 29, 30]. It\ncould be, for example, that spike times cannot be read out because the biophysics of the readout\nneuronal systems is not suf\ufb01ciently sensitive to transmit this information, or because the readout\nneural systems do not have access to a stimulus time reference that could be used to measure these\nspike times [31].\nTo investigate this question, we used intersection information to compute whether millisecond-\nscale spike timing of neurons (n=299 cells) in rat primary (S1) somatosensory cortex provides\ninformation that is used for performing a whisker-based texture discrimination task (Figure 4a-b).\nFull experimental details are reported in [32]. In particular, we compared III(S; timing; C) with\nthe intersection information carried by rate III(S; rate; C), i.e. information carried by spike counts\nover time scales of tens of milliseconds. We \ufb01rst computed a spike-timing feature by projecting the\nsingle-trial spike train onto a zero-mean timing template (constructed by linearly combining the \ufb01rst\nthree spike trains PCs to maximize sensory information, following the procedure of [32]), whose\nshape indicated the weight assigned to each spike depending on its timing (Figure 4a). 
Then we\ncomputed a spike-rate feature by weighting the spikes with a \ufb02at template which assigns the same\nweight to spikes independently of their time. Note that this de\ufb01nition of timing, and in particular the\nfact that the timing template was zero mean, ensured that the timing variable did not contain any rate\ninformation. We veri\ufb01ed that this calculation provided timing and rate features that had negligible\n(-0.0030 \u00b1 0.0001 across the population) Pearson correlation.\nThe dif\ufb01culty of the texture discrimination task was set so that the rat learned the task well but still\nmade a number of errors in each session (mean behavioral performance 76.9%, p<0.001 above chance,\npaired t-test). These error trials were used to decouple in part choice from stimulus coding and to\nassess the impact of the sensory neural codes on behavior by computing intersection information.\nWe thus computed information across all trials, including both behaviorally correct and incorrect\ntrials. We found that, across all trials and on average over the dataset, timing carried similar texture\ninformation to rate (Figure 4b) ((9 \u00b1 2) \u00d7 10\u22123 bit in timing, (8.5 \u00b1 1.1) \u00d7 10\u22123 bit in rate, p=0.78\ntwo-sample t-test), while timing carried more choice information than rate ((16 \u00b1 1) \u00d7 10\u22123 bit\nin timing, (3.0 \u00b1 0.7) \u00d7 10\u22123 bit in rate, p<10\u221215 two-sample t-test). If we used only traditional\nmeasures of stimulus and choice information, it would be dif\ufb01cult to decide which code is most helpful\nfor task performance. 
However, when we applied our new information-theoretic framework, we found\nthat the intersection information III (Figure 4b) was higher for timing than for rate ((7 \u00b1 1) \u00d7 10^\u22123 bit\nin timing, (3.0 \u00b1 0.6) \u00d7 10^\u22123 bit in rate, p<0.002 two-sample t-test), thus suggesting that spike\ntiming is a more crucial neural code for texture perception than spike rate.\nInterestingly, intersection information III was approximately 80% of the total sensory information\nfor timing, while it was only 30% of the total sensory information for rate. This suggests that in\nsomatosensory neurons timing information about the texture is read out, and influences choice,\nmore efficiently than rate information, contrary to what is widely assumed in the literature [34].\nThese results confirm earlier results that were obtained with a decoding-based intersection information\nmeasure [32]. However, the information theoretic results in Fig.4b have the advantage that they do\nnot depend on the use of a specific decoder to calculate intersection information. Importantly, the\nnew information theoretic approach also allowed us to quantify the proportion of sensory information\nin a neural code that is read out downstream for behavior, and thus to obtain the novel conclusion that\nonly spike timing is read out with high efficiency.\n\nFigure 4: Intersection Information for two experimental datasets. a: Simplified schematics of the\nexperimental setup in [32]. Rats are trained to distinguish between textures with different degrees\nof coarseness (left), and neural spiking data from somatosensory cortex (S1) is decomposed in\nindependent rate and timing components (right). b: Stimulus, choice and intersection information\nfor the data in panel a. Spike timing carries as much sensory information (p=0.78, 2-sample t-test),\nbut more choice information (p<10^\u221215), and more III (p<0.002) than firing rate. c: Simplified\nschematics of the experimental setup in [33]. Mice are trained to distinguish between auditory\nstimuli located to their left or to their right. Neural activity is recorded in auditory cortex (AC)\nand posterior parietal cortex (PPC) with 2-photon calcium imaging. d: Stimulus, choice and\nintersection information for the data in panel c. 
Stimulus information does not differ significantly\nbetween AC and PPC, but PPC has more choice information (p<0.05) and more III than AC\n(p<10^\u22126, 2-sample t-test).\n\n5 Application of intersection information to discover brain areas transforming sensory information into choice\n\nOur intersection information measure III(S; R; C) can also be used as a metric to discover and index\nbrain areas that perform the key computations needed for perceptual discrimination, and thus turn\nsensory information into choice. Suppose for example that we are investigating this issue by recording\nfrom populations of neurons in different areas. If we rank the neural activities in the recorded areas\naccording to the sensory information they carry, we will find that primary sensory areas are ranked\nhighly. Instead, if we rank the areas according to the choice information they carry, the areas encoding\nthe motor output will be ranked highly. However, associative areas that transform sensory information\ninto choice will not be found by either of these two traditional sensory-only and choice-only rankings,\nand there is no currently established metric to quantitatively identify such areas. 
Here we argue that\nIII(S; R; C) can be used as such a metric.\nTo illustrate this possible use of III(S; R; C), we analyzed the activity of populations of single neurons\nrecorded in mice with two-photon calcium imaging either in Auditory Cortex (AC, n=329 neurons) or\nin Posterior Parietal Cortex (PPC, n=384 neurons) while the mice were performing a sound location\ndiscrimination task and had to report the perceived sound location (left vs right) by the direction\nof their turn in a virtual-reality navigation setup (Fig.4c; full experimental details are available in\nRef.[33]). AC is a primary sensory area, whereas PPC is an association area that has been described\nas a multisensory-motor interface [35, 36, 37], was shown to be essential for virtual-navigation tasks\n[36], and is implicated in the spatial processing of auditory stimuli [38, 39].\nWhen applying our information theoretic formalism to these data, we found that similar stimulus\n(sound location) information was carried by the \ufb01ring rate of neurons in AC and PPC (AC:\n(10 \u00b1 3) \u00d7 10^\u22123 bit, PPC: (5 \u00b1 1) \u00d7 10^\u22123 bit, p=0.17, two-sample t-test). Cells in PPC carried\nmore choice information than cells in AC (AC: (2.8 \u00b1 1.4) \u00d7 10^\u22123 bit, PPC: (6.4 \u00b1 1.2) \u00d7 10^\u22123 bit,\np<0.05, two-sample t-test). Importantly, neurons in PPC had values of III ((3.6 \u00b1 0.8) \u00d7 10^\u22123 bit)\nhigher (p<10^\u22126, two-sample t-test) than those of AC ((2.3 \u00b1 0.8) \u00d7 10^\u22123 bit): this suggests that the\nsensory information in PPC, though similar to that of AC, is turned into behavior in a much larger\nproportion (Figure 4d). 
Indeed, the ratio between III(S; R; C) and sensory information was higher\nin PPC than in AC (AC: (24 \u00b1 11)%, PPC: (73 \u00b1 24)%, p<0.03, one-tailed z-test). This \ufb01nding\nre\ufb02ects the associative nature of PPC as a sensory-motor interface. This result highlights the potential\nusefulness of III(S; R; C) as a metric for the analysis of neuro-imaging experiments and\nthe quantitative identi\ufb01cation of areas transforming sensory information into choice.\n\n6 Discussion\n\nHere, we derived a novel information theoretic measure III(S; R; C) of the behavioral impact of\nthe sensory information carried by the neural activity features R during perceptual discrimination\ntasks. The problem of understanding whether the sensory information in the recorded neural features\nreally contributes to behavior is hotly debated in neuroscience [16, 17, 30]. As a consequence,\nconsiderable effort is being devoted to formulating advanced analytical tools to investigate this question\n[17, 40, 41]. A traditional and fruitful approach has been to compute the correlation between\ntrial-averaged behavioral performance and trial-averaged stimulus decoding when presenting stimuli of\nincreasing complexity [13, 14, 18]. However, this measure does not capture the relationship between\n\ufb02uctuations of neural sensory information and behavioral choice in the same experimental trial. To\ncapture this single-trial relationship, Ref.[17] proposed to use a speci\ufb01c stimulus decoding algorithm\nto classify trials that give accurate sensory information, and then quantify the increase in behavioral\nperformance in the trials where the sensory decoding is correct. 
However, this approach makes strong\nassumptions about the decoding mechanism, which may or may not be neurally plausible, and does\nnot make use of the full structure of the trivariate S, R, C dependencies.\nIn this work, we solved all the problems described above by extending the recent Partial Information\nDecomposition framework [19, 20] for the analysis of trivariate dependencies to identify III(S; R; C)\nas a part of the redundant information about C shared between S and R that is also a part of the\nsensory information I(S : R). This quantity satis\ufb01es several essential properties of a measure of\nintersection information between the sensory coding s \u2192 r and the consequent behavioral readout\nr \u2192 c, that we derived from the conceptual notions elaborated in Ref.[17]. Our measure III(S; R; C)\nprovides a single-trial quanti\ufb01cation of how much sensory information is used for behavior. This\nquanti\ufb01cation refers to the absolute physical scale of bit units, and thus enables a direct comparison of\ndifferent candidate neural codes for the analyzed task. Furthermore, our measure has the advantages\nof information-theoretical approaches, that capture all statistical dependencies between the recorded\nquantities irrespective of their relevance to neural function, as well as of model-based approaches, that\nlink directly empirical data with speci\ufb01c theoretical hypotheses about sensory coding and behavioral\nreadout but depend strongly on their underlying assumptions (see e.g. [12]).\nAn important direction for future expansions of this work will be to combine III(S; R; C) with\ninterventional tools on neural activity, such as optogenetics. 
Indeed, the novel statistical tools in this\nwork cannot distinguish whether the measured value of intersection information III(S; R; C) derives\nfrom the causal involvement of R in transmitting sensory information for behavior, or whether R\nonly correlates with causal information-transmitting areas [17].\nMore generally, this work can help us map information \ufb02ow and not only information\nrepresentation. We have shown above how computing III(S; R; C) separates the sensory information that\nis transmitted downstream to affect the behavioral output from the rest of the sensory information\nthat is not transmitted. Another interesting application of III arises if we replace the \ufb01nal\nchoice C with other nodes of the brain networks, and compute with III(S; R1; R2) the part of the\nsensory information in R1 that is transmitted to R2. Even more generally, besides the analysis of\nneural information processing, our measure III can be used in the framework of network information\ntheory: suppose that an input X = (X1, X2) (with X1 \u22a5\u22a5 X2) is encoded by 2 different parallel\nchannels R1, R2, which are then decoded to produce collectively an output Y. Suppose further\nthat experimental measurements in single trials can only determine the value of X, Y, and R1,\nwhile the values of X1, X2, Y1, Y2, R2 are experimentally inaccessible. As we show in Supp. Fig.\n3, III(X; R1; Y) allows us to quantify the information between X and Y that passes through the\nchannel R1, and thus does not pass through the channel R2.\n\n7 Acknowledgements and author contributions\n\nGP was supported by a Seal of Excellence Fellowship CONISC. SP was supported by Fondation\nBertarelli. CDH was supported by grants from the NIH (MH107620 and NS089521). CDH is a New\nYork Stem Cell Foundation Robertson Neuroscience Investigator. TF was supported by the grants\nERC (NEURO-PATTERNS) and NIH (1U01NS090576-01). 
CK was supported by the European\nResearch Council (ERC-2014-CoG; grant No 646657).\nAuthor contributions: SP, GP and EP conceived the project; GP and EP performed the project; CAR,\nMED and CDH provided experimental data; GP, EP, HS, CK, SP and TF provided materials and\nanalysis methods; GP, EP and SP wrote the paper; all authors commented on the manuscript; SP\nsupervised the project.\n\nReferences\n\n[1] W. Bialek, F. Rieke, R. R. de Ruyter van Steveninck, and D. Warland. Reading a neural code. Science, 252(5014):1854\u20131857, 1991.\n\n[2] A. Borst and F. E. Theunissen. Information theory and neural coding. Nat. Neurosci., 2(11):947\u2013957, 1999.\n\n[3] R. Quian Quiroga and S. Panzeri. Extracting information from neuronal populations: information theory and decoding approaches. Nat. Rev. Neurosci., 10(3):173\u2013185, 2009.\n\n[4] D. V. Buonomano and W. Maass. State-dependent computations: spatiotemporal processing in cortical networks. Nat. Rev. Neurosci., 10:113\u2013125, 2009.\n\n[5] M. A. Harvey, H. P. Saal, J. F. Dammann III, and S. J. Bensmaia. Multiplexing stimulus information through rate and temporal codes in primate somatosensory cortex. PLOS Biology, 11(5):e1001558, 2013.\n\n[6] C. Kayser, M. A. Montemurro, N. K. Logothetis, and S. Panzeri. Spike-phase coding boosts and stabilizes information carried by spatial and temporal spike patterns. Neuron, 61(4):597\u2013608, 2009.\n\n[7] A. Luczak, B. L. McNaughton, and K. D. Harris. Packet-based communication in the cortex. Nat. Rev. Neurosci., 16(12):745\u2013755, 2015.\n\n[8] S. Panzeri, N. Brunel, N. K. Logothetis, and C. Kayser. Sensory neural codes using multiplexed temporal scales. Trends Neurosci., 33(3):111\u2013120, 2010.\n\n[9] M. Shamir. Emerging principles of population coding: in search for the neural code. Curr. Opin. Neurobiol., 25:140\u2013148, 2014.\n\n[10] S. Panzeri, J. H. Macke, J. Gross, and C. Kayser. 
Neural population coding: combining insights from microscopic and mass signals. Trends Cogn. Sci., 19(3):162\u2013172, 2015.\n\n[11] K. H. Britten, W. T. Newsome, M. N. Shadlen, S. Celebrini, and J. A. Movshon. A relationship between behavioral choice and the visual responses of neurons in macaque MT. Vis. Neurosci., 13:87\u2013100, 1996.\n\n[12] R. M. Haefner, S. Gerwinn, J. H. Macke, and M. Bethge. Inferring decoding strategies from choice probabilities in the presence of correlated variability. Nat. Neurosci., 16:235\u2013242, 2013.\n\n[13] W. T. Newsome, K. H. Britten, and J. A. Movshon. Neuronal correlates of a perceptual decision. Nature, 341(6237):52\u201354, 1989.\n\n[14] C. T. Engineer, C. A. Perez, Y. H. Chen, R. S. Carraway, A. C. Reed, J. A. Shetake, V. Jakkamsetti, K. Q. Chang, and M. P. Kilgard. Cortical activity patterns predict speech discrimination ability. Nat. Neurosci., 11:603\u2013608, 2008.\n\n[15] A. L. Jacobs, G. Fridman, R. M. Douglas, N. M. Alam, P. E. Latham, G. T. Prusky, and S. Nirenberg. Ruling out and ruling in neural codes. Proc. Natl. Acad. Sci. U.S.A., 106(14):5936\u20135941, 2009.\n\n[16] R. Luna, A. Hernandez, C. D. Brody, and R. Romo. Neural codes for perceptual discrimination in primary somatosensory cortex. Nat. Neurosci., 8(9):1210\u20131219, 2005.\n\n[17] S. Panzeri, C. D. Harvey, E. Piasini, P. E. Latham, and T. Fellin. Cracking the Neural Code for Sensory Perception by Combining Statistics, Intervention, and Behavior. Neuron, 93(3):491\u2013507, 2017.\n\n[18] R. Romo and E. Salinas. Flutter discrimination: neural codes, perception, memory and decision making. Nat. Rev. Neurosci., 4(3):203\u2013218, 2003.\n\n[19] P. Williams and R. Beer. Nonnegative decomposition of multivariate information. arXiv:1004.2515, 2010.\n\n[20] N. Bertschinger, J. Rauh, E. Olbrich, J. Jost, and N. Ay. Quantifying unique information. Entropy, 16(4):2161\u20132183, 2014.\n\n[21] G. Pica, E. Piasini, D. 
Chicharro, and S. Panzeri. Invariant components of synergy, redundancy, and unique information among three variables. Entropy, 19(9):451, 2017.\n\n[22] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27(3):379\u2013423, 1948.\n\n[23] M. Harder, C. Salge, and D. Polani. Bivariate measure of redundant information. Phys. Rev. E, 87(1):012130, 2013.\n\n[24] V. Grif\ufb01th and C. Koch. Quantifying synergistic mutual information. In Guided Self-Organization: Inception, pages 159\u2013190. Springer Berlin Heidelberg, 2014.\n\n[25] A. Barrett. Exploration of synergistic and redundant information sharing in static and dynamical Gaussian systems. Phys. Rev. E, 91(5):052802, 2015.\n\n[26] D. Chicharro. Quantifying multivariate redundancy with maximum entropy decompositions of mutual information. arXiv:1708.03845, 2017.\n\n[27] N. Bertschinger, J. Rauh, E. Olbrich, and J. Jost. Shared information \u2013 new insights and problems in decomposing information in complex systems. In Proceedings of the ECCS 2012, Brussels, Belgium, 2012.\n\n[28] S. P. Strong, R. Koberle, R. R. de Ruyter van Steveninck, and W. Bialek. Entropy and information in neural spike trains. Phys. Rev. Lett., 80:197\u2013200, 1998.\n\n[29] J. D. Victor and S. Nirenberg. Indices for testing neural codes. Neural Comput., 20(12):2895\u20132936, 2008.\n\n[30] D. H. O\u2019Connor, S. A. Hires, Z. V. Guo, N. Li, J. Yu, Q.-Q. Sun, D. Huber, and K. Svoboda. Neural coding during active somatosensation revealed using illusory touch. Nat. Neurosci., 16(7):958\u2013965, 2013.\n\n[31] S. Panzeri, R. A. A. Ince, M. E. Diamond, and C. Kayser. Reading spike timing without a clock: intrinsic decoding of spike trains. Phil. Trans. R. Soc. Lond., B, Biol. Sci., 369(1637):20120467, 2014.\n\n[32] Y. Zuo, H. Safaai, G. Notaro, A. Mazzoni, S. Panzeri, and M. E. Diamond. 
Complementary contributions of spike timing and spike rate to perceptual decisions in rat S1 and S2 cortex. Curr. Biol., 25(3):357\u2013363, 2015.\n\n[33] C. A. Runyan, E. Piasini, S. Panzeri, and C. D. Harvey. Distinct timescales of population coding across cortex. Nature, 548:92\u201396, 2017.\n\n[34] M. N. Shadlen and W. T. Newsome. The variable discharge of cortical neurons: Implications for connectivity, computation, and information coding. J. Neurosci., 18(10):3870\u20133896, 1998.\n\n[35] J. I. Gold and M. N. Shadlen. The neural basis of decision making. Annu. Rev. Neurosci., 30(1):535\u2013574, 2007.\n\n[36] C. D. Harvey, P. Coen, and D. W. Tank. Choice-speci\ufb01c sequences in parietal cortex during a virtual-navigation decision task. Nature, 484(7392):62\u201368, 2012.\n\n[37] D. Raposo, M. T. Kaufman, and A. K. Churchland. A category-free neural population supports evolving demands during decision-making. Nat. Neurosci., 17(12):1784\u20131792, 2014.\n\n[38] K. Nakamura. Auditory spatial discriminatory and mnemonic neurons in rat posterior parietal cortex. J. Neurophysiol., 82(5):2503, 1999.\n\n[39] J. P. Rauschecker and B. Tian. Mechanisms and streams for processing of \"what\" and \"where\" in auditory cortex. Proc. Natl. Acad. Sci. U.S.A., 97(22):11800\u201311806, 2000.\n\n[40] R. Rossi-Pool, E. Salinas, A. Zainos, M. Alvarez, J. Vergara, N. Parga, and R. Romo. Emergence of an abstract categorical code enabling the discrimination of temporally structured tactile stimuli. Proc. Natl. Acad. Sci. U.S.A., 113(49):E7966\u2013E7975, 2016.\n\n[41] X. Pitkow, S. Liu, D. E. Angelaki, G. C. DeAngelis, and A. Pouget. How can single sensory neurons predict behavior? 
Neuron, 87(2):411\u2013423, 2015.", "award": [], "sourceid": 2051, "authors": [{"given_name": "Giuseppe", "family_name": "Pica", "institution": "Istituto Italiano di Tecnologia"}, {"given_name": "Eugenio", "family_name": "Piasini", "institution": "Istituto Italiano di Tecnologia"}, {"given_name": "Houman", "family_name": "Safaai", "institution": "Harvard Medical School"}, {"given_name": "Caroline", "family_name": "Runyan", "institution": "University of Pittsburgh"}, {"given_name": "Christopher", "family_name": "Harvey", "institution": "Harvard Medical School"}, {"given_name": "Mathew", "family_name": "Diamond", "institution": "International School for Advanced Studies"}, {"given_name": "Christoph", "family_name": "Kayser", "institution": "University of Glasgow"}, {"given_name": "Tommaso", "family_name": "Fellin", "institution": "Istituto Italiano di Tecnologia"}, {"given_name": "Stefano", "family_name": "Panzeri", "institution": "Istituto Italiano di Tecnologia"}]}