{"title": "A general framework for investigating how far the decoding process in the brain can be simplified", "book": "Advances in Neural Information Processing Systems", "page_first": 1225, "page_last": 1232, "abstract": "``How is information decoded in the brain?'' is one of the most difficult and important questions in neuroscience. Whether neural correlation is important or not in decoding neural activities is of special interest. We have developed a general framework for investigating how far the decoding process in the brain can be simplified. First, we hierarchically construct simplified probabilistic models of neural responses that ignore more than $K$th-order correlations by using a maximum entropy principle. Then, we compute how much information is lost when information is decoded using the simplified models, i.e., ``mismatched decoders''. We introduce an information theoretically correct quantity for evaluating the information obtained by mismatched decoders. We applied our proposed framework to spike data for vertebrate retina. We used 100-ms natural movies as stimuli and computed the information contained in neural activities about these movies. We found that the information loss is negligibly small in population activities of ganglion cells even if all orders of correlation are ignored in decoding. 
We also found that if we assume stationarity for long durations in the information analysis of dynamically changing stimuli like natural movies, pseudo correlations seem to carry a large portion of the information.", "full_text": "A general framework for investigating how far the\n\ndecoding process in the brain can be simpli\ufb01ed\n\nMasafumi Oizumi1, Toshiyuki Ishii2, Kazuya Ishibashi1\n\nToshihiko Hosoya2, Masato Okada1,2\noizumi@mns.k.u-tokyo.ac.jp\n\ntishii@brain.riken.jp,kazuya@mns.k.u-tokyo.ac.jp\n\nhosoya@brain.riken.jp, okada@k.u-tokyo.ac.jp\n\n1 University of Tokyo, Kashiwa-shi, Chiba, JAPAN\n\n2 RIKEN Brain Science Institute, Wako-shi, Saitama, JAPAN\n\nAbstract\n\n\u201cHow is information decoded in the brain?\u201d is one of the most dif\ufb01cult and im-\nportant questions in neuroscience. Whether neural correlation is important or not\nin decoding neural activities is of special interest. We have developed a general\nframework for investigating how far the decoding process in the brain can be sim-\npli\ufb01ed. First, we hierarchically construct simpli\ufb01ed probabilistic models of neu-\nral responses that ignore more than Kth-order correlations by using a maximum\nentropy principle. Then, we compute how much information is lost when infor-\nmation is decoded using the simpli\ufb01ed models, i.e., \u201cmismatched decoders\u201d. We\nintroduce an information theoretically correct quantity for evaluating the informa-\ntion obtained by mismatched decoders. We applied our proposed framework to\nspike data for vertebrate retina. We used 100-ms natural movies as stimuli and\ncomputed the information contained in neural activities about these movies. We\nfound that the information loss is negligibly small in population activities of gan-\nglion cells even if all orders of correlation are ignored in decoding. 
We also found that if we assume stationarity for long durations in the information analysis of dynamically changing stimuli like natural movies, pseudo correlations seem to carry a large portion of the information.

1 Introduction

An ultimate goal of neuroscience is to elucidate how information is encoded and decoded by neural activities. To investigate what information is encoded by neurons in a certain area of the brain, the mutual information between stimuli and neural responses is often calculated. In the analysis of mutual information, it is implicitly assumed that the encoded information is decoded by an optimal decoder, which exactly matches the encoder. In other words, the brain is assumed to have full knowledge of the encoding process. Generally, if neural activities are correlated, the amount of data needed for optimal decoding scales exponentially with the number of neurons. Since a large amount of data and many complex computations are needed for optimal decoding, the assumption of an optimal decoder in the brain is doubtful.

The reason mutual information is widely used in neuroscience despite the doubtfulness of the optimal decoder is that we are completely ignorant of how information is decoded in the brain. Thus, we simply evaluate the maximal amount of information that can be extracted from neural activities by calculating the mutual information. To address this lack of knowledge, we can ask a different question: "How much information can be obtained by a decoder that has partial knowledge of the encoding process?" [10, 14] We call this type of decoder a "simplified decoder" or a "mismatched decoder". For example, an independent decoder is a simplified decoder; it takes only the marginal distribution of the neural responses into consideration and ignores the correlations between neuronal activities. 
The independent decoder is of particular importance because several studies have shown that maximum likelihood estimation can be implemented by a biologically plausible network [2, 4]. If it is experimentally shown that a sufficiently large portion of the information is obtained by the independent decoder, we can say that the brain may function in a manner similar to the independent decoder. In this context, Nirenberg et al. computed the amount of information obtained by the independent decoder from the activities of pairs of retinal ganglion cells [10]. They showed that no pair of cells showed a loss of information greater than 11%. Because only pairs of cells were considered in their analysis, it has still not been elucidated whether correlations are unimportant in population activities.

To elucidate whether correlations are important in population activities, we have developed a general framework for investigating the importance of correlation in decoding neural activities. When population activities are analyzed, we generally have to deal with not only second-order correlations but also higher-order correlations. Therefore, we need to hierarchically construct simplified decoders that account for up to Kth-order correlations, where K = 1, 2, ..., N. By computing how much information is obtained by the simplified decoders, we investigate how many orders of correlation should be taken into account to extract enough information. To compute the information obtained by mismatched decoders, we introduce an information-theoretically correct quantity derived by Merhav et al. [8]. The information for mismatched decoders previously proposed by Nirenberg and Latham is a lower bound on the correct information [5, 11]. 
Because this lower bound can be very loose and their proposed information can be negative when many cells are analyzed, as shown in this paper, we need to accurately evaluate the information obtained by mismatched decoders.

The plan of the paper is as follows. In Section 2, we describe a way of computing the information that can be extracted from neural activities by mismatched decoders using the information derived by Merhav et al. Using analytical computation, we demonstrate how the information for mismatched decoders previously proposed by Nirenberg and Latham differs from the correct information derived by Merhav et al., especially when many cells are analyzed. In Section 3, we apply our framework to spike data for ganglion cells in the salamander retina. We first describe the method of hierarchically constructing simplified decoders by using the maximum entropy principle [12]. We then compute the information obtained with the simplified decoders. We find that more than 90% of the information can be extracted from the population activities of ganglion cells even if all orders of correlations are ignored in decoding. We also describe a problem with previous studies [10, 12], in which the stationarity of stimuli was assumed for a duration that is too long. Using a toy model, we demonstrate that pseudo correlations seem to carry a large portion of the information because of the stationarity assumption.

2 Information for mismatched decoders

Let us consider how much information about stimuli can be extracted from neural responses. We assume that we experimentally obtain the conditional probability distribution p(r|s) that neural responses r are evoked by stimulus s. We can say that the stimulus is encoded by the neural response r, which obeys the distribution p(r|s). We call p(r|s) the "encoding model". 
The maximal amount of information obtained with the optimal decoder can be evaluated by using the mutual information:

I = -\int dr\, p(r) \log_2 p(r) + \int dr \sum_s p(s)\, p(r|s) \log_2 p(r|s), \qquad (1)

where p(r) = \sum_s p(r|s) p(s) and p(s) is the prior probability of the stimuli. In the optimal decoder, the probability distribution q(r|s) that exactly matches the encoding model p(r|s) is used for decoding; that is, q(r|s) = p(r|s). We call q(r|s) the "decoding model". We can also compute the maximal amount of information obtained by a decoder using a decoding model q(r|s) that does not match the encoding model p(r|s) by using an equation derived by Merhav et al. [8]:

I^*(\beta) = -\int dr\, p(r) \log_2 \sum_s p(s)\, q(r|s)^\beta + \int dr \sum_s p(s)\, p(r|s) \log_2 q(r|s)^\beta, \qquad (2)

where β takes the value that maximizes I*(β); that is, β is the value that satisfies ∂I*/∂β = 0. We call a decoder using the mismatched decoding model a "mismatched
decoder".

Figure 1: Comparison between the correct information I* derived by Merhav et al. and the Nirenberg-Latham information I^NL. A: Difference between I*/I (solid line) and I^NL/I (dotted line) in a Gaussian model where correlations and derivatives of mean firing rates are uniform; correlation parameter c = 0.01. B: Difference between I*_1/I (solid line) and I^NL_1/I (dotted line) when the spike data in Figure 3A are used. For these spike data and the other spike data analyzed, the Nirenberg-Latham information provides a tight lower bound on the correct information, possibly because the number of cells is small.

Previously, Nirenberg and Latham proposed that the information obtained by mismatched decoders can be evaluated by using [11]

I^{NL} = -\int dr\, p(r) \log_2 \sum_s p(s)\, q(r|s) + \int dr \sum_s p(s)\, p(r|s) \log_2 q(r|s). \qquad (3)

We call their proposed information "Nirenberg-Latham information". If we set β = 1 in Eq. 2, we obtain the Nirenberg-Latham information, I*(1) = I^NL. Thus, the Nirenberg-Latham information does not give the correct information; instead, it simply provides a lower bound on the correct information I*(β), which is the maximum value with respect to β [5, 8]. 
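As a concrete numerical sketch (a hypothetical discrete toy example, not the paper's retinal data), Eqs. 1-3 can be evaluated directly when responses take finitely many values; here the decoding model q is the independent (product-of-marginals) version of a correlated two-cell encoder p:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def mutual_information(p, prior):
    """Mutual information of Eq. 1 for a discrete response space.
    p: (S, R) encoding model p(r|s); prior: (S,) stimulus probabilities."""
    p_r = prior @ p                              # p(r) = sum_s p(s) p(r|s)
    return (-np.sum(p_r * np.log2(p_r))
            + np.sum(prior[:, None] * p * np.log2(np.where(p > 0, p, 1.0))))

def i_star(beta, p, q, prior):
    """I*(beta) of Eq. 2; beta = 1 gives the Nirenberg-Latham value (Eq. 3)."""
    p_r = prior @ p
    mixture = prior @ q**beta                    # sum_s p(s) q(r|s)^beta
    return (-np.sum(p_r * np.log2(mixture))
            + beta * np.sum(prior[:, None] * p * np.log2(q)))

# Correlated two-cell encoder over the words (00, 01, 10, 11) and its
# independent (product-of-marginals) decoding model.
p = np.array([[0.5, 0.1, 0.1, 0.3],
              [0.1, 0.4, 0.4, 0.1]])
q = np.array([[0.36, 0.24, 0.24, 0.16],
              [0.25, 0.25, 0.25, 0.25]])
prior = np.array([0.5, 0.5])

# Maximize I*(beta) over beta; compare with I^NL = I*(1) and the full I.
res = minimize_scalar(lambda b: -i_star(b, p, q, prior),
                      bounds=(1e-3, 10.0), method="bounded")
i_opt = -res.fun
i_nl = i_star(1.0, p, q, prior)
i_full = mutual_information(p, prior)
assert i_nl <= i_opt + 1e-9 and i_opt <= i_full + 1e-9   # I^NL <= I* <= I
```

The final assertion mirrors the ordering derived in the text: the Nirenberg-Latham value never exceeds the correct mismatched-decoder information, which in turn never exceeds the mutual information.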
The lower bound provided by the Nirenberg-Latham information can be very loose, and the Nirenberg-Latham information can be negative when many cells are analyzed.

Theoretical evaluation of information I, I*, and I^NL

We consider the problem where mutual information is computed when a stimulus s, which is a single variable, and a slightly different stimulus s + Δs are presented. We assume the prior probabilities of the stimuli, p(s) and p(s + Δs), are equal: p(s) = p(s + Δs) = 1/2. Neural responses evoked by the stimuli are denoted by r, which is considered here to be the neuron firing rate. When the difference between the two stimuli is small, the conditional probability p(r|s + Δs) can be expanded with respect to Δs as p(r|s + Δs) = p(r|s) + p'(r|s)Δs + (1/2)p''(r|s)(Δs)^2 + ..., where ' represents differentiation with respect to s. Using this expansion, to leading order of Δs, we can write the mutual information I as

I = \frac{\Delta s^2}{8} \int dr\, \frac{(p'(r|s))^2}{p(r|s)}, \qquad (4)

where \int dr\, (p'(r|s))^2 / p(r|s) is the Fisher information. Thus, we can see that the mutual information is proportional to the Fisher information when Δs is small. Similarly, the correct information I* for mismatched decoders and the Nirenberg-Latham information I^NL can be written as

I^* = \frac{\Delta s^2}{8} \left( \int dr\, \frac{p'(r|s)\, q'(r|s)}{q(r|s)} \right)^2 \left( \int dr\, \frac{p(r|s)\,(q'(r|s))^2}{q(r|s)^2} \right)^{-1}, \qquad (5)

I^{NL} = \frac{\Delta s^2}{8} \left( -\int dr\, p(r|s) \left( \frac{q'(r|s)}{q(r|s)} \right)^2 + 2 \int dr\, \frac{p'(r|s)\, q'(r|s)}{q(r|s)} \right). \qquad (6)

Taking into consideration the proportionality of the mutual information to the Fisher information, we can interpret \left( \int dr\, p'(r|s) q'(r|s)/q(r|s) \right)^2 \left( \int dr\, p(r|s)(q'(r|s))^2/q(r|s)^2 \right)^{-1} in Eq. 5 as a Fisher information-like quantity for mismatched decoders.

Let us consider the case in which the encoding model p(r|s) obeys the Gaussian distribution

p(r|s) = \frac{1}{Z} \exp\left( -\frac{1}{2} (r - f(s))^T C^{-1} (r - f(s)) \right), \qquad (7)

where T stands for the transpose operation, f(s) is the vector of mean firing rates given stimulus s, and C is the covariance matrix. We consider an independent decoding model q(r|s) that ignores correlations:

q(r|s) = \frac{1}{Z_D} \exp\left( -\frac{1}{2} (r - f(s))^T C_D^{-1} (r - f(s)) \right), \qquad (8)

where C_D is the diagonal covariance matrix obtained by setting the off-diagonal elements of C to 0. If the Gaussian integrals in Eqs. 4-6 are performed, I, I*, and I^NL can be written as

I = \frac{\Delta s^2}{8}\, f'^T(s)\, C^{-1} f'(s), \qquad (9)

I^* = \frac{\Delta s^2}{8}\, \frac{\left( f'^T(s)\, C_D^{-1} f'(s) \right)^2}{f'^T(s)\, C_D^{-1} C C_D^{-1} f'(s)}, \qquad (10)

I^{NL} = \frac{\Delta s^2}{8} \left( -f'^T(s)\, C_D^{-1} C C_D^{-1} f'(s) + 2 f'^T(s)\, C_D^{-1} f'(s) \right). \qquad (11)

The correct information obtained by the independent decoder for the Gaussian model (Eq. 10) is inversely proportional to the decoding error of s when the independent decoder is applied, which was computed from the generalized Cramér-Rao bound by Wu et al. [14].

As a simple example, we consider a uniform correlation model [1, 14] in which the covariance matrix C is given by C_ij = σ^2[δ_ij + c(1 - δ_ij)], and we assume that the derivatives of the firing rates are uniform: that is, f'_i = f'. 
In this case, I, I*, and I^NL can be computed using

I = \frac{\Delta s^2}{8}\, \frac{N f'^2}{\sigma^2 (Nc + 1 - c)}, \qquad (12)

I^* = \frac{\Delta s^2}{8}\, \frac{N f'^2}{\sigma^2 (Nc + 1 - c)}, \qquad (13)

I^{NL} = \frac{\Delta s^2}{8}\, \frac{(-c(N - 1) + 1)\, N f'^2}{\sigma^2}, \qquad (14)

where N is the number of cells. We can see that I* is equal to I, which means that no information is lost even if correlation is ignored in the decoding process. Figure 1A shows I^NL/I and I*/I when the degree of correlation c is 0.01. As shown in Figure 1A, the difference between the correct information I* and the Nirenberg-Latham information I^NL is very large when the number of cells N is large. When N > (c + 1)/c, I^NL is negative. This analysis shows that using the Nirenberg-Latham information I^NL as a lower bound on the correct information I* can lead to wrong conclusions, especially when many cells are analyzed.

3 Analysis of information in population activities of ganglion cells

3.1 Methods

We analyzed data obtained when N = 7 retinal ganglion cells were simultaneously recorded using a multielectrode array. The stimulus was a natural movie, which was 200 s long and repeated 45 times. We divided the movie into many short natural movies and considered them as the stimuli over which the information contained in neural activities is computed. For instance, when the movie was divided into 10-s-long natural movies, there were 20 stimuli. Figure 2A shows the response of the seven retinal ganglion cells to the natural movie from 0 to 10 s. To apply information-theoretic techniques, we first discretized time into small time bins Δτ and indicated whether or not a spike was emitted in each time bin with a binary variable: σ_i = 1 means that cell i spiked and σ_i = 0 means that it did not spike. 
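This discretization step can be sketched as follows (a minimal illustration with made-up spike times, assuming the 5-ms bins used in the analysis):

```python
import numpy as np

def spike_words(spike_times_per_cell, n_bins, dt=0.005):
    """Bin spike times (in seconds) at width dt and return one
    N-letter binary word per bin (rows: bins, columns: cells)."""
    words = np.zeros((n_bins, len(spike_times_per_cell)), dtype=int)
    for i, times in enumerate(spike_times_per_cell):
        idx = (np.asarray(times) / dt).astype(int)   # bin index of each spike
        words[idx[idx < n_bins], i] = 1              # sigma_i = 1: cell i spiked
    return words

# Three hypothetical cells observed for 20 ms (four 5-ms bins).
cells = [[0.001, 0.012], [0.003], [0.011, 0.018]]
print(spike_words(cells, n_bins=4))
```

Each row of the result is one word σ = {σ_1, σ_2, σ_3}; counting how often each word occurs during each stimulus yields the empirical distribution used below.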
We set the length of the time bin, Δτ, to 5 ms so that it was short enough to avoid two spikes falling into the same bin. In this way, the spike pattern of the ganglion cells was transformed into an N-letter binary word, σ = {σ_1, σ_2, ..., σ_N}, as shown in Figure 2B.

Figure 2: A: Raster plot of seven retinal ganglion cells responding to a natural movie. B: Transformation of spike trains into binary words.

Then, we determined the frequency with which a particular spike pattern σ was observed during each stimulus and estimated the conditional probability distribution p_data(σ|s) from the experimental data. Using these conditional probabilities, we evaluated the information contained in the N-letter binary words σ. Generally, the joint probability of N binary variables can be written as [9]

p_N(\sigma) = \frac{1}{Z} \exp\left[ \sum_i \theta_i \sigma_i + \sum_{i<j} \theta_{ij} \sigma_i \sigma_j + \cdots + \theta_{12...N}\, \sigma_1 \sigma_2 \cdots \sigma_N \right]. \qquad (15)

This type of probability distribution is called a log-linear model. 
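For N = 2 the parameters of the log-linear form (Eq. 15) can be solved in closed form from the word probabilities; a small sketch with an assumed toy distribution:

```python
import numpy as np

# Assumed word probabilities for two cells over (sigma1, sigma2).
p = {(0, 0): 0.5, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.3}

# Solving p(sigma) = exp(th1*s1 + th2*s2 + th12*s1*s2) / Z for the parameters:
Z = 1.0 / p[0, 0]
theta1 = np.log(p[1, 0] / p[0, 0])
theta2 = np.log(p[0, 1] / p[0, 0])
theta12 = np.log(p[1, 1] * p[0, 0] / (p[1, 0] * p[0, 1]))  # > 0: positive correlation

# The log-linear form reproduces every word probability exactly.
for (s1, s2), prob in p.items():
    model = np.exp(theta1 * s1 + theta2 * s2 + theta12 * s1 * s2) / Z
    assert abs(model - prob) < 1e-12
```

Because the model has as many parameters as there are words, this exact match is guaranteed, which is the point made in the following paragraph.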
Because the number of parameters in a log-linear model is equal to the number of all possible configurations of an N-letter binary word σ, we can determine the values of the parameters so that the log-linear model p_N(σ) exactly matches the empirical probability distribution p_data(σ): that is, p_N(σ) = p_data(σ).

To compute the information for mismatched decoders, we construct simplified models of neural responses that partially match the empirical distribution p_data(σ). The simplest model is an "independent model" p_1(σ), where only the average of each σ_i agrees with the experimental data: that is, ⟨σ_i⟩_{p_1(σ)} = ⟨σ_i⟩_{p_data(σ)}. There are many possible probability distributions that satisfy these constraints. In accordance with the maximum entropy principle [12], we choose the one that maximizes the entropy H = -\sum_\sigma p_1(\sigma) \log p_1(\sigma). The resulting maximum entropy distribution is

p_1(\sigma) = \frac{1}{Z_1} \exp\left[ \sum_i \theta_i^{(1)} \sigma_i \right], \qquad (16)

in which the model parameters θ^(1) are determined so that the constraints are satisfied. This model corresponds to a log-linear model in which all orders of correlation parameters {θ_ij, θ_ijk, ..., θ_12...N} are omitted. If we perform maximum likelihood estimation of the model parameters θ^(1) in this log-linear model, the result is that the average of σ_i under the log-linear model equals the average of σ_i found in the data: that is, ⟨σ_i⟩_{p_1(σ)} = ⟨σ_i⟩_{p_data(σ)}. This is identical to the constraints of the maximum entropy model. Generally, the maximum entropy method is equivalent to maximum likelihood fitting of a log-linear model [6].

Similarly, we can consider a "second-order correlation model" p_2(σ), which is consistent with not only the averages of σ_i but also the averages of all products σ_iσ_j found in the data. 
Maximizing the entropy with the constraints ⟨σ_i⟩_{p_2(σ)} = ⟨σ_i⟩_{p_data(σ)} and ⟨σ_iσ_j⟩_{p_2(σ)} = ⟨σ_iσ_j⟩_{p_data(σ)}, we obtain

p_2(\sigma) = \frac{1}{Z_2} \exp\left[ \sum_i \theta_i^{(2)} \sigma_i + \sum_{i<j} \theta_{ij}^{(2)} \sigma_i \sigma_j \right], \qquad (17)

in which the model parameters θ^(2) are determined so that the constraints are satisfied. The procedure described above can also be used to construct a "Kth-order correlation model" p_K(σ).

Figure 3: Dependence of the amount of information obtained by simplified decoders on the number of ganglion cells analyzed. The same spike data, obtained from retinal ganglion cells responding to a natural movie, were used to obtain the analysis results shown in panels A and B. A: 10-s-long natural movie. B: 100-ms-long natural movie.

If we substitute the simplified models of neural responses p_K(σ|s) into the mismatched decoding models q(σ|s) in Eq. 
2, we can compute the amount of information that can be obtained when more than Kth-order correlations are ignored in the decoding:

I_K^*(\beta) = -\sum_\sigma p_N(\sigma) \log_2 \sum_s p(s)\, p_K(\sigma|s)^\beta + \sum_s p(s) \sum_\sigma p_N(\sigma|s) \log_2 p_K(\sigma|s)^\beta. \qquad (18)

By evaluating the ratio of information, I*_K/I, we can infer how many orders of correlation should be taken into account to extract enough information.

3.2 Results

First, we investigated how the ratio of information obtained by an independent model, I*_1/I, and that obtained by a second-order correlation model, I*_2/I, changed when the number of cells analyzed was changed. We set the length of the stimulus to 10 s. We could obtain 20 kinds of stimuli from the 200-s-long natural movie (see Methods). In previous studies, stimuli of comparable length (7 s in Nirenberg et al.'s study [10] and 20 s in Schneidman et al.'s study [12]) were used. When two neurons were analyzed, there were 21 possible combinations for choosing 2 cells out of the 7 cells, which is the total number of cells simultaneously recorded. We computed the average value of I*_K/I for K = 1, 2 over all possible combinations of cells. Figure 3A shows that I*_1/I and I*_2/I monotonically decreased when the number of cells was increased. A comparison between the correct information, I*_1/I, and the Nirenberg-Latham information, I^NL_1/I, where I^NL_1 = I*_1(β = 1), is shown in Figure 1B. When only two cells were considered, I*_1/I exceeded 90%, which means that ignoring correlation leads to only a small loss of information. This is consistent with the result obtained by Nirenberg et al. [10]. 
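A minimal sketch of this analysis pipeline on synthetic words (not the retinal data; stimulus-dependent firing probabilities are assumed for illustration): estimate p_data(σ|s) from word counts, build the independent model as the product of per-cell marginals (which satisfies the first-order maximum entropy constraints), and evaluate I*_1 by maximizing Eq. 18 over β.

```python
import numpy as np
from itertools import product
from scipy.optimize import minimize_scalar

def word_distribution(words):
    """Empirical p_data(sigma) over all 2^N binary words (N small)."""
    n = words.shape[1]
    states = np.array(list(product([0, 1], repeat=n)))
    counts = np.array([(words == s).all(axis=1).sum() for s in states])
    return states, counts / counts.sum()

def independent_model(states, p):
    """First-order maximum entropy model: product of the marginals of p."""
    m = p @ states                                # <sigma_i> under p_data
    return np.prod(states * m + (1 - states) * (1 - m), axis=1)

def i_star(beta, p_enc, q_dec, prior):
    """Eq. 18 evaluated for the simplified decoding model at a given beta."""
    p_r = prior @ p_enc
    mixture = prior @ q_dec**beta
    return (-np.sum(p_r * np.log2(mixture))
            + beta * np.sum(prior[:, None] * p_enc * np.log2(q_dec)))

# Two hypothetical stimuli driving two cells with different firing rates.
rng = np.random.default_rng(0)
words_s0 = (rng.random((500, 2)) < 0.2).astype(int)         # sparse firing
words_s1 = (rng.random((500, 2)) < [0.7, 0.6]).astype(int)  # vigorous firing
states, p0 = word_distribution(words_s0)
_, p1 = word_distribution(words_s1)
p_enc = np.vstack([p0, p1])
q_dec = np.vstack([independent_model(states, p0),
                   independent_model(states, p1)])
prior = np.array([0.5, 0.5])

res = minimize_scalar(lambda b: -i_star(b, p_enc, q_dec, prior),
                      bounds=(1e-3, 10.0), method="bounded")
print("I*_1 =", -res.fun, "bits")
```

With two equiprobable stimuli the result is bounded by 1 bit; the ratio I*_1/I reported in the text is obtained by dividing by the mutual information computed from p_enc itself.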
However, when all cells (N = 7) were used in the analysis, I*_1/I became only about 60%. Thus, correlation seems to be much more important for decoding when population activities are considered than when only two cells are considered. At the least, we can say that qualitatively different things occur when large populations of cells are analyzed, as Schneidman et al. pointed out [12].

We should be careful about concluding from the results shown in Figure 3A that correlation is important for decoding. In this analysis, we considered 10-s-long stimuli and assumed stationarity during each stimulus. By stationarity we mean that we assumed spikes are generated by a single process that can be described by a single conditional distribution p(σ|s). Because natural movies change much more rapidly than 10 s and our visual system has much higher temporal resolution [13], we also considered shorter stimuli. In Figure 3B, we computed I*_1/I and I*_2/I over 100-ms-long natural movies. In this case, we could obtain 2000 stimuli from the 200-s-long natural movie. When the length of each stimulus was 100 ms, no spikes occurred while some stimuli were presented; we removed those stimuli and used the remaining ones for the analysis. In this case, the ratio of information obtained by the independent model, I*_1/I, was more than 90% even when all cells (N = 7) were considered. Although 100 ms may still be too long to be considered a single process, the result shown in Figure 3B reflects a situation that our brain actually has to deal with, and it is more realistic than that reflected in Figure 3A. Figure 4A shows the dependence of the information obtained by the simplified decoders on the length of the stimulus. In this analysis, we changed the length of the stimulus from 100 ms to 10 s and computed I*_1/I and I*_2/I for the activities of N = 7 cells. 
We also analyzed additional experimental data obtained when N = 6 retinal ganglion cells were simultaneously recorded from another salamander retina.

Figure 4: Dependence of the amount of information obtained by simplified decoders on the length of stimuli. The stimulus was the same natural movie for both panels, but spike data obtained from retinas of different salamanders were used in panels A and B. A: Seven simultaneously recorded ganglion cells. B: Six simultaneously recorded ganglion cells. C: Artificial spike data generated according to the firing rates shown in Figure 5A.

Figure 5: Firing rates of two model cells. The rate of cell #1 is shown in the top panel; the rate of cell #2 is shown in the bottom panel. A: Firing rates from 0 to 2 s. B: Firing rates (solid line) and mean firing rates (dashed line) when the stimulus was 1 s long. C: Firing rates (solid line) and mean firing rates (dashed line) when the stimulus was 500 ms long.

The same 200-s-long natural movie was used as a stimulus for Figure 4B as for Figure 4A, and the activities of N = 6 cells were analyzed. Figure 4B shows the result. We can clearly see the same tendency in Figures 4A and 4B: the amount of information decoded by the simplified decoders monotonically increased as the length of the stimulus was shortened.

To clarify the reason correlation becomes less important as the stimulus is shortened, we used the toy model shown in Figure 5. We considered the case in which two cells fire independently in accordance with a Poisson process and performed an analysis similar to the one we did for the actual spike data. We used simulated spike data for the two cells generated in accordance with the firing rates shown in Figure 5A.
The \ufb01ring rates with a 2-s stimulus sinusoidally change with time.\nWe divided the 2-s-long stimulus into two 1-s-long stimulus, s1 and s2, as shown in Figure 5B.\nThen, we computed mutual information I and the information obtained by independent model I \u2217\n1\nover s1 and s2. Because the two cells \ufb01red independently, there were no correlations between two\ncells essentially. However, there was pseudo correlation due to the assumption of stationarity for the\ndynamically changing stimulus. The pseudo correlation was high for s1 and low for s2. This means\nthat \u201ccorrelation\u201d plays an important role in discriminating two stimuli, s1 and s2. In contrast, the\nmean \ufb01ring rates of the two cells during each stimulus were equal for s1 and s2. Therefore, if the\nstimulus is 1 s long, we cannot discriminate two stimuli by using the independent model, that is,\n1 = 0. We also considered the case in which the stimulus was 0.5 s long, as shown in Figure\nI \u2217\n5C. In this case, pseudo correlations again appeared but there was a signi\ufb01cant difference in the\nmean \ufb01ring rates between the stimuli. Thus, the independent model can be used to extract almost all\nthe information. The dependence of I \u2217\n1 /I on the stimulus length is shown in Figure 4C. Behaviors\nsimilar to those represented in Figure 4C were also observed in the analysis of the actual spike data\nfor retinal ganglion cells (Figure 4A and 4B). Even if we observe that correlation carries a signi\ufb01cant\nlarge portion of information for longer stimuli compared with the speed of change in the \ufb01ring rates,\n\n7\n\n\fit may simply be caused by meaningless pseudo correlation. 
To assess the role of correlation in information processing, the stimuli used should be short enough that the neural responses to them can be regarded as generated by a single process.

4 Summary and Discussion

We described a general framework for investigating how far the decoding process in the brain can be simplified. We computed the amount of information that can be extracted by using simplified decoders constructed using a maximum entropy model, i.e., mismatched decoders. We showed that more than 90% of the information encoded in retinal ganglion cell activities can be decoded by using an independent model that ignores correlation. Our results imply that the brain can use a simplified decoding strategy in which correlation is ignored.

When we computed the information obtained by the independent model, we regarded a 100-ms-long natural movie as one stimulus. However, when we regarded stimuli that were long compared with the speed of change in the firing rates as one stimulus, correlation carried a large portion of the information. This is due to pseudo correlation, which is observed if stationarity is assumed for long durations. The human visual system can process visual information in less than 150 ms [13]. We should therefore set the length of the stimulus appropriately by taking the time resolution of our visual system into account.

Our results do not imply that no kind of correlation carries much information, because we dealt only with correlated spikes within a 5-ms time bin. We did not analyze correlation on longer time scales, which can be observed in the activities of retinal ganglion cells [7], nor did we investigate the information carried by the relative timing of spikes [3]. Further investigations are needed for these types of correlation.
Our approach of comparing the mutual information with the information obtained by simplified decoders can also be used for studying other types of correlations.

References

[1] Abbott, L. F., & Dayan, P. (1999). Neural Comput., 11, 91-101.
[2] Deneve, S., Latham, P. E., & Pouget, A. (1999). Nature Neurosci., 2, 740-745.
[3] Gollisch, T., & Meister, M. (2008). Science, 319, 1108-1111.
[4] Jazayeri, M., & Movshon, J. A. (2006). Nature Neurosci., 9, 690-696.
[5] Latham, P. E., & Nirenberg, S. (2005). J. Neurosci., 25, 5195-5206.
[6] MacKay, D. J. C. (2003). Information Theory, Inference and Learning Algorithms (Cambridge Univ. Press, Cambridge, England).
[7] Meister, M., & Berry, M. J. II (1999). Neuron, 22, 435-450.
[8] Merhav, N., Kaplan, G., Lapidoth, A., & Shamai (Shitz), S. (1994). IEEE Trans. Inform. Theory, 40, 1953-1967.
[9] Nakahara, H., & Amari, S. (2002). Neural Comput., 14, 2269-2316.
[10] Nirenberg, S., Carcieri, S. M., Jacobs, A. L., & Latham, P. E. (2001). Nature, 411, 698-701.
[11] Nirenberg, S., & Latham, P. (2003). Proc. Natl. Acad. Sci. USA, 100, 7348-7353.
[12] Schneidman, E., Berry, M. J. II, Segev, R., & Bialek, W. (2006). Nature, 440, 1007-1012.
[13] Thorpe, S., Fize, D., & Marlot, C. (1996). Nature, 381, 520-522.
[14] Wu, S., Nakahara, H., & Amari, S. (2001). Neural Comput., 13, 775-797.