{"title": "Information Dynamics and Emergent Computation in Recurrent Circuits of Spiking Neurons", "book": "Advances in Neural Information Processing Systems", "page_first": 1255, "page_last": 1262, "abstract": "", "full_text": "Information Dynamics and Emergent\n\nComputation in Recurrent Circuits of Spiking\n\nNeurons\n\nThomas Natschl\u00a8ager, Wolfgang Maass\nInstitute for Theoretical Computer Science\n\nftnatschl, maassg@igi.tugraz.at\n\nTechnische Universitaet Graz\n\nA-8010 Graz, Austria\n\nAbstract\n\nWe employ an ef\ufb01cient method using Bayesian and linear classi\ufb01ers\nfor analyzing the dynamics of information in high-dimensional states of\ngeneric cortical microcircuit models. It is shown that such recurrent cir-\ncuits of spiking neurons have an inherent capability to carry out rapid\ncomputations on complex spike patterns, merging information contained\nin the order of spike arrival with previously acquired context information.\n\n1\n\nIntroduction\n\nCommon analytical tools of computational complexity theory cannot be applied to re-\ncurrent circuits with complex dynamic components, such as biologically realistic neuron\nmodels and dynamic synapses.\nIn this article we explore the capability of information\ntheoretic concepts to throw light on emergent computations in recurrent circuit of spiking\nneurons. This approach is attractive since it may potentially provide a solid mathematical\nbasis for understanding such computations. But it is methodologically dif\ufb01cult because of\nsystematic errors caused by under-sampling problems that are ubiquitous even in extensive\ncomputer simulations of relatively small circuits. Previous work on these methodologi-\ncal problems had focused on estimating the information in spike trains, i.e.\ntemporally\nextended protocols of the activity of one or a few neurons. 
In contrast, this paper addresses methods for estimating the information that is instantly available to a neuron that has synaptic connections to a large number of neurons.\n\nWe will define the specific circuit model used for our study in section 2 (although the methods that we apply appear to be useful for a much wider class of analog and digital recurrent circuits). The combination of information-theoretic methods with methods from machine learning that we employ is discussed in section 3. The results of applications of these methods to the analysis of the distribution and dynamics of information in a generic recurrent circuit of spiking neurons are presented in section 4. Applications of these methods to the analysis of emergent computations are discussed in section 5.\n\n[Figure 1: panel A shows the two possible templates (0 and 1) for each of the four spike train segments; panel B shows an example of the resulting noisy input spike trains over 0-0.8 sec (in the example, s1 = 1, s2 = 0, s3 = 1, s4 = 0).]\n\nFigure 1: Input distribution used throughout the paper. Each input consists of 5 spike trains of length 800 ms generated from 4 segments of length 200 ms each. A For each segment 2 templates 0 and 1 were generated randomly (Poisson spike trains with a frequency of 20 Hz). 
B The actual input spike trains were generated by choosing randomly for each segment i, i = 1, ..., 4, one of the two associated templates (si = 0 or si = 1), and then generating a noisy version by moving each spike by an amount drawn from a Gaussian distribution with mean 0 and SD 4 ms.\n\n2 Our study case: A Generic Neural Microcircuit Model\n\nAs our study case for analyzing information in high-dimensional circuit states we used a randomly connected circuit with sparse, primarily local connectivity consisting of 800 leaky integrate-and-fire (I&F) neurons, 20% of which were randomly chosen to be inhibitory. The 800 neurons of the circuit were arranged on two 20 × 20 layers L1 and L2. Circuit inputs consisting of 5 spike trains were injected into a randomly chosen subset of neurons in layer L1 (the connection probability was set to 0.25 for each of the 5 input channels and each neuron in layer L1). We modeled the (short term) dynamics of synapses according to the model proposed in [1], with the synaptic parameters U (use), D (time constant for depression), F (time constant for facilitation) randomly chosen from Gaussian distributions that model empirical data for such connections. Parameters of neurons and synapses were chosen as in [2] to fit data from microcircuits in rat somatosensory cortex (based on [3] and [1]).\n\nSince neural microcircuits in the nervous system often receive salient input in the form of spatio-temporal firing patterns (e.g. from arrays of sensory neurons, or from other brain areas), we have concentrated on circuit inputs of this type. Such firing patterns could for example represent visual information received during a saccade, or the neural representation of a phoneme or syllable in auditory cortex. 
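The input distribution just described (two Poisson templates per 200 ms segment, jittered with SD 4 ms) can be sketched in a few lines of Python. This is our own minimal illustration, not the simulation code used in the paper; numpy and the fixed seed are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N_CHANNELS, N_SEGMENTS = 5, 4                # 5 spike trains, 4 segments
SEG_LEN, RATE, JITTER_SD = 0.2, 20.0, 0.004  # 200 ms, 20 Hz, SD 4 ms

def poisson_template(rate, duration):
    # homogeneous Poisson spike train: draw the count, then uniform spike times
    n = rng.poisson(rate * duration)
    return np.sort(rng.uniform(0.0, duration, n))

# two templates (0 and 1) per segment, each a list of 5 channels
templates = [[[poisson_template(RATE, SEG_LEN) for _ in range(N_CHANNELS)]
              for _ in range(2)] for _ in range(N_SEGMENTS)]

def draw_input():
    # pick s_i in {0, 1} per segment, jitter every spike by N(0, 4 ms)
    s = rng.integers(0, 2, N_SEGMENTS)
    trains = [[] for _ in range(N_CHANNELS)]
    for i in range(N_SEGMENTS):
        for c in range(N_CHANNELS):
            t = templates[i][s[i]][c]
            t = np.clip(t + rng.normal(0.0, JITTER_SD, t.size), 0.0, SEG_LEN)
            trains[c].append(t + i * SEG_LEN)  # shift into the i-th segment
    return s, [np.sort(np.concatenate(tr)) for tr in trains]

s, trains = draw_input()
```

Each call to `draw_input` yields one 800 ms input stream together with the hidden bits s1, ..., s4 that the later analysis tries to recover from the circuit state.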
Information dynamics and emergent computation in recurrent circuits of spiking neurons were investigated for input streams over 800 ms consisting of sequences of noisy versions of 4 such firing patterns. We restricted our analysis to the case where in each of the four 200 ms segments one of two template patterns is possible, see Fig. 1. In the following we write si = 1 (si = 0) if a noisy version of template 1 (0) is used in the i-th time segment of the circuit input.\n\nFig. 2 shows the response of a circuit of spiking neurons (drawn from the distribution specified above) to the input stream exhibited in Fig. 1B. Each frame in Fig. 2 shows the current firing activity of one layer of the circuit at a particular point t in time. Since in such a rather small circuit (compared for example with the estimated 10^5 neurons below a mm^2 of cortical surface) very few neurons fire at any given ms, we have replaced each spike by a pulse whose amplitude decays exponentially with a time constant of 30 ms. This models the impact of a spike on the membrane potential of a generic postsynaptic neuron. The resulting vector r(t) = (r1(t), ..., r800(t)) consisting of 800 analog values from the 800 neurons in the circuit is exactly the “liquid state” of the circuit at time t in the context of the abstract computational model introduced in [2].\n\n[Figure 2: panels show snapshots at t = 280, 290, 300, and 310 ms.]\n\nFigure 2: Snapshots of the first 400 components of the circuit state r(t) (corresponding to the neurons in layer L1) at various times t for the input shown at the bottom of Fig. 1. Black denotes high activity, white no activity. A spike at time ts ≤ t adds a value of exp(-(t - ts)/(30 ms)) to the corresponding component of the state r(t). 
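The state vector r(t) defined above is straightforward to compute from spike times. The following is our own minimal sketch (the 30 ms decay constant is from the text; everything else is illustrative):

```python
import numpy as np

TAU = 0.030  # 30 ms decay constant of the pulse that replaces each spike

def liquid_state(spike_trains, t, tau=TAU):
    # r_j(t) = sum over spikes t_s <= t of exp(-(t - t_s)/tau)
    r = np.zeros(len(spike_trains))
    for j, spikes in enumerate(spike_trains):
        past = np.asarray(spikes, dtype=float)
        past = past[past <= t]
        r[j] = np.exp(-(t - past) / tau).sum()
    return r

# a single spike at 250 ms contributes exp(-1) ~ 0.368 to r_j at t = 280 ms
r = liquid_state([[0.25], [0.10, 0.27]], 0.28)
```

Sampling `liquid_state` every 20 ms over the 800 neurons of the circuit yields the momentary states r(t) analyzed in the following sections.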
In the subsequent sections we will analyze the temporal dynamics of the information contained in these momentary circuit states r(t).^1\n\n3 Methods for Analyzing the Information Contained in Circuit States\n\nThe mutual information MI(X; R) between two random variables X and R can be defined by MI(X; R) = H(X) - H(X|R), where H(X) = -Σ_{x∈Range(X)} p(x) log p(x) is the entropy of X, and H(X|R) is the expected value (with regard to R) of the conditional entropy of X given R, see e.g. [4]. It is well known that empirical estimates of the entropy tend to underestimate the true entropy of a random variable (see e.g. [5, 6]). Hence in situations where the true value of H(X) is known (as is typically the case in neuroscience applications where X represents the stimulus, whose distribution is controlled by the experimentalist), the generic underestimate of H(X|R) yields a generic overestimate of the mutual information MI(X; R) = H(X) - H(X|R) for finite sample sizes. This undersampling effect has been addressed in a number of studies (see e.g. [7], [8] and [9] and the references therein), and has turned out to be a serious obstacle for a widespread application of information-theoretic methods to the analysis of neural computation. The seriousness of this problem becomes obvious from results achieved for our study case of a generic neural microcircuit shown in Fig. 3A. The dashed line shows the dependence of “raw” estimates MI_raw of the mutual information MI(s2; R) on the sample size^2 N, which ranges here from 10^3 to 2 × 10^5. The raw estimate of MI(s2; R) results from a direct application of the definition of MI to the observed occupancy frequencies for a discrete set of bins^3, where R consists here of just d = 5 or d = 10 components of the 800-dimensional circuit state r(t) for t = 660 ms, and s2 is the bit encoded by the second input segment. 
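The plug-in (“raw”) estimator and its positive bias are easy to reproduce. In the sketch below (our own toy example, not the circuit data) X and the binned state components are statistically independent, so the true mutual information is zero, yet the raw estimate comes out positive:

```python
import numpy as np
from collections import Counter

def entropy(counts):
    # plug-in entropy (in bits) from a collection of occupancy counts
    p = np.asarray(list(counts), dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def raw_mi(x, r):
    # plug-in estimate MI_raw = H(X) + H(R) - H(X, R) from observed frequencies
    rt = [tuple(row) for row in r]
    return (entropy(Counter(x).values()) + entropy(Counter(rt).values())
            - entropy(Counter(zip(x, rt)).values()))

rng = np.random.default_rng(1)
n = 1000
x = rng.integers(0, 2, n)
r = rng.integers(0, 4, (n, 5))   # d = 5 binned components, independent of x
bias = raw_mi(list(x), r)        # positive although the true MI is 0
```

With d = 5 components and 4 bins each there are up to 4^5 = 1024 response bins, so n = 1000 samples are far from the asymptotic regime; this is exactly the undersampling effect visible in Fig. 3A.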
For more components d of the current circuit state r(t), e.g. for estimating the mutual information MI(s2; R) between the preceding circuit input s2 and the current firing activity in a subcircuit consisting of d = 20 or more neurons, even sample sizes beyond 10^6 are likely to severely overestimate this mutual information.\n\n^1 One should note that these circuit states do not reflect the complete current state of the underlying dynamical system, only those parts of the state of the dynamical system that are in principle “visible” for neurons outside the circuit. The current values of the membrane potentials of neurons in the circuit and the current values of internal variables of dynamic synapses of the circuit are not visible in this sense.\n\n^2 In our case the sample size N refers to the number of computer simulations of the circuit response to new drawings of circuit inputs, with new drawings of temporal jitter in the input spike trains and initial conditions of the neurons in the circuit.\n\n^3 For direct estimates of the MI the analog value of each component of the circuit state r(t) has to be divided into discrete bins. We first linearly transformed each component of r(t) such that it has zero mean and variance σ^2 = 1.0. The transformed components are then binned with a resolution of ε = 0.5. 
This means that there are four bins in the range ±σ.\n\n[Figure 3: panels A-F plot estimated MI (or entropy, in D) in bits against sample size (10^3 to 10^5); panel titles: A corrected MI (d=5, s2); B lower bounds (d=5, s2); C lower bounds (d=10, s2); D entropy of states (d=5); E lower bounds (d=5, s3); F lower bounds (d=10, s3).]\n\nFigure 3: Estimated mutual information depends on sample size. In all panels d denotes the number of components of the circuit state r(t) at time t = 660 ms (or equivalently the number of neurons considered). A Dependence on the sample size of the “raw” estimate MI_raw and two corrected estimates MI_naive and MI_infinity of the mutual information MI(s2; R) (see text). B Lower bounds MI(s2; h(R)) for the mutual information obtained via classifiers h which are trained to predict the actual value of s2 given the circuit state r(t). Results are shown for a) an empirical Bayes classifier (discretization ε = 0.5, see footnotes 3 and 5), b) a linear classifier trained on the discrete (ε = 0.5) data, and c) a linear classifier trained on the analog data (ε = 0). In the case of the Bayes classifier MI(s2; h(R)) was estimated by employing a leave-one-out procedure (which is computationally efficient for a Bayes classifier), whereas for the linear classifiers a test set of size 5 × 10^4 was used (hence no results beyond a sample size of 1.5 × 10^5). 
C Same as B but for d = 10. D Estimates of the entropies H(R) and H(R|X). The “raw” estimates are compared with the corresponding Ma-bounds (see text). The filled triangle marks the sample size from which on the Ma-bound is below the raw estimate. E Same as B but for MI(s3; h(R)). F Same as E but for d = 10.\n\nSeveral methods for correcting this bias towards overestimation of MI have been suggested in the literature. In section 3.1 of [7] it is proposed to subtract one of two possible bias correction terms B_naive and B_full from the raw estimate MI_raw of the mutual information. The effect of subtracting B_naive is shown for d = 5 components of r(t) in Fig. 3A. This correction is too optimistic for these applications, since the corrected estimate MI_naive = MI_raw - B_naive at small sample sizes (e.g. 10^4) is still substantially larger than the raw estimate MI_raw at large sample sizes (e.g. 10^5). The subtraction of the second proposed term B_full is not applicable in our situation because it yields for MI_full = MI_raw - B_full values lower than zero for all considered sample sizes. The reason is that B_full is proportional to the quotient “number of possible response bins” / N, and the number of possible response bins is on the order of 30^10 in this example. Another way to correct MI_raw is proposed in [10]. This approach is based on a series expansion of MI in 1/N [6] and is effectively a method to get an empirical estimate MI_infinity of the mutual information for infinite sample size (N → ∞). It can be seen in Fig. 
3A that for moderate sample sizes MI_infinity also yields too optimistic estimates for MI.\n\nAnother method for dealing with generic overestimates of MI has been proposed in [10]. This method is based on the equation MI(X; R) = H(R) - H(R|X) and compares the raw estimates of H(R) and H(R|X) with the so-called Ma-bounds. It suggests to judge raw estimates of H(R) and H(R|X), and hence raw estimates of MI(X; R) = H(R) - H(R|X), as being trustworthy as soon as the sample size is so large that the corresponding Ma-bounds (which are conjectured to be less affected by undersampling) assume values below the raw estimates of H(R) and H(R|X). According to this criterion a sample size of 9 × 10^3 would be sufficient in the case of 5-neuron subcircuits (i.e., d = 5 components of r(t)), cf. Fig. 3D.^4 However, Fig. 3A shows that the raw estimate MI_raw is still too high for N = 9 × 10^3, since MI_raw assumes a substantially smaller value at N = 2 × 10^5.\n\nIn view of this unreliability of estimates for the mutual information, even of corrected ones, we have employed standard methods from machine learning in order to derive lower bounds for the MI (see for example [8] and [9] for references to preceding related work). This method is computationally feasible and yields, with not too large sample sizes, reliable lower bounds for the MI even for large numbers of components of the circuit state. In fact, we will apply it in sections 4 and 5 even to the full 800-component circuit state r(t). The method is quite simple. According to the data processing inequality [4] one has MI(X; R) ≥ MI(X; h(R)) for any function h. Obviously MI(X; h(R)) is easier to estimate than MI(X; R) if the dimension of h(R) is substantially lower than that of R, especially if h(R) assumes just a few discrete values. 
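Concretely, once a classifier h has been trained, MI(X; h(R)) can be computed from the joint (confusion) table of true labels and predictions on held-out data; by the data processing inequality this is a lower bound on MI(X; R). A minimal sketch (our own illustration):

```python
import numpy as np

def mi_from_predictions(x_true, x_pred):
    # plug-in MI (in bits) between the true label X and classifier output h(R);
    # by the data processing inequality this lower-bounds MI(X; R)
    x_true, x_pred = np.asarray(x_true), np.asarray(x_pred)
    labels = np.unique(np.concatenate([x_true, x_pred]))
    k = labels.size
    joint = np.zeros((k, k))
    for a, b in zip(x_true, x_pred):
        joint[np.searchsorted(labels, a), np.searchsorted(labels, b)] += 1
    p = joint / joint.sum()
    px, py = p.sum(axis=1, keepdims=True), p.sum(axis=0, keepdims=True)
    nz = p > 0
    return (p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum()

x = np.array([0, 1] * 500)
perfect = mi_from_predictions(x, x)  # 1 bit for a balanced binary X
```

A perfect predictor recovers the full 1 bit of a balanced binary stimulus, while a constant predictor yields a bound of 0 bits; imperfect classifiers fall in between.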
Furthermore the difference between MI(X; R) and MI(X; h(R)) is minimal if h(R) throws away only that information in R that is not relevant for predicting the value of X. Hence it makes sense to use as h a predictor or classifier that has been trained to predict the current value of X. Similar approaches for estimating a lower bound were motivated by the idea of predicting the stimulus (X) given the neural response (R) (see [8], [9] and the references therein). To get an unbiased estimate for MI(X; h(R)) one has to make sure that MI(X; h(R)) is estimated on data which have not been used for the training of h. To make the best use of the data one can alternatively use cross-validation or even leave-one-out (see [11]) to estimate MI(X; h(R)). Fig. 3B, 3C, 3E, and 3F show for 3 different predictors h how the resulting lower bounds for the MI depend on the sample size N.\n\nIt is noteworthy that the lower bounds MI(X; h(R)) derived with the empirical Bayes classifier^5 increase significantly with the sample size^6 and converge quite well to the upper bounds MI_raw(X; R). This reflects the fact that the estimated joint probability density between X and R gets more and more accurate. Furthermore the computationally less demanding^7 use of linear classifiers h also yields significant lower bounds for MI(X; R), especially if the true value of MI(X; R) is not too small. In our application this does not even require high numerical precision, since a coarse binning (see footnote 3) of the analog components of r(t) suffices, see Fig. 3B, C, E, F. All estimates of MI(X; R) in\n\n^4 Results of this kind depend on a division of the space of circuit states into subspaces, which is required for the calculation of the Ma-bound. 
In our case we have chosen the subspaces such that the frequency counts of any two circuit states in the same subspace differ by at most 1.\n\n^5 The empirical Bayes classifier operates as follows: given observed (and discretized) d components r^(d)(t) of the state r(t), it predicts the input which was observed most frequently for the given state components r^(d)(t) (maximum a posteriori classification, see e.g. [11]). If r^(d)(t) was not observed so far, a random guess about the input is made.\n\n^6 In fact, in the limit N → ∞ the Bayes classifier is the optimal classifier for the discretized data in the sense that it would yield the lowest classification error, and hence the highest lower bound on mutual information, over all possible classifiers.\n\n^7 In contrast to the Bayes classifier, the linear classifiers (both for analog and discrete data) yield already for relatively small sample sizes N good results which do not improve much with increasing N.\n\n[Figure 4: six panels plot mutual information in bits (0 to 1) against time (0 to 0.8 sec) for the input bits s1, ..., s4; the panels are labeled 1 × 5, 5 × 5, 1 × 10, 1 × 20, 5 × 160, and 1 × 800.]\n\nFigure 4: Information in subsets of neurons. Shown are lower bounds for the mutual information MI(si; h(R)) obtained with a linear classifier h operating on d components of the circuit state r(t). 
The numbers a × d to the right of each panel specify the number of components d used by the linear classifier, and for how many different choices a of such subsets of size d the results are plotted in that panel.\n\nthe subsequent sections are lower bounds MI(X; h(R)) computed via linear classifiers h. These types of lower bounds for MI(X; R) are of particular interest from the point of view of neural computation, since a linear classifier can in principle be approximated by a neuron that is trained (for example by a suitable variation of the perceptron learning rule) to extract information about X from the current circuit state R. Hence a high value of a lower bound MI(X; h(R)) for such an h shows not only that information about X is present in the current circuit state R, but also that this information is in principle accessible for other neurons.\n\n4 Distribution and Dynamics of Information in Circuit States\n\nWe have applied the method of estimating lower bounds for mutual information via linear classifiers described in the preceding section to analyze the spatial distribution and temporal dynamics of information for our study case described in section 2. Fig. 4 shows the temporal dynamics of information (estimated every 20 ms as described in section 3) about input bits si (encoded as described in section 2) for different components of the circuit state r(t) corresponding to different randomly drawn subsets of neurons in the circuit. One sees that even subsets of just 5 neurons absorb substantial information about the input bits si, however with a rather slow onset of the information uptake at the beginning of a segment and little memory retention when this information is overwritten by the next input segment. By merging the information from different subsets of neurons, the uptake of new information gets faster and the memory retention grows. 
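The pipeline behind these lower bounds can be sketched end to end on synthetic data. Everything below is our own stand-in: the Gaussian states with an assumed signal strength replace the real circuit states r(t), and a simple least-squares linear readout replaces the authors' exact training procedure:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 20, 2000                        # d state components, n samples

# toy stand-in for circuit states: Gaussian noise plus a shift encoding s
s = rng.integers(0, 2, n)
R = rng.normal(0.0, 1.0, (n, d)) + 0.3 * s[:, None]

# least-squares linear readout h(R) = 1[R w + b > 0], trained on the first half
half = n // 2
A = np.hstack([R, np.ones((n, 1))])    # append a bias column
w = np.linalg.lstsq(A[:half], 2.0 * s[:half] - 1.0, rcond=None)[0]
pred = (A[half:] @ w > 0).astype(int)

# plug-in MI between s and h(R) on the held-out half lower-bounds MI(s; R)
joint = np.zeros((2, 2))
for a, b in zip(s[half:], pred):
    joint[a, b] += 1
p = joint / joint.sum()
px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
nz = p > 0
mi_bound = (p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum()
```

Note that the MI is evaluated only on the held-out half, as required for an unbiased estimate of MI(s; h(R)).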
Note that for large sets of neurons (160 and 800) the information about each input bit si jumps up to its maximal value right at the beginning of the corresponding i-th segment of the input trains.\n\n[Figure 5: four panels plot MI, in % of H(s) or H(f), against time (0 to 0.8 sec); panel A shows the input bits s1, ..., s4, panels B and C show Boolean functions of s1 and s2 (among them xor(s1, s2)), and panel D shows parity(s1, s2, s3) and parity(s2, s3, s4).]\n\nFigure 5: Emergent computations. A Dynamics of information about input bits as in the bottom row of Fig. 4. H(s) denotes the entropy of a segment si (which is 1 bit for i = 1, 2, 3, 4). B, C, D Lower bounds for the mutual information MI(f; h(R)) for various Boolean functions f(s1, ..., s4) obtained with a linear classifier h operating on the full 800-component circuit state R = r(t). H(f) denotes the entropy of a Boolean function f(s1, ..., s4) if the si are independently uniformly drawn from {0, 1}.\n\n5 Emergent Computation in Recurrent Circuits of Spiking Neurons\n\nIn this section we apply the same method to analyze the mutual information between the current circuit state and the target outputs of various computations on the information contained in the sequence of spatio-temporal spike patterns in the input stream to the circuit. This provides an interesting new method for analyzing neural computation, rather than just neural communication and coding. There exist 16 different Boolean functions f(s1, s2) that depend just on the first two of the 4 bits s1, ..., s4. Fig. 
5B,C shows that all these Boolean functions f are autonomously computed by the circuit, in the sense that the current circuit state contains high mutual information with the target output f(s1, s2) of this function f. Furthermore the information about the result f(s1, s2) of this computation can be extracted linearly from the current circuit state r(t) (in spite of the fact that the computation of f(s1, s2) from the spike patterns in the input requires highly nonlinear computational operations). This is shown in Fig. 5B and 5C for those 5 Boolean functions of 2 variables that are nontrivial in the sense that their output really depends on both input variables. There exist 5 other Boolean functions which are nontrivial in this sense, which are just the negations of the 5 Boolean functions shown (and for which the mutual information analysis therefore yields exactly the same result). In Fig. 5D corresponding results are shown for parity functions that depend on three of the 4 bits s1, s2, s3, s4. These Boolean functions are the most difficult ones to compute in the sense that knowledge of just 1 or 2 of their input bits does not give any advantage in guessing the output bit.\n\nOne noteworthy feature in all these emergent computations is that information about the result of the computation is already present in the current circuit state long before the complete spatio-temporal input patterns that encode the relevant input bits have been received by the circuit. In fact, the computation of f(s1, s2) automatically just uses the temporal order of the first spikes in the pattern encoding s2, and merges information contained in the order of these spikes with the “context” defined by the preceding input pattern. In this way the circuit automatically completes an ultra-rapid computation within just 20 ms of the beginning of the second pattern s2. 
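The claim that parity is the hardest target is easy to verify exhaustively, as is the entropy H(f) used for normalization in Fig. 5. The following is our own illustration of these two facts:

```python
import numpy as np
from itertools import product

def parity(*bits):
    return sum(bits) % 2

def entropy_of(f, n_bits):
    # H(f) with the input bits drawn independently and uniformly from {0, 1}
    vals = [f(*bits) for bits in product((0, 1), repeat=n_bits)]
    p = sum(vals) / len(vals)
    if p in (0.0, 1.0):
        return 0.0
    return float(-(p * np.log2(p) + (1 - p) * np.log2(1 - p)))

# fixing any 2 of the 3 input bits leaves parity equally likely 0 or 1, i.e.
# knowledge of a strict subset of the inputs gives no advantage in guessing
for known in product((0, 1), repeat=2):
    assert sorted(parity(*known, b) for b in (0, 1)) == [0, 1]

h_parity = entropy_of(parity, 3)           # 1 bit: parity is balanced
h_and = entropy_of(lambda a, b: a & b, 2)  # ~0.811 bit: AND is unbalanced
```

This is also why H(f) appears in the normalization of Fig. 5: balanced targets such as xor and parity carry a full bit, while functions such as AND carry less.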
The existence of such ultra-rapid neural computations has previously been inferred [12], but models that could explain the possibility of such ultra-rapid computations on the basis of generic models for recurrent neural microcircuits have been missing.\n\n6 Discussion\n\nWe have analyzed the dynamics of information in high-dimensional circuit states of a generic neural microcircuit model. We have focused on that information which can be extracted by a linear classifier (a linear classifier may be viewed as a coarse model for the classification capability of a biological neuron). This approach also has the advantage that significant lower bounds for the information content of high-dimensional circuit states can already be achieved for relatively small sample sizes. Our results show that information about current and preceding circuit inputs is spread throughout the circuit in a rather uniform manner. Furthermore our results show that a generic neural microcircuit model has inherent capabilities to process new input in the context of other information that arrived several hundred ms ago, and that information about the outputs of numerous potentially interesting target functions automatically accumulates in the current circuit state. Such emergent computation in circuits of spiking neurons is extremely fast, and therefore provides an interesting alternative to models based on special-purpose constructions for explaining empirically observed [12] ultra-rapid computations in neural systems.\n\nThe method for analyzing information contained in high-dimensional circuit states that we have explored in this article for a generic neural microcircuit model should also be applicable to biological data from multi-unit recordings, fMRI, etc., since significant lower bounds for mutual information were achieved in our study case already for sample sizes in the range of a few hundred (see Fig. 3). 
In this way one could get insight into the dynamics of information and emergent computations in biological neural systems.\n\nAcknowledgement: We would like to thank Henry Markram for inspiring discussions. This research was partially supported by the Austrian Science Fund (FWF), project # P15386.\n\nReferences\n\n[1] H. Markram, Y. Wang, and M. Tsodyks. Differential signaling via the same axon of neocortical pyramidal neurons. Proc. Natl. Acad. Sci., 95:5323–5328, 1998.\n\n[2] W. Maass, T. Natschläger, and H. Markram. Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation, 14(11):2531–2560, 2002.\n\n[3] A. Gupta, Y. Wang, and H. Markram. Organizing principles for a diversity of GABAergic interneurons and synapses in the neocortex. Science, 287:273–278, 2000.\n\n[4] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, New York, 1991.\n\n[5] M. S. Roulston. Estimating the errors on measured entropy and mutual information. Physica D, 125:285–294, 1999.\n\n[6] S. Panzeri and A. Treves. Analytical estimates of limited sampling biases in different information measures. Network: Computation in Neural Systems, 7:87–107, 1996.\n\n[7] G. Pola, S. R. Schultz, R. S. Petersen, and S. Panzeri. A practical guide to information analysis of spike trains. In R. Kötter, editor, Neuroscience Databases. A Practical Guide, chapter 10, pages 139–153. Kluwer Academic Publishers (Boston), 2003.\n\n[8] L. Paninski. Estimation of entropy and mutual information. Neural Computation, 15:1191–1253, 2003.\n\n[9] J. Hertz. Reading the information in the outcome of neural computation. Online available via http://www.nordita.dk/~hertz/papers/infit.ps.gz.\n\n[10] S. P. Strong, R. Koberle, R. R. de Ruyter van Steveninck, and W. Bialek. Entropy and information in neural spike trains. 
Physical Review Letters, 80(1):197–200, 1998.\n\n[11] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. John Wiley & Sons, 2nd edition, 2001.\n\n[12] S. Thorpe, D. Fize, and C. Marlot. Speed of processing in the human visual system. Nature, 381:520–522, 1996.\n", "award": [], "sourceid": 2389, "authors": [{"given_name": "Thomas", "family_name": "Natschläger", "institution": null}, {"given_name": "Wolfgang", "family_name": "Maass", "institution": null}]}