{"title": "Spatio-temporal Representations of Uncertainty in Spiking Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 2024, "page_last": 2032, "abstract": "It has long been argued that, because of inherent ambiguity and noise, the brain needs to represent uncertainty in the form of probability distributions. The neural encoding of such distributions remains, however, highly controversial. Here we present a novel circuit model for representing multidimensional real-valued distributions using a spike-based spatio-temporal code. Our model combines the computational advantages of the currently competing models for probabilistic codes and exhibits realistic neural responses along a variety of classic measures. Furthermore, the model highlights the challenges associated with interpreting neural activity in relation to behavioral uncertainty and points to alternative population-level approaches for the experimental validation of distributed representations.", "full_text": "Spatio-temporal Representations of Uncertainty in\n\nSpiking Neural Networks\n\nCristina Savin\nIST Austria\n\nKlosterneuburg, A-3400, Austria\n\ncsavin@ist.ac.at\n\nSophie Deneve\n\nGroup for Neural Theory, ENS Paris\n\nRue d\u2019Ulm, 29, Paris, France\nsophie.deneve@ens.fr\n\nAbstract\n\nIt has long been argued that, because of inherent ambiguity and noise, the brain\nneeds to represent uncertainty in the form of probability distributions. The neu-\nral encoding of such distributions remains, however, highly controversial. Here we\npresent a novel circuit model for representing multidimensional real-valued distri-\nbutions using a spike-based spatio-temporal code. Our model combines the com-\nputational advantages of the currently competing models for probabilistic codes\nand exhibits realistic neural responses along a variety of classic measures. 
Fur-\nthermore, the model highlights the challenges associated with interpreting neural\nactivity in relation to behavioral uncertainty and points to alternative population-\nlevel approaches for the experimental validation of distributed representations.\n\nCore brain computations, such as sensory perception, have been successfully characterized as prob-\nabilistic inference, whereby sensory stimuli are interpreted in terms of the objects or features that\ngave rise to them [1, 2]. The tenet of this Bayesian framework is the idea that the brain repre-\nsents uncertainty about the world in the form of probability distributions. While this notion seems\nsupported by behavioural evidence, the neural underpinnings of probabilistic computation remain\nhighly debated [1, 2]. Different proposals offer different trade-offs between \ufb02exibility, i.e. the class\nof distributions they can represent, and speed, i.e. how fast the uncertainty can be read out from the\nneural activity. Given these two dimensions, we can divide existing models into two main classes.\nThe \ufb01rst set, which we will refer to as spatial codes, distributes information about the distribution\nacross neurons; the activity of different neurons re\ufb02ects different values of an underlying random\nvariable (alternatively, it can be viewed as encoding parameters of the underlying distribution [1,\n2]). Linear probabilistic population codes (PPCs) are a popular instance of this class, whereby\nthe log-probability of a random variable can be linearly decoded from the responses of neurons\ntuned to different values of that variable [3]. This encoding scheme has the advantage of speed, as\nuncertainty can be decoded in a neurally plausible way from the quasi-instantaneous neural activity,\nand reproduces aspects of the experimental data. 
However, these bene\ufb01ts come at the price of\n\ufb02exibility: the class of distributions that the network can represent needs to be highly restricted,\notherwise the network size scales exponentially with the number of variables [1].\nThis limitation has led to a second class of models, which we will refer to as temporal codes. These\nuse stochastic network dynamics to sample from the target distribution [4, 1]. Existing models\nfrom this class assume that the activity of each neuron encodes a different random variable; the\nnetwork explores the state space such that the time spent in any particular state is proportional to its\nprobability under the distribution [4]. This representation is exact in the limit of in\ufb01nite samples.\nIt has several important computational advantages (e.g. easy marginalization, parameter learning,\nlinear scaling of network size with the number of dimensions) and further accounts for trial-to-\ntrial variability in neural responses [1]. These bene\ufb01ts come at the cost of sampling time: a fair\nrepresentation of the underlying distribution requires pooling over several samples, i.e. integrating\nneural activity over time. Some have argued that this feature makes sampling unfeasibly slow [2].\n\n1\n\n\fHere we show that it is possible to construct spatio-temporal codes that combine the best of both\nworlds. The core idea is that the network activity evolves through recurrent dynamics such that\nsamples from the posterior distribution can be linearly decoded from the (quasi-)instantaneous neu-\nral responses. This distributed representation allows several independent samples to be encoded\nsimultaneously, thus enabling a fast representation of uncertainty that improves over time. Com-\nputationally, our model inherits all the bene\ufb01ts of a sampling-based representation, while overcom-\ning potential shortcomings of classic temporal codes. 
We explored the general implications of the\nnew coding scheme for a simple inference problem and found that the network reproduces many\nproperties of biological neurons, such as tuning, variability, co-variability and their modulation by\nuncertainty. Nonetheless, these single or pairwise measures provided limited information about the\nunderlying distribution represented by the circuit. In the context of our model, these results argue for\nusing decoding as a tool for validating distributed probabilistic codes, an approach which we illustrate\nwith a simple example.\n\n1 A distributed spatio-temporal representation of uncertainty\n\nThe main idea of the representation is simple: we want to approximate a real-valued D-dimensional\ndistribution P(x) by samples generated by K independent chains implementing Markov Chain\nMonte Carlo (MCMC) sampling [5], y(t) = {yk(t)}k=1...K, with yk \u223c P(x) (Fig. 1). To this aim,\nwe encode the stochastic trajectory of the chains in a population of N spiking neurons (N > KD),\nsuch that y(t) is linearly decodable from the neural responses. In particular, we adapt a recently\nproposed coding scheme for representing time-varying signals [6] and construct stochastic neural\ndynamics such that samples from the target distribution can be obtained by a linear mapping of the\nspikes convolved with an epsp-like exponential kernel (Fig. 1a):\n\n\u02c6y(t) = \u0393 \u00b7 r(t)\n\n(1)\nwhere \u02c6y(t) denotes the decoded state of the K MCMC chains at time t (of size D \u00d7 K), \u0393 is the\ndecoding matrix1 and r is the low-pass version of the spikes o, \u03c4V \u02d9ri = \u2212ri + oi.\nTo facilitate the presentation of the model, we start by constructing recurrent dynamics for sampling\na single MCMC chain, which we then generalise to the multi-chain scenario. 
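A minimal numerical sketch of the decoder in Eq. 1; the function name, the Euler discretisation and the parameter values below are illustrative assumptions, not part of the model specification:

```python
import numpy as np

def decode(spikes, Gamma, tau_v, dt):
    """Linear read-out of Eq. 1: low-pass the spike trains with time
    constant tau_v (tau_v * dr/dt = -r + o), then map through Gamma.
    spikes: (T, N) binary array; Gamma: (D, N). Returns (T, D) trajectory."""
    T, N = spikes.shape
    r = np.zeros(N)
    y_hat = np.zeros((T, Gamma.shape[0]))
    for t in range(T):
        # Euler step; spikes are treated as delta functions of weight 1/dt
        r += (dt / tau_v) * (-r + spikes[t] / dt)
        y_hat[t] = Gamma @ r
    return y_hat
```

Consistent with footnote 1, any decoding matrix with full row rank can play the role of Gamma here.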
Based on these network\ndynamics, we implement probabilistic inference in a linear Gaussian mixture, which we use in\nSection 2 to investigate the neural implications of the code.\n\nDistributed MCMC sampling\n\nAs a starting point, consider the computational task of representing an arbitrary temporal trajectory\n(the gray line in Fig. 1b) as the linear combination of the responses of a set of neurons (one can think\nof this as an analog-to-digital conversion of sorts). If the decoding weights of each neuron point in\na different direction (colour coded), then the trajectory could be ef\ufb01ciently reconstructed by adding\nthe proper weight vectors (the local derivative of the trajectory) at just the right moment. Indeed,\nrecent work has shown how to construct network dynamics enabling the network to track a trajectory\nas closely as possible [6]. To achieve this, neurons use a greedy strategy: each neuron monitors\nthe current prediction error (the difference between the trajectory and its linear decoding from the\nspikes) and spikes only when its weight vector points in the right direction. When the decoding\nweights of several neurons point the same way (as in Fig. 1a), they compete to represent the signal\nvia recurrent inhibition:2 from the perspective of the decoder, it does not matter which of these\nneurons spikes next, so the actual population responses depend on the previous spike history, initial\nconditions and intrinsic neural noise.3 As a result, spikes are highly irregular and look \u2018random\u2019\n(with Poisson-like statistics), even when representing a constant signal. While competition is an\nimportant driving force for the network, neurons can also act cooperatively \u2013 when the change in the\nsignal is larger than the contribution of a single decoding vector, then several neurons need to spike\ntogether to represent the signal (e.g. response to the step in Fig. 
1a).\n\n1The decoding matrix can be arbitrary.\n2This competition makes spike correlations extremely weak in general [7].\n3When N \u226b D there is a strong degeneracy in the map between neural responses and the signal, such that\nseveral different spike sequences yield the same decoded signal. In absence of internal noise, the encoding is\nnonetheless deterministic despite apparent variability.\n\n2\n\n\fFigure 1: Overview of the model. a. We assume a linear decoder, where the estimated signal \u02c6y\nis obtained as a weighted sum of neural responses (exponential kernel, blue). b. When the signal\nis multidimensional, different neurons are responsible for encoding different directions along the\ntarget trajectory (gray). c. Alternative network architectures: in the externally-driven version the\ntarget trajectory is given as an external input, whereas in the self-generated case it is computed via\nslow recurrent connections (green arrow); the input s is used during inference, when sampling from\nP(x|s). d. Encoding an example MCMC trajectory in the externally-driven mode. Light colours\nshow ground truth; dark colours the decoded signal. e. Single-chain samples from a multivariate\ndistribution (shown as colormap) decoded from a spiking network; trajectory subsampled by a factor\nof 10 for visibility. f. Decoded samples using 5 chains (colors) and a \ufb01fth of the time in e.\n\nFormally, the network dynamics minimise the squared reconstruction error, (y \u2212 \u02c6y)2, under certain\nconstraints on mean \ufb01ring rate which ensure the representation is distributed (see Suppl. Info.).\nThe resulting network consists of spiking neurons with simple leaky-integrate-and-\ufb01re dynamics,\n\u02d9V = \u2212(1/\u03c4v)V \u2212 Wo + I, where \u02d9V denotes the temporal derivative of V, the binary vector o denotes\nthe spikes, oi(t) = \u03b4 iff Vi(t) > \u0398i, \u03c4v is the membrane time constant (same as that of the decoder),\nthe neural threshold is \u0398i = \u2211j \u0393ij2 + \u03bb and the recurrent connections, W = \u0393T\u0393 + \u03bb \u00b7 I, can\nbe learned by STDP [8], where \u03bb is a free parameter controlling neural sparseness. The membrane\npotential of each neuron tracks the component of the reconstruction error along the direction of its\ndecoding weights. As a consequence, the network is balanced (because the dynamics aim to bring\nthe reconstruction error to zero) and membrane potentials are correlated, particularly in pairs of\nneurons with similar decoding weights [7] (see Fig. 2c).\nIn the traditional form, which we refer to as the \u2018externally-driven\u2019 network (Fig. 1c), information\nabout the target trajectory is provided as an external input to the neurons: I = \u0393T \u00b7 ((1/\u03c4v)y + \u02d9y). In\nour particular case, this input implements a particular kind of MCMC sampling (Langevin). Brie\ufb02y,\nthe sampler involves stochastic dynamics driven by the gradient of log P (y), with additive Gaussian\nnoise [5] (see Suppl.Info. for implementation details). Hence, the external input is stochastic, I =\n\u0393T \u00b7 ((1/\u03c4v)y + F (y) + \u03b5), where F (y) = \u2207 log P(y), and \u03b5 is D-dimensional white independent\nGaussian noise. Using our network dynamics, we can encode the MCMC trajectory with high\nprecision (Fig. 1d). Importantly, because of the distributed representation, the integration window\nof the decoder does not restrict the frequency content of the signal. The network can represent\nsignals that change faster than the membrane time constant (Fig. 
1a, d).\nTo construct a viable biological implementation of this network, we need to embed the sampling\ndynamics within the circuit (\u2018self-generated\u2019 architecture in Fig. 1c). We achieved this by approxi-\nmating the current I using the decoded signal \u02c6y instead of y. This results in a second recurrent input\nto the neurons, \u02c6I = \u0393T \u00b7 ((1/\u03c4v)\u02c6y + F (\u02c6y) + \u03b5). While this is an approximation, we found it does not\naffect sampling quality in the parameter regime in which the encoding scheme itself works well (see\nexample dynamics in Fig. 1e).\n\n3\n\n\fSuch dynamics can be derived for any distribution from the broad class of product-of-(exponential-\nfamily) experts [9], with no restrictions on D; for simplicity and to ease visualisation, here we focus\non the multivariate Gaussian case and restrict the simulations to bivariate distributions (D = 2). For\na Gaussian distribution with mean \u00b5 and covariance \u03a3, the resulting membrane potential dynamics\nare linear:4\n\n\u2202V/\u2202t = \u2212(1/\u03c4v)V \u2212 Wfast o + Wslow r + D + \u0393T\u03b5\n\n(2)\n\nwhere o denotes the spikes, r is a low-passed version of the spikes. The connections Wfast\ncorrespond to the recurrent dynamics derived above, while the slow5 connections, Wslow =\n(1/\u03c4slow) \u00b7 \u0393T(I \u2212 \u03a3\u22121)\u0393 (e.g. NMDA currents) and the drift term D = (1/\u03c4slow) \u00b7 \u0393T\u03a3\u22121\u00b5 correspond to\nthe deterministic component of the MCMC dynamics6 and \u03b5 is white independent Gaussian noise\n(implemented for instance by a small chaotic subnetwork appropriately connected to the principal\nneurons). In summary, relatively simple leaky integrate-and-\ufb01re neurons with appropriate recurrent\nconnectivity are suf\ufb01cient for implementing Langevin sampling from a Gaussian distribution in a\ndistributed code. 
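The Langevin updates underlying these dynamics can also be simulated directly for a Gaussian target. Below is a sketch under our own discretisation and naming, independent of the spiking implementation; for a Gaussian we use the standard score function, grad log P(y) = -Sigma^{-1}(y - mu):

```python
import numpy as np

def langevin_gaussian(mu, Sigma, n_steps, dt, seed=0):
    """Discretised Langevin sampler for N(mu, Sigma):
    y <- y + dt * grad log P(y) + sqrt(2 dt) * standard normal noise."""
    rng = np.random.default_rng(seed)
    Sigma_inv = np.linalg.inv(Sigma)
    y = np.zeros(len(mu))
    samples = np.empty((n_steps, len(mu)))
    for t in range(n_steps):
        drift = -Sigma_inv @ (y - mu)  # deterministic MCMC component
        y = y + dt * drift + np.sqrt(2.0 * dt) * rng.standard_normal(len(mu))
        samples[t] = y
    return samples
```

For small dt the empirical distribution of the samples approaches the target; the step size trades discretisation bias against mixing speed.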
More complex distributions will likely involve nonlinearities in the slow connections (possibly computed in the dendrites) [10].\n\nMulti-chain encoding: instantaneous representation of uncertainty\n\nThe earliest proposal for sampling-based neural representations of uncertainty suggested distributing\nsamples either across neurons or across time [4]. Nonetheless, all realisations of neural sampling use\nthe second solution. The reason is simple: when equating the activity of individual neurons (either\nvoltage or \ufb01ring rate) to individual random variables, it is relatively straightforward to construct neu-\nral dynamics implementing MCMC sampling. It is less clear what kind of neural dynamics would\ngenerate samples in several neurons at a time. One naive solution would be to construct several net-\nworks that each sample from the same distribution in parallel. This however seems to unavoidably\nentail a \u2018copy-pasting\u2019 of all recurrent connections across different circuits, which is biologically\nunrealistic. Our distributed representation, in which neurons jointly encode the sampling trajectory,\nprovides a potential solution to this problem. In particular, it allows several chains to be embedded\nin a single network.\nTo extend the dynamics to a multi-chain scenario, we imagine an auxiliary probability distribution\nover K random variables. We want each to correspond to one chain, so we take them to be indepen-\ndent and identically distributed according to P(x). Since the sampling dynamics derived above do\nnot restrict the dimensionality of the underlying distribution, we can use them to sample from this\nD \u00d7 K-dimensional distribution instead. For the example of a multivariate normal, for instance, we\nwould now sample from another Gaussian, P(x\u2217K), with mean \u00b5\u2217K (K repetitions of \u00b5) and co-\nvariance \u03a3\u2217K, a block-diagonal matrix, obtained by K repetitions of \u03a3. 
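Constructing this auxiliary K-chain distribution amounts to stacking the mean K times and forming a block-diagonal covariance; a minimal sketch (the function name is ours):

```python
import numpy as np

def multi_chain_gaussian(mu, Sigma, K):
    """K i.i.d. copies of N(mu, Sigma) viewed as one Gaussian over
    D*K dimensions: stacked mean and block-diagonal covariance."""
    mu_K = np.tile(mu, K)                 # mu repeated K times
    Sigma_K = np.kron(np.eye(K), Sigma)   # K diagonal blocks of Sigma
    return mu_K, Sigma_K
```

The zero off-diagonal blocks are what keeps the chains independent under the joint sampling dynamics.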
In general, the multi-chain\ntrajectory can be viewed as just another instance of MCMC sampling, where the encoding scheme\nguarantees that the signals across different chains remain independent. What may change, however,\nis the interpretability of neural responses in relation to the underlying encoded variable. We show\nthat under mild assumptions on the decoding matrix \u0393, the main features of single and pairwise\nresponses are preserved (see below and Suppl.Info. Sec.4).\nFig. 1f shows an example run for multi-chain sampling from a bivariate Gaussian. In a \ufb01fth of the\ntime used in the single-chain scenario (Fig. 1e), the network dynamics achieve a similar spread\nacross the state space, allowing for a quick estimation of uncertainty (see also Suppl.Info. 2). For a\ncertain precision of encoding (determined by the size of the decoding weights \u0393) and neural sparse-\nness level, N scales linearly with the dimensionality of the state space D and the number of simul-\ntaneously encoded chains K. Thus, our representation provides a convenient trade-off between the\nnetwork size and the speed of the underlying computation. When N is \ufb01xed, faster sampling re-\nquires either a penalty on precision, or increased \ufb01ring rates (N \u226b D). 
Overall, the coding scheme\nallows for a linear trade-off between speed and resources (either neurons or spikes).\n\n4Since F (x) = \u2212\u03a3\u22121 (x \u2212 \u00b5), this results in a stochastic generalisation of the dynamics in [7].\n5\u2018Slow\u2019 marks the fact that the term depends on the low-passed neural output r, rather than o.\n6Learning the connections goes beyond the scope of this paper; it seems parameter learning can be achieved\n\nusing the plasticity rules derived for the temporal code, if these are local (not shown).\n\n4\n\n\f2 Neural implications\n\nTo investigate the experimental implications of our coding scheme, we assumed the posterior distri-\nbution is centred around a stimulus-speci\ufb01c mean (a set of S = 12 values, equidistantly distributed\non a circle of radius 1 around the origin, see black dots in Fig. 3a), with a stimulus independent\ncovariance parametrizing the uncertainty about x. This kind of posterior arises e.g. as a result of\ninference in a linear Gaussian mixture (since the focus here is not on a speci\ufb01c probabilistic model\nof the circuit function, we keep the computation very basic, see Suppl. Info. for details). It allows us to\nquantify the general properties of distributed sampling in terms of classic measures (tuning curves,\nFano factors, FF, cross-correlogram, CCG, and spike count correlations, rsc) and how these change\nwith uncertainty. Since we found that, under mild assumptions for the decoding matrix \u0393, the results\nare qualitatively similar in a single vs. a multi-chain scenario (see Suppl. Info.), and to facilitate the\nexplanation, the results reported in the main text used K = 1.\n\nFigure 2: Our model recapitulates several known features of cortical responses. a. Mean \ufb01ring rates\nas a function of stimulus, for all neurons (N = 37); color re\ufb02ects the phase of \u0393i (right). b. The\nnetwork is in an asynchronous state. Left: example spike raster. Right: Fano factor distribution. 
c.\nWithin-trial correlations in membrane potential for pairs of neurons as a function of the similarity\nof their decoding weights. d. Spike count correlations (averaged across stimuli) as a function of\nthe neurons\u2019 tuning similarity. Right: distribution of rsc, with mean in magenta. e. We use cross-\ncorrelograms (CCG) to assess spike synchrony. Left: CCG for an example neuron. Middle: Area\nunder the peak \u00b110ms (between the dashed vertical bars) for all neuron pairs for 3 example stimuli;\nneurons ordered by \u0393i phase. Right: the area under CCG peak as a function of tuning similarity.\n\na. The neural dynamics are consistent with a wide range of experimental observations\nFirst, we measured the mean \ufb01ring rate of the neurons for each stimulus (averaged across 50 tri-\nals, each 1s long). We found that individual neurons show selectivity to stimulus orientations, with\nbell-shaped tuning curves, reminiscent of e.g. the orientation-tuning of V1 neurons (Fig. 2a). The in-\nhomogeneity in the scale of the responses across the population is a re\ufb02ection of the inhomogeneities\nin the decoding matrix \u0393.7\nNeural responses were asynchronous, with irregular \ufb01ring (Fig. 2b), consistent with experimental\nobservations [11, 12]. To quantify neural variability, we estimated the Fano factors, measured as the\nratio between the variance and the mean of the spike counts in different trials, F Fi = \u03c32fi/\u00b5fi. We\n\n7The phase of the decoding weights was sampled uniformly around the circle, with an amplitude drawn\n\nuniformly from the interval [0.005; 0.025].\n\n5\n\n\ffound that the Fano factor distribution was centered around 1, a signature of Poisson variability. This\nobservation suggests that the sampling dynamics preserve the main features of the distributed code\ndescribed in Ref. [6]. 
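The Fano factor statistic used above is simply the variance-to-mean ratio of spike counts across repeated trials; a minimal sketch (the function name is ours):

```python
import numpy as np

def fano_factors(counts):
    """counts: (n_trials, n_neurons) spike counts for one stimulus.
    Returns the per-neuron Fano factor, variance over mean across trials."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(axis=0) / counts.mean(axis=0)
```

For Poisson spike counts the expected value is 1, the reference point used in the text.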
Unlike the basic model, however, here neural variability arises both because\nof indeterminacies, due to distributed coding, and because of \u2018true\u2019 stochasticity, owing to sampling.\nThe contribution of the latter, which is characteristic of our version, will depend on the underlying\ndistribution represented: when the distribution is highly peaked, the deterministic component of the\nMCMC dynamics dominates, while the noise plays an increasingly important role the broader the\ndistribution.\nAt the level of the membrane potential, both sources of variability introduce correlations between\nneurons with similar tuning (Fig. 2c), as seen experimentally [13]: the \ufb01rst because the reconstruc-\ntion error acts as a shared latent cause, the second because the stochastic component \u2013which was\nindependent in the y space\u2013 is mapped through \u0393T in a distributed representation (see Eq. 2). While\nthe membrane correlations introduced by the \ufb01rst disappear at the level of the spikes [7], the addition\nof the stochastic component turns out to have important consequences for the spike correlations both\non the fast time scale, measured by CCG, and for the across-trial spike count covariability, measured\nby the noise correlations, rsc.\nFig. 2e shows the CCG of an example pair of neurons, with similar tuning; their activity synchro-\nnizes on the time scale of a few milliseconds. In more detail, our CCG measure was normalised by\n\ufb01rst computing the raw cross-correlogram (averaged across trials) and then subtracting a baseline\nobtained as the CCG of shuf\ufb02ed data, where the responses of each neuron come from a different\ntrial. 
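A sketch of the shuffle-corrected CCG just described; the binning, the trial-shuffling scheme and all names below are our own assumptions and may differ in detail from the exact analysis used here:

```python
import numpy as np

def ccg(x, y, max_lag):
    """Raw CCG: Pearson correlation of two binned spike trains at each lag.
    x, y: (n_trials, n_bins) arrays. Returns (lags, correlations)."""
    lags = np.arange(-max_lag, max_lag + 1)
    out = np.empty(len(lags))
    for i, lag in enumerate(lags):
        if lag >= 0:
            a, b = x[:, : x.shape[1] - lag], y[:, lag:]
        else:
            a, b = x[:, -lag:], y[:, : y.shape[1] + lag]
        out[i] = np.corrcoef(a.ravel(), b.ravel())[0, 1]
    return lags, out

def shuffle_corrected_ccg(x, y, max_lag):
    """Subtract a baseline CCG in which each trial of x is paired
    with a different trial of y (shuffle correction)."""
    lags, raw = ccg(x, y, max_lag)
    _, baseline = ccg(x, np.roll(y, 1, axis=0), max_lag)
    return lags, raw - baseline
```

The shuffle baseline removes stimulus-locked covariation, so the residual peak reflects trial-by-trial synchrony.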
The raw cross-correlogram for a time delay, \u03c4, CCG(\u03c4 ) was computed as the Pearson\u2019s corre-\nlation of the neural responses, shifted in time by \u03c4.8 At the level of the population, the amount\nof synchrony (measured as the area under the CCG peak \u00b110ms) was strongly modulated by the\ninput (Fig. 2e, middle), with synchrony most prominent in pairs of neurons that aligned with the\nstimulus (not shown). This is consistent with the idea that synchrony is stimulus-speci\ufb01c [14, 15].\nWe also measured spike count correlation (the Pearson\u2019s correlation coef\ufb01cient of spike counts\nrecorded in different trials for the same stimulus) and found they depend on the selectivity of the\nneurons, with positive correlations for pairs of neurons with similar tuning (Fig. 2d), as seen in ex-\nperiments [16]. The overall distribution was broad, with a small positive mean (Fig. 2d), as in recent\nreports [11, 12]. Taken together, these results suggest that our model qualitatively recapitulates the\nbasic features of cortical neural responses.\nb. Uncertainty modulates neural variability and covariability\nWe have seen that sampling introduces spike correlations, not seen when encoding a deterministic\ndynamical system [7]. Since stochasticity seems to be key for these effects, this suggests uncer-\ntainty should signi\ufb01cantly modulate pairwise correlations. To con\ufb01rm this prediction, we varied the\ncovariance structure of the underlying distribution for the same circuit (Fig. 3a; the low variance con-\ndition corresponds to baseline measures reported above) and repeated all previous measurements.\nWe found that changes in uncertainty leave neuronal tuning invariant (Fig. 3b, not surprisingly since\nthe mean \ufb01ring rates re\ufb02ect the posterior mean). Nonetheless, increasing uncertainty had signi\ufb01-\ncant effects on neural variability and co-variability. Fano factors increased for broader distributions\n(Fig. 
3b), congruent with the common observation of the stimulus quenching response variability\nin experiments [17]. Second, we found a slower component in the CCG, which increased with\nuncertainty (Fig. 3e), as in the data [15]. Lastly, the dependence of different spike correlation mea-\nsures on neural co-tuning increased with uncertainty (Fig. 3c, d). In particular, neurons with similar\nstimulus preferences increased their synchrony and spike-count correlations with increasing uncer-\ntainty, consistent with the stimulus quenching response co-variability in neural data and increases in\ncorrelations at low contrast [17, 16].\nAlthough we see a signi\ufb01cant modulation of (co-)variability with changes in uncertainty, these mea-\nsures provide limited information about the underlying distribution represented in the network. They\ncan be used to detect changes in the overall spread of the distribution, i.e. the high vs. low-variance\nconditions look different at the level of pairwise neural responses. However, they cannot discriminate\nbetween distributions with similar spread, but very different dependency structure, e.g. between the\ncorrelated and anti-correlated condition (Fig. 3d, f; also true for FF and the slow component of the\nCCG, not shown). For this, we need to look at the population level.\n\n8While this is not the most common expression for the CCG, we found it reliably detects synchronous \ufb01ring\n\nacross neurons; spikes discretised in 2ms bins.\n\n6\n\n\fFigure 3: The effects of uncertainty on neural responses. a. Overview of different experimen-\ntal conditions, posterior mean centred on different stimuli (black dots) with stimulus independent\ncovariance shown for 4 conditions. b. Left: Tuning curves for an example neuron, for different con-\nditions. Right: \ufb01ring rate in the low variance vs. all other conditions, summary across all neurons;\ndots correspond to different neuron-stimulus pairs. c. 
Fano factor distribution for high-variance\ncondition (compare Fig.2b). d. Area under CCG peak \u00b110ms as a function of the tuning similarity\nof the neurons, for different uncertainty conditions (colours as in b). e. Complete CCG, averaged\nacross 10 neurons with similar tuning while sampling from independent bivariate Gaussians with\ndifferent s.d. (0.1 for \u2018high variance\u2019). f. Spike count correlations (averaged across stimuli) as a\nfunction of the tuning similarity of the neurons, for different uncertainty conditions.\n\nc. Decoding can be used to assess neural representations of uncertainty\nSince in a distributed representation single-neuron or pairwise measures tell us little about the de-\npendency structure of the represented random variables, alternative methods need to be devised for\ninvestigating the underlying computation performed by the circuit. The representational framework\nproposed here suggests that linear decoding may be used for this purpose. In particular, we can\nrecord neural responses for a variety of stimuli and reverse-engineer the map between spikes and\nthe relevant latent variables (or, if the assumed generative model is linear as here, the stimuli them-\nselves). We can use the low-variance condition to get a reasonable estimate of the decoding matrix,\n\u02c6\u0393 (since the underlying sampling dynamics are close to the posterior mean) and then use the de-\ncoder for visualising the trajectory of the network while varying uncertainty. As an illustration, we\nuse simple linear regression of the stimuli s as a function of the neuron \ufb01ring rates, scaled by \u03c4v.9\n\n9This requires knowledge of \u03c4v and, in a multi-chain scenario, a grouping of neural responses by chain\n\npreference. Proxies for which neurons should be decoded together are discussed in Suppl.Info. Sec.4.\n\nFigure 4: A decoding approach to study the encoding of uncertainty. a. 
In a low-variability condition\nwe record neural responses for several repetitions of different stimuli (black dots). We estimated the\ndecoding matrix by linear regression and used it to project the activity of the population in individual\ntrials. b. The decoder captures well the underlying dynamics in a trial; ground-truth in black. c.\nThe same decoder \u02c6\u0393 can be used to visualise the structure of the underlying distribution in other\nconditions. Note the method is robust to a misalignment in initial conditions (red trace).\n\n7\n\n\fAlthough the recovered decoding weights are imperfect and the initial conditions unknown, the pro-\njections of the neural responses in single trials along \u02c6\u0393 capture the main features of the underlying\nsampler, both in the low-variance and in other conditions (Fig. 4b, c).\n\n3 Discussion\n\nHow populations of neurons encode probability distributions is a central question for Bayesian ap-\nproaches to understanding neural computation. While previous work has shown that spiking neural\nnetworks could represent a probability over single real-valued variables [18], or the joint probability\nof many binary random variables [19], the representation of complex multi-dimensional real-valued\ndistributions10 remains less clear [1, 2]. Here we have proposed a new spatio-temporal code for\nrepresenting such distributions quickly and \ufb02exibly. Our model relies on network dynamics which\napproximate the target distribution by several MCMC chains, encoded in the spiking neural activity\nsuch that the samples can be linearly decoded from the quasi-instantaneous neural responses. Un-\nlike previous sampling-based codes [19], our model does not require a one-to-one correspondence\nbetween random variables and neurons. 
This separation between computation and representation is\ncritical for the increased speed, as it allows multiple chains to be realistically embedded in the same\ncircuit, while preserving all the computational bene\ufb01ts of sampling. Furthermore, it makes the en-\ncoding robust to neural damage, which seems important when representing behaviourally-relevant\nvariables, e.g. in higher cortical areas. These bene\ufb01ts come at the cost of a linear increase in the\nnumber of neurons with K, providing a convenient trade-off between speed and neural resources.\nThe speedup due to increases in network size is orthogonal to potential improvements in sampling\nef\ufb01ciency achieved by more sophisticated MCMC dynamics, e.g. relying on oscillations [21] or non-\nnormal stochastic dynamics [22], suggesting that distributed sampling could be made even faster by\ncombining the two approaches.\nThe distributed coding scheme has important consequences for interpreting neural responses: since\nknowledge about the underlying distribution is spread across the population, the activity of single\ncells does not re\ufb02ect the underlying computation in any obvious way. In particular, although the\nnetwork did reproduce various properties of single neuron and pairs of neuron responses seen exper-\nimentally, we found that their modulation with uncertainty provides relatively limited information\nabout the underlying probabilistic computation. Changes in the overall spread (entropy) of the pos-\nterior are re\ufb02ected in changes in variability (Fano factors) and covariability (synchrony on the ms\ntimescale and spike-count correlations across trials) of neural responses across the population, as\nseen in the data. 
Since these features arise due to the interaction between sampling and distributed coding, the model further predicts that the degree of correlation between a pair of neurons should depend on their functional similarity, and that this modulation should be affected by uncertainty. Nonetheless, the distributed representation occludes the structure of the underlying distribution (e.g. correlations between random variables), something which would have been immediately apparent in a one-to-one sampling code.
Our results reinforce the idea that population, rather than single-cell, responses are key to understanding cortical computation, and point to linear decoding as a potential analysis tool for investigating probabilistic computation in a distributed code. In particular, we have shown that we can train a linear decoder on spiking data and use it to reveal the underlying sampling dynamics in different conditions. While ours is a simple toy example, where we assume that we can record from all the neurons in the population, the fact that the signal is low-dimensional relative to the number of neurons gives hope that it should be possible to adapt more sophisticated machine learning techniques [23] for decoding the underlying trajectory traced by a neural circuit in realistic settings. If this could be done reliably on data, then the analysis of probabilistic neural computation would no longer be restricted to regions for which we have good ideas about the mathematical form of the underlying distribution, but could be applied to any cortical circuit of interest.11 Thus, our coding scheme opens exciting avenues for multiunit data analysis.

Acknowledgements
This work was supported by ANR-10-LABX-0087 IEC, ANR-10-IDEX-0001-02 PSL, ERC grant FP7-PREDISPIKE and the James McDonnell Foundation Award - Human Cognition.

10 Such distributions arise in many models of probabilistic inference in the brain, e.g.
[20].
11 The critical requirement is to know (some of) the variables represented in the circuit, up to a linear map.

References

[1] Fiser, J., Berkes, P., Orbán, G. & Lengyel, M. Statistically optimal perception and learning: from behavior to neural representations. Trends in Cognitive Sciences 14, 119–130 (2010).

[2] Pouget, A., Beck, J.M., Ma, W.J. & Latham, P.E. Probabilistic brains: knowns and unknowns. Nature Neuroscience 16, 1170–1178 (2013).

[3] Pouget, A., Zhang, K., Deneve, S. & Latham, P.E. Statistically efficient estimation using population coding. Neural Computation 10, 373–401 (1998).

[4] Hoyer, P.O. & Hyvärinen, A. Interpreting neural response variability as Monte Carlo sampling of the posterior. Advances in Neural Information Processing Systems, 293–300 (2003).

[5] Neal, R. MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo 54, 113–162 (2010).

[6] Boerlin, M. & Deneve, S. Spike-based population coding and working memory. PLoS Computational Biology 7, e1001080 (2011).

[7] Boerlin, M., Machens, C.K. & Denève, S. Predictive coding of dynamical variables in balanced spiking networks. PLoS Computational Biology (2013).

[8] Bourdoukan, R., Barrett, D., Machens, C. & Deneve, S. Learning optimal spike-based representations. Advances in Neural Information Processing Systems, 2294–2302 (2012).

[9] Hinton, G.E. Training products of experts by minimizing contrastive divergence. Neural Computation 14, 1771–1800 (2002).

[10] Savin, C., Dayan, P. & Lengyel, M. Correlations strike back (again): the case of associative memory retrieval. In Advances in Neural Information Processing Systems 26 (eds. Burges, C., Bottou, L., Welling, M., Ghahramani, Z. & Weinberger, K.) 288–296 (2013).

[11] Renart, A. et al. The asynchronous state in cortical circuits. Science 327, 587–590 (2010).

[12] Ecker, A.S. et al.
Decorrelated neuronal firing in cortical microcircuits. Science 327, 584–587 (2010).

[13] Yu, J. & Ferster, D. Functional coupling from simple to complex cells in the visually driven cortical circuit. Journal of Neuroscience 33, 18855–18866 (2013).

[14] Ohiorhenuan, I.E. et al. Sparse coding and high-order correlations in fine-scale cortical networks. Nature 466, 617–621 (2010).

[15] Kohn, A. & Smith, M.A. Stimulus dependence of neuronal correlation in primary visual cortex of the macaque. Journal of Neuroscience 25, 3661–3673 (2005).

[16] Smith, M.A. & Kohn, A. Spatial and temporal scales of neuronal correlation in primary visual cortex. Journal of Neuroscience 28, 12591–12603 (2008).

[17] Churchland, M.M. et al. Stimulus onset quenches neural variability: a widespread cortical phenomenon. Nature Neuroscience 13, 369–378 (2010).

[18] Zemel, R.S., Dayan, P. & Pouget, A. Probabilistic interpretation of population codes. Neural Computation 10, 403–430 (1998).

[19] Buesing, L., Bill, J., Nessler, B. & Maass, W. Neural dynamics as sampling: A model for stochastic computation in recurrent networks of spiking neurons. PLoS Computational Biology 7, e1002211 (2011).

[20] Karklin, Y. & Lewicki, M. A hierarchical Bayesian model for learning nonlinear statistical regularities in nonstationary natural signals. Neural Computation 17, 397–423 (2005).

[21] Savin, C., Dayan, P. & Lengyel, M. Optimal recall from bounded metaplastic synapses: predicting functional adaptations in hippocampal area CA3. PLoS Computational Biology 10, e1003489 (2014).

[22] Hennequin, G., Aitchison, L. & Lengyel, M. Fast sampling for Bayesian inference in neural circuits. arXiv preprint arXiv:1404.3521 (2014).

[23] Macke, J.H. et al. Empirical models of spiking in neural populations.
Advances in Neural Information Processing Systems 24, 1350–1358 (2011).