{"title": "Flexible information routing in neural populations through stochastic comodulation", "book": "Advances in Neural Information Processing Systems", "page_first": 14402, "page_last": 14411, "abstract": "Humans and animals are capable of flexibly switching between a multitude of tasks, each requiring rapid, sensory-informed decision making. Incoming stimuli are processed by a hierarchy of neural circuits consisting of millions of neurons with diverse feature selectivity. At any given moment, only a small subset of these carry task-relevant information. \nIn principle, downstream processing stages could identify the relevant neurons through supervised learning, but this would require many example trials. Such extensive learning periods are inconsistent with the observed flexibility of humans or animals, who can adjust to changes in task parameters or structure almost immediately. \nHere, we propose a novel solution based on functionally-targeted stochastic modulation. It has been observed that trial-to-trial neural activity is modulated by a shared, low-dimensional, stochastic signal that introduces task-irrelevant noise. Counter-intuitively this noise is preferentially targeted towards task-informative neurons, corrupting the encoded signal. However, we hypothesize that this modulation offers a solution to the identification problem, labeling task-informative neurons so as to facilitate decoding. We simulate an encoding population of spiking neurons whose rates are modulated by a shared stochastic signal, and show that a linear decoder with readout weights approximating neuron-specific modulation strength can achieve near-optimal accuracy. 
Such a decoder allows fast and flexible task-dependent information routing without relying on hardwired knowledge of the task-informative neurons (as in maximum likelihood) or unrealistically many supervised training trials (as in regression).", "full_text": "Flexible information routing in neural populations through stochastic comodulation

Caroline Haimerl
Center for Neural Science
New York University
ch2880@nyu.edu

Cristina Savin
Center for Neural Science
Center for Data Science
New York University
csavin@nyu.edu

Eero P. Simoncelli
Center for Neural Science, and
Howard Hughes Medical Institute
New York University
eero.simoncelli@nyu.edu

Abstract

Humans and animals are capable of flexibly switching between a multitude of tasks, each requiring rapid, sensory-informed decision making. Incoming stimuli are processed by a hierarchy of neural circuits consisting of millions of neurons with diverse feature selectivity. At any given moment, only a small subset of these carry task-relevant information. In principle, downstream processing stages could identify the relevant neurons through supervised learning, but this would require many training trials. Such extensive learning periods are inconsistent with the observed flexibility of humans or animals, both of whom can adjust to changes in task parameters or structure almost immediately. Here, we propose a novel solution based on functionally-targeted stochastic modulation. It has been observed that trial-to-trial neural activity is modulated by a shared, low-dimensional, stochastic signal that introduces task-irrelevant noise. Counter-intuitively, this noise appears to be preferentially targeted towards task-informative neurons, corrupting the encoded signal. We hypothesize that this modulation offers a solution to the identification problem, labeling task-informative neurons so as to facilitate decoding. 
We simulate an encoding population of spiking neurons whose rates are modulated by a shared stochastic signal, and show that a linear decoder with readout weights estimated from neuron-specific modulation strength can achieve near-optimal accuracy. Such a decoder allows fast and flexible task-dependent information routing without relying on hardwired knowledge of the task-informative neurons (as in maximum likelihood) or unrealistically many supervised training trials (as in regression).

1 Introduction

Our survival depends on the actions we take, which are derived from internal states and sensory input. Accurate decisions require reliable encoding and flexible task-specific decoding of sensory information. Take for instance the perceptual task of detecting a change in orientation of a grating within a small aperture, placed at a particular location in the visual field (Fig. 1). Neurons in primary visual cortex (V1) that respond selectively to features at different spatial locations and orientations encode the visual stimulus. However, only a small fraction of those neurons would show a change in response when the grating changes orientation (Fig. 1, red); the overwhelming majority will not respond at all, or their responses would not change significantly (Fig. 1, gray). Since nearly all visual information passes through V1, any downstream area's sole source of information is contained in the responses of those few V1 cells. Thus, solving this task relies on the ability to properly gather and combine the responses of these task-relevant neurons, while ignoring the background chatter of activity emanating from the remainder of the population. 
33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Furthermore, if the task changes (e.g., due to a change in stimulus position or orientation), the informative sub-population within V1 will change, and downstream areas will need to modify their processing accordingly. The means by which the brain can achieve such dynamic task-dependent routing of information is a mystery.

The readout of sensory information in neural responses is often explored using statistically optimal decoders derived from specific encoding models. While these decoders can provide an upper bound on performance [1, 2, 3, 4, 5, 6, 7, 8], they should not be interpreted as models for biological decoding, since they generally rely on full knowledge of the stimulus response and noise properties of neurons. It seems inconceivable that downstream decoding circuits could have access to, or store, such detailed information. An alternative possibility is that the decoder is learned from experience. This requires extensive training on the discrimination task, accompanied by feedback regarding the success or failure on each trial. The need for many trials, with feedback, seems inconsistent with the observed behavioral flexibility of animals or humans, both of whom can rapidly adjust to changes in task conditions [9].

Here we propose a novel framework for biologically plausible, flexible decoding, inspired by recent results on task-dependent noise properties of neural populations in the visual system. Neural noise limits the amount of stimulus information that a neural population can encode [10, 1] and is commonly modeled with a Poisson process. However, neurons seem to share sources of multiplicative trial-to-trial variability, or correlated noise, suggesting that additional time-varying modulators influence the response of neurons [11]. 
Theoretical work indicates that such correlated noise can be detrimental for population encoding, as it cannot be averaged out [7, 12]. Importantly, in some experiments this noise seems to be specifically targeted to neurons that are informative for the task, which further exacerbates the detrimental effects on encoding. Specifically, V4 neurons have been shown to share a common source of noise-modulation, which affects neurons that are informative for the task more strongly [13]. Similarly, V1 noise correlation structure is better explained by task-informativeness than by stimulus tuning properties, suggesting that the source of these correlations is top-down (as opposed to stimulus-driven) [14].

The mechanisms underlying this modulation remain unclear, but the observed task-specific structure has functional implications. From an encoding perspective, it is counterintuitive that the system would corrupt the responses of task-informative neurons. However, we suggest that this noisy, task-irrelevant modulator plays a key role in solving the mystery of decoding. Specifically, we propose that the modulatory fluctuations serve as a label for the task-relevant neurons, helping the decoder to select these neurons for readout: the decoder makes use of the modulator itself (or the modulator-induced covariability) when assigning appropriate decoding weights to each neuron. We construct such a modulator-guided decoder, and show through simulations that moderate levels of task-specific stochastic modulation of an encoding population can lead to a substantial overall benefit in decoding accuracy, while keeping the assumed knowledge about the encoding population at a biologically plausible level. 
Thus, structured noise may be an essential feature of brain computation, which could guide AI algorithms to overcome an essential gap to human behavioral performance.

2 Encoding/decoding models

To test our hypothesis, we simulate encoding in a population of stimulus-selective, noise-modulated Poisson neurons [13] and compare statistically optimal ideal observer decoders, which have full knowledge of the stimulus-selectivity and modulatory structure of the encoding population, with biologically plausible decoders, which must operate with limited knowledge of the encoding population.

Encoding model: Poisson spiking population with task-targeted modulation

The variability in spike count response k_t over repeated presentations t of a stimulus s reflects the stochastic nature of neural spiking, commonly modeled using a Poisson point process with stimulus-dependent firing rate λ(s). We account for supra-Poisson variability in neural responses by introducing additional sources of stochasticity [11, 15, 16]. Specifically, the stimulus-driven rate of neuron n is dynamically modulated by a time-varying signal m_t [13], which leads to a doubly stochastic spiking process:

k_nt(s, m_t) ∼ Poiss(λ_n(s) g(m_t)),     (1)

where g(·) is a positive-valued link function, here an exponential, to guarantee a positive firing rate.

Figure 1: Encoding model. A. The encoding population consists of stimulus-tuned Poisson spiking neurons. Shared stochastic modulation (green) preferentially targets task-informative neurons and acts as a multiplicative gain. The decoder uses the modulatory signal to identify the task-informative neurons, and combines their responses to arrive at a decision. B. Stimulus selectivity of the population, qualitatively matched to experimental data. Neurons fall into three categories, based on their mean response to each of the two stimuli in the discrimination task. 
Neurons that respond differentially to the two stimuli are informative (red). Neurons with substantial but nearly equal responses to both stimuli are uninformative (black). The remaining neurons are inactive (and thus also uninformative), showing weak responses to both stimuli (gray).

We simulate a binary discrimination task (i.e., discriminate s = 0 from s = 1) similar to the change-detection task used in [17]. Empirical observations in macaque area V4 show that the modulatory signal m_t is low-dimensional, shared across the neural population, and selectively targets neurons in proportion to their task-informativeness [13]. To capture these effects, we assume a one-dimensional modulator and introduce neuron-specific modulation weights, w_n, that are proportional to the n-th neuron's ability to discriminate the two stimuli. Overall modulation strength in the population is determined by the modulator variance (var(m_t w_n) = σ²_m w²_n; see also [18]):

k_nt(s, m_t) ∼ Poiss(λ_n(s) exp(w_n m_t)).     (2)

Following a previous encoding model [13], we assume i.i.d. zero-mean Gaussian noise with variance σ²_m for m_t. Given the exponential nonlinearity, the modulatory factor causes an increase in expected spike count by a factor exp(σ²_m w²_n / 2). To remove trivial benefits of the modulator due to an increase in firing rates, we correct for this expected increase by normalizing the firing rates in the encoding model:

k_nt(s, m_t) ∼ Poiss(λ_n(s) exp(w_n m_t − σ²_m w²_n / 2)).     (3)

Statistically optimal "ideal observer" decoders

Given the modulated Poisson encoding model, an ideal observer with complete knowledge of both stimulus response properties {λ_n(s)} and modulation {w_n, m_t} provides an upper bound on task-decision accuracy. 
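Before turning to the decoders, the de-biased encoding model of Eq. (3) can be sketched in a few lines of code. All parameter values below (population size, rates, modulation weights) are hypothetical choices for illustration, not the simulation settings used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 200, 500          # neurons, time bins (hypothetical)
n_inf = 20               # number of task-informative neurons
sigma_m = 0.3            # modulator standard deviation

lam = np.full((2, N), 0.05)     # lambda_n(s): rows are s = 0, 1 (counts/bin)
lam[0, :n_inf] = 0.2            # informative cells respond differentially
lam[1, :n_inf] = 0.4

w = np.zeros(N)                 # modulation weights target informative cells
w[:n_inf] = 1.0

def simulate(s, rng):
    """One trial of Eq. (3): k_nt ~ Poiss(lam_n(s) exp(w_n m_t - sigma_m^2 w_n^2 / 2))."""
    m = rng.normal(0.0, sigma_m, size=T)                      # shared modulator
    gain = np.exp(np.outer(m, w) - 0.5 * sigma_m**2 * w**2)   # (T, N), mean 1
    return rng.poisson(lam[s] * gain), m

k, m = simulate(1, rng)
# Thanks to the de-biasing term, mean counts stay near lam_n(s), even though
# the modulator adds supra-Poisson variability to the targeted neurons.
```

The de-biasing term −σ²_m w²_n/2 in the exponent is exactly the normalization of Eq. (3): it cancels the mean gain of the log-normal modulatory factor, so targeted neurons gain variability but not firing rate.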
It operates by comparing the probability of the two stimuli under the full model (equivalently, by examining the sign of their log odds). For our modulated Poisson encoding model (see Eq. 3), this reduces to comparing a weighted linear combination of the observed neural spike counts against a time-varying threshold that is a function of the modulator (see derivation in Suppl. Info. S1). We refer to this as the modulator-conditioned maximum likelihood (MC-ML) decoder¹:

Σ_n a^(MC)_n k_nt > c^(MC)_t,     (4)

with weights:

a^(MC)_n = log(λ_n(1)) − log(λ_n(0)),     (5)

and time-varying threshold:

c^(MC)_t = −Σ_n exp(m_t w_n) [λ_n(1) − λ_n(0)],     (6)

where λ_n(s) denotes the mean response of the n-th neuron to stimulus s when m_t = 0.

¹For brevity, 'decoder' refers to both the stimulus readout, and its corresponding optimal discriminator.

The MC-ML decoder provides an upper bound on achievable performance, and relies on perfect knowledge of the modulator m_t, the stimulus selectivity of the neurons, λ_n(s), and the coupling weights w_n. We can relax these requirements by assuming that the modulator is unknown, and only the modulator-marginalized stimulus selectivity of the cells is available (i.e., the stimulus response averaged over possible modulators; see Suppl. Info. S1). We refer to this solution as the modulator-marginalized maximum likelihood (MM-ML) decoder. 
Due to the particularities of the Poisson noise model, this second decoder also computes a weighted sum over responses:

a^(MM)_n = log(λ*_n(1)) − log(λ*_n(0)),     (7)

but compares this weighted sum to a fixed threshold:

c^(MM) = −Σ_n [λ*_n(1) − λ*_n(0)],     (8)

where λ*_n(s) is the mean response of the n-th neuron averaged (marginalized) over possible modulator values. For the encoding model in Eq. (3), λ*_n(s) = λ_n(s), which means that the decoding weights are the same as those used in the MC-ML decoder (i.e., a^(MM)_n = a^(MC)_n). Hence, in the case of a binary discrimination task, the MM-ML decoder is able to achieve an unbiased estimate of the decoding weights from the stimulus responses, without knowing the modulator. However, it does lead to systematic time-dependent biases in the decoder threshold, and therefore to biased decisions.

Biologically plausible decoders

The MC-ML and MM-ML decoders are not plausible as a description of decoding in the brain, but they do provide a useful yardstick against which to compare the performance of more realistic decoders. They also motivate the use of a linear-threshold functional form for the solution. 
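To make the two ideal-observer rules concrete, here is a sketch that implements both decisions directly from the per-bin Poisson log odds, which is equivalent to the linear-threshold form above and sidesteps any sign-convention ambiguity. The simulation parameters are invented for illustration:

```python
import numpy as np

def mc_ml_decide(k, m, lam, w, sigma_m):
    """Modulator-conditioned ML decision (cf. Eqs. 4-6): linear readout with
    weights log lam_n(1) - log lam_n(0), plus a modulator-dependent rate term."""
    a = np.log(lam[1]) - np.log(lam[0])                       # (N,) weights
    gain = np.exp(np.outer(m, w) - 0.5 * sigma_m**2 * w**2)   # (T, N)
    log_odds = (k * a).sum() - (gain * (lam[1] - lam[0])).sum()
    return int(log_odds > 0)

def mm_ml_decide(k, lam):
    """Modulator-marginalized ML (cf. Eqs. 7-8): same weights, but the
    modulator-dependent term is replaced by its average (a fixed threshold)."""
    a = np.log(lam[1]) - np.log(lam[0])
    log_odds = (k * a).sum() - k.shape[0] * (lam[1] - lam[0]).sum()
    return int(log_odds > 0)

# Toy simulation (hypothetical parameters): 10 informative cells out of 50.
rng = np.random.default_rng(1)
N, T, sigma_m, trials = 50, 200, 0.3, 200
lam = np.vstack([np.full(N, 0.1), np.full(N, 0.1)])
lam[0, :10], lam[1, :10] = 0.2, 0.5
w = np.zeros(N)
w[:10] = 1.0

correct_mc = correct_mm = 0
for _ in range(trials):
    s = int(rng.integers(2))
    m = rng.normal(0.0, sigma_m, T)
    gain = np.exp(np.outer(m, w) - 0.5 * sigma_m**2 * w**2)
    k = rng.poisson(lam[s] * gain)
    correct_mc += mc_ml_decide(k, m, lam, w, sigma_m) == s
    correct_mm += mm_ml_decide(k, lam) == s
```

With this easy toy discrimination both decoders are near-perfect; the gap between them only opens up at high modulator strength, where the fixed MM-ML threshold stops tracking the modulator-driven rate fluctuations.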
We now seek decoders of this form that satisfy three criteria: (1) they are biologically plausible, in that they do not rely on detailed knowledge about the encoding population (neither the stimulus responses, nor the modulation weights); (2) they are behaviorally plausible, in that they can efficiently adapt to changes in task structure, so as to reflect the flexibility seen in monkey behavior [17, 9]; and (3) they achieve accuracy approaching that of the optimal decoders.

We start with the simplest decoder, motivated by early work on neural binary discrimination/detection [1], assuming minimal knowledge of the encoding population, in line with our first criterion. The idea is to average the responses of two sub-populations ("preferred" and "anti-preferred") and then compare these averages. Hence, the problem of learning decoding weights is reduced to choosing which population each neuron is assigned to; this is mathematically equivalent to determining the signs of a weight vector containing values ±1. For this reason, we refer to this model as the sign-only (SO) decoder. The signs are optimally estimated by comparing the mean responses to the two stimuli. This solution is agnostic to the details of the encoding model.

In order for this decoder to satisfy our second criterion, decoding flexibility, we need to estimate the signs given few trials. Indeed, we see that classification into the two signed groups reaches high (90%) accuracy with only a few tens of trials, assuming low to moderate modulator strength (see Fig. 2A). If all neurons in a population were informative, learning the signs would provide an accurate readout of task information, and the SO decoder would also fulfill the last criterion (decoding accuracy). However, neural populations are diverse, and would generally be expected to include many uninformative neurons [19, 7]. 
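The sign-learning step can be sketched as follows; the parameters are invented for illustration, and for simplicity the modulator is omitted here (Fig. 2A studies how modulator strength affects this estimate):

```python
import numpy as np

# SO decoder's sign-learning step: estimate each neuron's sign by comparing
# its mean response to the two stimuli over a few labeled training trials.
rng = np.random.default_rng(2)
N, T, n_train = 40, 100, 20
lam = np.vstack([np.full(N, 0.1), np.full(N, 0.1)])
lam[0, :N // 2], lam[1, :N // 2] = 0.15, 0.25   # informative half prefers s = 1

# Per-stimulus mean counts, pooled over n_train trials of T bins each.
mean0 = rng.poisson(lam[0], size=(n_train * T, N)).mean(0)
mean1 = rng.poisson(lam[1], size=(n_train * T, N)).mean(0)
signs = np.sign(mean1 - mean0)   # +1 -> "preferred", -1 -> "anti-preferred"

# Informative neurons are signed correctly almost surely; the uninformative
# half receives arbitrary signs driven by noise alone -- the SO decoder's flaw.
frac_correct = (signs[:N // 2] == 1).mean()
```

This also illustrates the failure mode discussed next: neurons with no stimulus preference still get assigned to one group or the other purely by chance, so their noise enters the readout.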
The exact percentage depends on the neural population and behavioral task; we assess this parameter in detail in the section on decoder accuracy. An illustration of such an encoding population is given in Fig. 1B, which shows average responses of simulated neurons with diverse stimulus tuning features to two task-specific stimuli. Only a small fraction of neurons are responsive, while the large majority respond weakly ("inactive"). If the noise from these inactive neurons is not excluded by the decoder, it can still corrupt the signal [1]. We assessed decoding performance (% accurately discriminated stimuli) as a function of the number of inactive neurons (Fig. 2B). The SO decoder includes inactive neurons and assigns them to one decoding group or the other based on noise alone. Even though the individual noise of each inactive neuron is small by definition, together their task-irrelevant responses eventually dominate the relevant stimulus signal (see Fig. 2B).

Figure 2: Accuracy of sign estimation, for simulated data. A. Mean % correctly attributed signs for informative neurons as a function of the number of training trials, for varying relative modulator strength (percentage of spike count variance of the informative neurons accounted for by the modulator). Decoding signs are learned within a few tens of trials. B. Mean performance of RG and SO decoders as the number of inactive neurons is increased. The RG decoder downweights inactive neurons, thus allowing it to maintain better performance than the SO decoder.

In order to discount the inactive neurons, they should be assigned decoding weights with smaller amplitudes. 
The limited-knowledge constraint, however, means that these weights cannot be assumed to be known, but must be learned/adapted based on information readily available to downstream circuits.

Since informative neurons necessarily have to show activity during a task, one simple heuristic rule is to set decoding weights proportional to the mean spike count of their associated neurons:

|a^(RG)_n| ∝ (1/T) Σ_t k_nt.     (9)

For this decoder, the sign of the weights must again be learned (as for the SO decoder). The time-invariant threshold is set optimally. This rate-guided (RG) decoder improves decoding accuracy over the SO decoder by excluding neurons that do not respond to the stimuli (Fig. 1B, grey points). Fig. 2B shows that while the SO decoder's performance drops to chance level with increasing numbers of inactive neurons, the RG decoder is much less affected. However, the RG decoder is still far from optimal. In particular, it cannot exclude neurons that are active but respond similarly to both stimuli (and are thus uninformative; Fig. 1B, black points).

The modulator could deliver this missing differentiation through its task-specific targeting structure. Here we propose a simple local rule for learning the modulation weights, by taking the inner product of spike counts and the modulator:

|a^(MG)_n| = (1/T) Σ_t m_t k_nt.     (10)

This modulator-guided (MG) decoder satisfies the first criterion (biological plausibility), as it only assumes knowledge about the low-dimensional modulator, and the second (flexibility), as it learns absolute decoding weights on the fast modulator time scale, instead of the slow time scale dictated by the task feedback arising from each trial.

Our heuristic learning rule results in estimates of the form (see Suppl. Info. S2):

E[|a^(MG)_n|] = λ̄_n σ²_m w_n,     (11)

which scale with the average response of neuron n across stimuli, λ̄_n, and the modulator variance, σ²_m. For this to be an unbiased estimate of the optimal decoding weights, we need the modulation strength to scale as w_n = λ̄_n⁻¹ |a^(MC)_n|. This additional assumption about the encoding model will not affect the optimal decoding weights, but will change the expression for the optimal threshold (see Eq. 6). We use this bias-corrected encoder here. Empirically, we have found that the positive effects of modulation on decoding remain, even in the absence of de-biasing.

Note that the above expression only provides the magnitude of the weights. The corresponding signs must be separately estimated, as for the SO decoder.

Decoder | Stimulus response knowledge      | Modulation knowledge | Degrees of freedom
MC-ML   | λ_n(s) (modulator-conditioned)   | m_t, w_n             | 2N + N + T
MM-ML   | λ*_n(s) (modulator-marginalized) | σ_m, w_n             | 2N + N + 1
MG      | none                             | m_t                  | T
RG      | none                             | none                 | 0
SO      | none                             | none                 | 0

Table 1: Knowledge assumed by each of the five decoders (modulator-conditioned: MC-ML, modulator-marginalized: MM-ML, modulator-guided: MG, rate-guided: RG, sign-only: SO; see text for details). The last column gives the dimensionality of variables that are assumed known or must be estimated from neural responses, with N the number of neurons in the population and T the number of time points.

For simplicity we assume that the MG threshold has the optimal functional form, as defined by the MC-ML decoder (Eq. 6). 
To maintain biological plausibility, we replace the true w_n (which requires precise knowledge of the encoding model) with the estimates |ã^(MG)_n|. Furthermore, the difference in firing rates [λ_n(1) − λ_n(0)] is replaced by an empirical estimate Δλ, determined as a function of the estimated decoding weights, the learned signs, and one free parameter per informative subpopulation (two parameters in total). It measures the population-average change in activity as a function of the stimulus and can easily be learned within a few trials.

3 Decoder accuracy

We tested the decoders listed in Table 1 in a binary discrimination task that evokes differential responses in a small subset of cells in the encoding population. We quantified decoding performance for discrimination between two stimuli, s = 0 and s = 1, as we varied the overall strength of modulation. Results are shown in Fig. 3A. The MC-ML decoder provides a strict upper bound on decoding performance, as it assumes full knowledge of the encoding model. As the modulator variance σ_m increases, the performance of this decoder monotonically decreases, confirming the intuition that injecting correlated noise into task-relevant neurons is detrimental for encoding. For the encoding population tested here, the MM-ML decoder is nearly as good as the MC-ML decoder; however, its performance falls faster with increasing modulator strength. This is due to the use of a fixed threshold that does not adjust to temporal fluctuations of the modulator.

Among the biologically plausible decoders, SO performs near chance level, as it is unable to pick up the signal of the few informative neurons in the population. The RG decoder performs only slightly better, since it cannot differentiate between informative and uninformative neurons. Interestingly, the MG decoder's performance shows a non-monotonic dependence on modulator strength. 
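As an illustration of the modulator-guided estimate of Eq. (10) and its expectation in Eq. (11), the sketch below correlates spike counts with the modulator and recovers the targeted (informative) neurons without any trial labels. Parameters are invented; for simplicity the modulation weight is a constant on targeted cells, rather than scaled as λ̄_n⁻¹|a^(MC)_n| as in the de-biased encoder:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, sigma_m = 100, 20000, 0.5
lam = np.vstack([np.full(N, 0.1), np.full(N, 0.1)])
lam[0, :10], lam[1, :10] = 0.15, 0.25
w = np.zeros(N)
w[:10] = 1.0              # modulator targets only the informative cells

# Pool time bins across many trials with random stimuli; no labels needed.
m = rng.normal(0.0, sigma_m, T)
s = rng.integers(2, size=T)
gain = np.exp(np.outer(m, w) - 0.5 * sigma_m**2 * w**2)
k = rng.poisson(lam[s] * gain)

# Eq. (10): correlate spike counts with the modulator.
a_mg = (m[:, None] * k).mean(0)

# Eq. (11) predicts E|a_mg| = lam_bar_n * sigma_m^2 * w_n, i.e. zero for
# untargeted cells, so ranking |a_mg| picks out the modulator-labeled neurons.
top = np.argsort(-np.abs(a_mg))[:10]
```

Here λ̄_n = (0.15 + 0.25)/2 = 0.2 for the targeted cells, so the estimates concentrate around λ̄_n σ²_m w_n = 0.05 for targeted cells and around zero elsewhere, which is exactly the labeling the MG decoder exploits.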
At low levels of modulation, performance increases with modulator strength: in this regime, the modulation allows the decoder to assign larger weights to the most informative cells. At higher levels, performance decreases, as with the ideal decoders, reflecting the corruption of the encoded signal. Hence, there exists an ideal range of modulation at which the MG decoder reaches its best performance, which is close to that of the ideal MC-ML decoder. Note also that since the MG decoder is capable of adjusting its threshold over time, depending on the modulator, it outperforms the MM-ML decoder (which uses a constant threshold) in the regime of high modulator strength.

We study this optimum with respect to modulator strength by looking at the encoding and decoding processes separately. For encoding, we predict the signal-to-noise ratio using Fisher's Linear Discriminant (FLD):

SNR = (aᵀ(μ₁ − μ₀))² / (aᵀΣ₁a + aᵀΣ₀a),     (12)

for the optimal decoding weights a = a^(MC) (Fig. 3B, bottom). Decoding accuracy was estimated by the MSE of the MG-estimated decoding weights relative to the theoretical optimum. Given that the MG decoding weights are unbiased, the MSE is given by the variance of the estimator, which decreases in inverse proportion to T (see Suppl. Info. S2):

Var[|a^(MG)_n|] = (σ²_m / T) (λ̄_n (1 + σ²_m w²_n) + λ²̄_n e^(σ²_m w²_n) (1 + 4 σ²_m w²_n) − λ̄_n² w²_n σ⁴_m),     (13)

where λ̄_n and λ²̄_n are the mean and second moment of the neural response across stimuli.

Figure 3: Comparison of different decoders, on simulated data. A. Decoding accuracy as a function of relative modulator strength (percentage of total spike count variance that can be attributed to the modulator in informative neurons). 
Lines indicate mean accuracy (% correct), and the shaded region its 95% confidence interval. We simulated 5000 cells in total, of which 50 were active cells, and of those 12 (24% of active) were informative cells. Baseline firing rates were set similarly for all active neurons. B. Increasing modulator strength has opposite effects on encoding and decoding accuracy: it decreases the FLD ratio (encoding accuracy), but also increases decoding accuracy, measured as the inverse of the MSE; these two effects jointly produce the maximum in accuracy of the MG decoder (blue shaded region). C. Performance of different decoders as a function of the number of informative neurons (as % of active neurons). The strength of modulation was fixed to the MG-optimal strength (see A). Other parameters are the same as in A.

We also tested the influence of the percentage of informative neurons in the encoding population on these results. The decoding problem of identifying task-informative neurons is particularly difficult when only very few of the active neurons are task-informative. In experiments, the percentage of informative neurons varies depending on the intrinsic tuning properties of the cells (e.g., width of tuning curves) and extrinsic task properties (e.g., coarse vs. fine discrimination). In our simulations, varying the percentage of informative neurons serves as a proxy for both. Unsurprisingly, increasing the percentage of informative neurons increases decoding accuracy across the board (Fig. 3C). While the SO decoder's improvements are modest, the RG decoder achieves reasonable accuracy if more than half of the neurons are informative. The advantage of the MG decoder over RG is strongest when the fraction of informative neurons is small. These results are robust to changes in the overall size of the population (not shown). 
Overall, this suggests that, under realistic conditions, our proposed modulator-guided decoding mechanism provides a substantial benefit over simpler solutions.

Finally, the modulation strengths above were set to the optimal decoding weights for every neuron, hence assuming high-precision targeting and no additional sources of noise (Eq. 3). In Fig. 4 we show numerically that our results are robust to noise in the modulator weights, as well as to perturbations in the firing rates of the neurons, in the form of additive Gaussian noise.

4 Discussion

Artificial neural networks may excel at solving the one task they have been trained for, but require substantial retraining when goals change. In contrast, humans and animals can rapidly and flexibly switch between goals, with existing neural resources quickly recruited for the task at hand. Here we proposed that a functionally targeted stochastic modulator [13] could dynamically label informative neurons, facilitating their flexible and accurate task-specific readout. We showed that a modulator-guided linear decoder, in which weights are estimated through correlation of responses with the modulator, can achieve near-optimal performance. We investigated how parameters of the encoder (the proportion of inactive neurons, and of active but uninformative neurons) impact performance, and found that these dictate a choice of modulator strength that best balances the disruptive effects of correlated noise on encoding against its positive effects for decoding. Importantly, performance is invariant to other parameter changes, such as the size of the population and baseline firing rate, demonstrating the robustness of the modulation labeling scheme to circuit details.

Historically, ideal observer models have ignored the presence of modulation, yet have provided good approximations of behavioral performance. 
Our MM-ML ideal observer provides a possible explanation for this incongruity: an experimenter who measures tuning functions by averaging neural responses in the presence of unaccounted-for modulation is effectively marginalizing over it. Optimal decoding weights derived from these estimates are in fact correct, but the use of a fixed decision threshold is suboptimal. This suboptimality is relatively minor in the context of our simulations (compare MC-ML and MM-ML in Fig. 3A), but could prove more substantial when fit to physiological data. Another source of bias, often ignored in ideal observer analyses, arises from the selection of experimentally recorded neurons. Neural recordings are generally biased towards active neurons, partly because low-firing neurons are more likely to be overlooked, and partly because experimental stimuli are often optimized to drive the recorded population. While the signal may be concentrated in the recorded subpopulation, a downstream decoding area must also process the substantially larger unrecorded population.

Figure 4: Robustness of the model to perturbations in modulation weights and firing rates. A. Modulation strengths w_n set to the absolute value of optimal decoding weights corrupted by additive independent Gaussian noise. B. Decoder performance with noisy modulation weights. C. Decoder performance with moderate and high levels of Gaussian noise added to the firing rates defined by Eq. 3.

Our encoding model assumes multiplicative noise since, to date, there is no evidence that additive noise is functionally targeted. 
Moreover, experimental reports conflict as to whether additive noise is a common phenomenon (e.g., Goris et al. [11] argue that an additive noise model is inconsistent with their data). If present, task-invariant additive noise would decrease the performance of all decoders, but would not qualitatively change our results (see Fig. 4C). For simplicity, we have assumed a single task-specific signal that underlies the correlated noise within the population. This is consistent with [13], which showed that V4 noise correlations were largely captured by a one-dimensional modulator per hemisphere. Alternatively, one could introduce several Gaussian modulators that combine linearly to jointly gate neural responses. This model would be harder to parameterize, but the net effect would be similar. Additional modulators that are not targeted would reduce the SNR of all neurons and negatively affect all decoders, but again should not qualitatively change the results.
Our theory does not specify the biological implementation of the modulation, in terms of where it is generated, the mechanism by which it targets encoding neurons in a task-specific manner, or the means by which it is made available to downstream circuits. In principle, dynamic changes in population noise correlations could arise through either local [20] or top-down mechanisms [14]. For the orientation discrimination task considered here, one could imagine taking advantage of the topographic organization of the sensory code for orientation, modulating spatially-localized clusters of neurons in V1 without requiring explicit knowledge of their individual tuning. The induced noise correlations would then propagate bottom-up to higher areas, labeling task-relevant neurons that need not be topographically clustered.
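As a concrete illustration of this one-dimensional modulator model and the readout scheme it enables, the following sketch comodulates a Poisson population with a shared Gaussian gain signal and estimates readout weights by correlating each neuron's spike counts with the modulator. All parameter values here are illustrative, not taken from the paper's simulations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (hypothetical, not fit to data)
N, T = 200, 500        # neurons, trials
n_inf = 20             # task-informative neurons
base = 5.0             # baseline expected spike count per trial
w_mod = 0.3            # modulation strength on informative neurons

# Tuning: informative neurons shift their rate by +/-1 with the stimulus
d = np.zeros(N)
d[:n_inf] = rng.choice([-1.0, 1.0], size=n_inf)

# The stochastic modulator targets only the informative neurons
w = np.zeros(N)
w[:n_inf] = w_mod

s = rng.choice([-1.0, 1.0], size=T)   # binary stimulus per trial
m = rng.normal(size=T)                # shared stochastic modulator

# Multiplicative comodulation of the tuned rates, then Poisson spiking
rates = (base + np.outer(s, d)) * np.exp(np.outer(m, w) - 0.5 * w**2)
counts = rng.poisson(rates).astype(float)

# Modulator-guided weights: correlate each neuron's counts with
# the modulator -- no stimulus labels needed for the magnitudes
w_hat = np.array([np.corrcoef(m, counts[:, n])[0, 1] for n in range(N)])

# Tuning signs from a small number of labeled trials
lab = slice(0, 50)
sign = np.sign(counts[lab][s[lab] > 0].mean(0)
               - counts[lab][s[lab] < 0].mean(0))

# Linear readout on the remaining trials
test = slice(50, T)
score = (counts[test] - counts.mean(0)) @ (np.abs(w_hat) * sign)
acc = np.mean(np.sign(score) == s[test])
print(f"modulator-guided accuracy: {acc:.2f}")
```

In this sketch, only the tuning signs use labeled trials; the weight magnitudes come from the modulator correlation alone, which is what permits rapid task switching without extensive supervised training.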
This mechanism predicts a hierarchy of modulator-based labeling across the sensory processing stages, consistent with the experimental observation that correlations between V1 and MT increase with behavioral performance, triggered by attending towards a stimulus [21]. Alternatively, on a slower time scale, top-down connections could self-organize to allow for feature-selective noise targeting, akin to attentive mechanisms recently introduced in artificial neural networks [22, 23].
The topic of flexible information routing in the brain has a long history. In particular, the signal transmitted by sensory neurons is enhanced when their firing is synchronized, and thus oscillations have been hypothesized to serve as a labeling mechanism [24]. The “communication through coherence” (CTC) theory [25] has refined this idea in an encoding-decoding framework, where a top-down oscillatory signal projects both to encoding neurons with the same feature selectivity and to the decoding network that reads out from them. Oscillations can play a role similar to that of our modulator, with some important distinctions. First, CTC considers a fixed labeling strategy, with oscillations targeting feature-selective neurons, while our framework focuses on flexible learning through targeting of task-informative neurons. The two proposals might be hard to distinguish in a detection task, but make distinct predictions for discrimination. Second, CTC assumes a decoder with a fixed threshold, which (at least within our modeling framework) is suboptimal. Third, the two theories differ in the statistics of the labeling signal: CTC assumes periodic signals, while our model uses stochastic signals, assuming only a timescale.
We note, though, that our framework could be readily adapted to the case of an oscillatory modulator.
Our model makes several predictions that can be examined in an experimental context that includes a dynamically changing task. In particular, the influence of low-dimensional (shared) noise should shift with the task, so as to continue to preferentially target task-informative neurons. Moreover, a modulator-guided decoder should outperform simpler strategies (sign-only or rate-guided) when applied to physiological data. Since our theory posits an optimal level of modulation relative to the stimulus-induced variance, we expect that attention shifts the level of modulation towards this (empirically estimable) optimum. Furthermore, the direction of the shift may provide clues regarding the mechanisms underlying noise generation [20].
Our decoders are designed for a classification (as opposed to estimation) task because the experiments showing task-specific modulation have been done with binary discrimination tasks. In principle, it might be possible to extend this framework to estimation, which also entails learning to appropriately weight informative neurons while ignoring uninformative ones. Modulator-labeling should prove useful in this context, although the details of the decoder will likely change. There is growing interest in the machine learning community in developing more flexible, adaptive neural models. Despite some recent progress, the design of artificial neural networks that can handle many tasks and leverage past learning to generalize to new tasks (e.g., multi-task learning, meta-learning, transfer learning) is in its infancy. Our proposal provides a novel, biologically inspired solution, as a potential step toward this goal.

Acknowledgments

This work was supported by the Google PhD Fellowship (CH).

References

[1] MN Shadlen, KH Britten, WT Newsome, and JA Movshon.
A computational analysis of the relationship between neuronal and behavioral responses to visual motion. Journal of Neuroscience, 16(4):1486–1510, 1996.

[2] HS Seung and H Sompolinsky. Simple models for reading neuronal population codes. Proc. Natl. Acad. Sci., 90:10749–10753, 1993.

[3] E Salinas and LF Abbott. Vector reconstruction from firing rates. Journal of Computational Neuroscience, 1(1-2):89–107, 1994.

[4] W Ma, JM Beck, PE Latham, and A Pouget. Bayesian inference with probabilistic population codes. Nature Neuroscience, 9(11):1432–1438, 2006.

[5] M Jazayeri and JA Movshon. A new perceptual illusion reveals mechanisms of sensory decoding. Nature, 446(7138):912–915, 2007.

[6] R Caruana. Multitask learning. Machine Learning, 28(1):41–75, 1997.

[7] KH Britten, WT Newsome, MN Shadlen, S Celebrini, and JA Movshon. A relationship between behavioral choice and the visual responses of neurons in macaque MT. Visual Neuroscience, 13:87–100, 1996.

[8] D Ganguli and EP Simoncelli. Neural and perceptual signatures of efficient sensory coding. arXiv, pages 1–24, 2016.

[9] EP Simoncelli. Optimal estimation in sensory systems. In The New Cognitive Neurosciences, number 36, pages 525–539. 2009.

[10] B Averbeck, P Latham, and A Pouget. Neural correlations, population coding and computation. Nature Reviews Neuroscience, 7(5):358–366, 2006.

[11] RLT Goris, JA Movshon, and EP Simoncelli. Partitioning neuronal variability. Nature Neuroscience, 17(6):858–865, 2014.

[12] R Moreno-Bote, J Beck, I Kanitscheider, X Pitkow, P Latham, and A Pouget. Information-limiting correlations. Nature Neuroscience, 17(10):1410–1417, 2014.

[13] N Rabinowitz, RL Goris, M Cohen, and EP Simoncelli. Attention stabilizes the shared gain of V4 populations. eLife, pages 1–24, 2015.

[14] A Bondy, RM Haefner, and BG Cumming.
Feedback determines the structure of correlated variability in primary visual cortex. Nature Neuroscience, 21:598–606, 2018.

[15] I Lin, M Okun, M Carandini, and KD Harris. The nature of shared cortical variability. Neuron, 87(3):644–656, 2015.

[16] I Arandia-Romero, S Tanabe, J Drugowitsch, A Kohn, and R Moreno-Bote. Multiplicative and additive modulation of neuronal tuning with population activity affects encoded information. Neuron, 89:1305–1316, 2016.

[17] MR Cohen and JHR Maunsell. Attention improves performance primarily by reducing interneuronal correlations. Nature Neuroscience, 12(12):1594–1600, 2009.

[18] AK Churchland, R Kiani, R Chaudhuri, XJ Wang, A Pouget, and MN Shadlen. Variance as a signature of neural computations during decision making. Neuron, 69(4):818–831, 2011.

[19] MR Cohen and WT Newsome. Estimates of the contribution of single neurons to perception depend on timescale and noise correlation. The Journal of Neuroscience, 29(20):6635–6648, 2009.

[20] C Huang, DA Ruff, R Pyle, R Rosenbaum, MR Cohen, and B Doiron. Circuit models of low-dimensional shared variability in cortical networks. Neuron, 101:1–12, 2019.

[21] DA Ruff and MR Cohen. Attention increases spike count correlations between visual cortical areas. The Journal of Neuroscience, 36(28):7523–7534, 2016.

[22] H Larochelle and GE Hinton. Learning to combine foveal glimpses with a third-order Boltzmann machine. In Advances in Neural Information Processing Systems, pages 1243–1251, 2010.

[23] M Denil, L Bazzani, H Larochelle, and N de Freitas. Learning where to attend with deep architectures for image tracking. Neural Computation, 24(8):2151–2184, 2012.

[24] W Singer. Neuronal synchrony: a versatile code for the definition of relations? Neuron, 24:49–65, 1999.

[25] TE Akam and DM Kullmann.
Ef\ufb01cient \u201ccommunication through coherence\u201d requires os-\ncillations structured to minimize interference between signals. PLoS Comp. Biology, 8(11),\n2012.\n\n10\n\n\f", "award": [], "sourceid": 8147, "authors": [{"given_name": "Caroline", "family_name": "Haimerl", "institution": "New York University"}, {"given_name": "Cristina", "family_name": "Savin", "institution": "NYU"}, {"given_name": "Eero", "family_name": "Simoncelli", "institution": "HHMI / New York University"}]}