{"title": "Expected and Unexpected Uncertainty: ACh and NE in the Neocortex", "book": "Advances in Neural Information Processing Systems", "page_first": 173, "page_last": 180, "abstract": null, "full_text": "Expected and Unexpected Uncertainty:\n\nACh and NE in the Neocortex\n\nAngela Yu\n\nPeter Dayan\n\nGatsby Computational Neuroscience Unit\n\n17 Queen Square, London WC1N 3AR, United Kingdom.\n\nferaina@gatsby.ucl.ac.uk\n\ndayan@gatsby.ucl.ac.uk\n\nAbstract\n\nInference and adaptation in noisy and changing, rich sensory environ-\nments are rife with a variety of speci\ufb01c sorts of variability. Experimental\nand theoretical studies suggest that these different forms of variability\nplay different behavioral, neural and computational roles, and may be\nreported by different (notably neuromodulatory) systems. Here, we re-\n\ufb01ne our previous theory of acetylcholine\u2019s role in cortical inference in\nthe (oxymoronic) terms of expected uncertainty, and advocate a theory\nfor norepinephrine in terms of unexpected uncertainty. We suggest that\nnorepinephrine reports the radical divergence of bottom-up inputs from\nprevailing top-down interpretations, to in\ufb02uence inference and plasticity.\nWe illustrate this proposal using an adaptive factor analysis model.\n\n1 Introduction\n\nAnimals negotiating rich environments are faced with a set of hugely complex inference\nand learning problems, involving many forms of variability. They can be unsure which con-\ntext presently pertains, cues can be systematically more or less reliable, and relationships\namongst cues can change smoothly or abruptly. Computationally, such different forms of\nvariability need to be represented, manipulated, and wielded in different ways. There is\nample behavioral evidence that can be interpreted as suggesting that animals do make and\nrespect these distinctions,5 and there is even some anatomical, physiological and pharma-\ncological evidence as to which neural systems are engaged. 
29\nPerhaps best delineated is the involvement of neocortical acetylcholine (ACh) in uncertainty.\nFollowing seminal earlier work,11, 14 we suggested6, 35 that ACh reports on the uncertainty\nassociated with a top-down model, and thus controls the integration of bottom-up\nand top-down information during inference. A corollary is that ACh should also control the\nway that bottom-up information influences the learning of top-down models. Intuitively,\nthis cholinergic signal reports on expected uncertainty, such that ACh levels are high when\ntop-down information is not expected to support good predictions about bottom-up data\nand should be modified according to the incoming data.\nWe6, 35 formally demonstrated the inference aspects of this idea using a hidden Markov\nmodel (HMM), in which top-down uncertainty derives from slow contextual changes. In\nextending this quantitative model to learning, we found, surprisingly, that it violated our\nqualitative theory of ACh. That is, in the HMM, greater uncertainty in the top-down\nmodel (ie a lower posterior responsibility for the predominant context), reported by\nhigher ACh levels, leads to comparatively slower learning about that context. By contrast,\nwe had expected that higher ACh should lead to faster learning, since it would indicate\nthat the top-down model is potentially inadequate. In resolving this conflict, we realized\nthat, at least in this particular HMM framework, we had incorrectly fused different sorts\nof uncertainty. As a further consequence, by thinking more generally about contextual\nchange, we also realized the formal need for a signal reporting on unexpected uncertainty,\nthat is, on strong violation of top-down predictions that are expected to be correct. 
There is\nsuggestive empirical evidence that one of many roles for neocortical norepinephrine (NE)\nis reporting this;29 it is also consonant with various existing theories associated with NE.\nIn sum, we suggest that expected and unexpected uncertainty play complementary but distinct\nroles in representational inference and learning. Both forms of uncertainty are postulated\nto decrease the influence of top-down information on representational inference and\nincrease the rate of learning. However, unexpected uncertainty rises whenever there is a\nglobal change in the world, such as a context change, while expected uncertainty is a more\nsubtle quantity dependent on internal representations of properties of the world. Here, we\nstart by outlining some of the evidence for the individual and joint roles of ACh and NE in\nuncertainty. In Section 3, we describe a simple, adaptive, factor analysis model that clarifies\nthe uncertainty notions. Differential effects induced by disrupting ACh and NE are\ndiscussed in Section 4, accompanied by a comparison to impairments found in animals.\n\n2 ACh and NE\n\nACh and NE are delivered to the cortex from a small number of subcortical nuclei: NE\noriginates solely in the locus coeruleus, while the primary sources of ACh are nuclei in the\nbasal forebrain (nucleus basalis magnocellularis, mainly targeting the neocortex, and medial\nseptum, mainly targeting the hippocampus). Cortical innervations of these modulators\nare extensive, targeting all cortical regions and layers.9, 30\nAs is typical for neuromodulators, physiological studies indicate that the effects of direct\napplication of ACh or NE are confusingly diverse. 
Within a small cortical area, iontophoresis\nor perfusion of ACh or NE (or their agonists) may cause synaptic facilitation or suppression,\ndepending on the cell and depending on whether the firing is spontaneous or stimulus-evoked;\nit may also induce direct hyperpolarization or depolarization.9, 10, 17 Direct application\nof either neuromodulator or its agonist, paired with sensory stimulation, results in\na general enhancement of stimulus-evoked responses, as well as an increased propensity\nfor experience-dependent reorganization of cortical maps (in contrast, depletion of either\nsubstance attenuates cortical plasticity).9 More interestingly, ACh and NE both seem to\nselectively suppress intracortical and feedback synaptic transmission while enhancing thalamocortical\nprocessing.8, 12, 13, 15, 17, 18, 20 Based on these roughly similar anatomical and\nphysiological properties, cholinergic and noradrenergic systems have been attributed correspondingly\nsimilar general computational roles, such as modulating the signal-to-noise\nratio in sensory processing.9, 10\nHowever, the effects of ACh and NE depletion in animal behavioral studies, as well as\nmicrodialysis of the neuromodulators during different conditions, point to more specific\nand distinct computational roles for ACh and NE. In our previous work on ACh,6, 35 we\nsuggested that it reports on expected uncertainty, ie uncertainty associated with estimated\nparameters in an internal model of the external world. This is consistent with results from\nanimal conditioning experiments, in which animals learn faster about stimuli with variable\npredictive consequences.24 A series of lesion studies indicates that cortical ACh innervation is\nessential for this sort of faster learning.14\nIn contrast to ACh, a large body of experimental data associates NE with the specific ability\nto learn new underlying relationships in the world, especially those contradicting existing\nknowledge. 
Locus coeruleus (LC) neurons fire phasically and robustly to novel objects\nencountered during free exploration,34 novel sensory stimuli,25, 28 unpredicted changes in\nstimulus properties such as presentation time,2 introduction of an association of a stimulus\nwith reinforcement,19, 28, 32 and extinction or reversal of that association.19, 28 Moreover, this\nactivation of NE neurons habituates rapidly when there is no predictive value or contingent\nresponse associated with the stimuli, and also disappears when conditioning is expressed\nat a behavioral level.28\nThere are few sophisticated behavioral studies into the interactions between ACh and NE.\nHowever, it is known that NE and ACh both rise when contingencies in an operant conditioning\ntask are changed, but while the NE level rapidly habituates, the ACh level is elevated in a\nmore sustained fashion.3, 28 In a task designed to tax sustained attention, lesions of the basal\nforebrain cholinergic neurons induced persistent impairments,22 while deafferentation of\ncortical adrenergic inputs did not result in significant impairment compared to controls.21\nOne of the best worked-out computational theories of the drive and function of NE is that\nof Aston-Jones, Cohen and their colleagues.1, 33 They studied NE in the context of vigilance\nand attention in well-learned tasks, showing how NE neurons are driven by selective\ntask-relevant stimuli, and that, influenced by increased electrotonic coupling in the locus\ncoeruleus, a transition from a high tonic, low phasic activity mode to a low tonic, high\nphasic activity mode is associated with increased behavioral performance through NE\u2019s\nsuggested effect of increasing the signal-to-noise ratio of target cortical cells. This is a\nvery impressive theory, with neural and computational support. 
However, its focus on\nwell-learned tasks, means that other drives of NE activity (particularly novelty) and effects\n(particularly plasticity) are downplayed, and a link to ACh is only a secondary concern. We\nfocus on these latter aspects, proposing that NE reports unexpected uncertainty, ie uncer-\ntainty induced by a mismatch between prediction and observation, such as when there is a\ndramatic change in the external environment. We do not claim that this is the only role of\nNE; but do see it as an important complement to other suggestions.\n\n3 Inference and Learning in Adaptive Factor Analysis\n\n\u00017\u000f8\u0010\n\n,(,(,.\u001e\n\n65\n\n\u0001\u0015\u001f! \n\n\"$#&%('*)\n\n\u0001 , and a noisy observed variable \u001e\n\u001e$+-,(,(,.\u001e\n\n\u000e\u0001\n\n\u000f\u0002\u0010\u0011\b\u0013\u0012\u0015\u0014\u0016\b\u0018\u0017\u001a\u0019\u001c\u001b , a discrete representational variable \u001d\n\nis the true contextual state, the less learning accorded to \u0004\u0006\u0005\n\nOur previous model of the role of ACh in cortical inference involved a generative scheme\nwith a discrete contextual variable \u0002\u0001 , evolving over time \u0003 with slow Markov dynamics\n\u0004\u0006\u0005\n\u0007\u0001\t\b\u000b\n\r\f\nthat was stochastically\n\u0014 (normal distribution). The\ndetermined by \n\u0001 ; the HMM structure makes this interesting\ninferential task was to determine \u001d\nbecause top-down (\u0002\u0001 ) and bottom-up (\u001e\n\u0001 ) information have to be integrated. Top down\n\u0001 should be\ninformation can be uncertain, in which case mainly bottom-up information \u001e\n\u0001 . We suggested that ACh reports the uncertainty in the top-down context,\nused to infer \u001d\n\u00044\u0005\n\u0014 , where 95\n65\nnamely /1032\nis the most likely value of the context and 2\nindicates the use of an approximation. ACh thereby reports expected uncertainty, as in the\nqualitative picture above, and appropriately controls cortical inference. 
However, if one\nalso considers learning, for instance if \u0004\u0006\u0005\n\u0014 is unknown, then the less certain the animal\n\u0014 . This is exactly\nis that 95\nthe opposite of what we should expect according to our empirically-supported arguments\nabove.\nIn fact, this way of viewing ACh is also not consistent with a more systematic reading 5, 16 of\nHolland & Gallagher\u2019s cholinergic results,14 which imply that ACh is better seen as a report\nof uncertainty in parameters rather than uncertainty in states. In order to model this more\n\ufb01tting picture of ACh, we need an explicit model of parameter uncertainty. We constrain\nthe problem to a single, implicit, context :\u0001;\b\nto develop the new picture in a continuous space, in which the parameter governing the\nrelationship between \n(scalar for convenience), which is imperfectly\nknown (hence the parameter uncertainty, reported by ACh), and indeed can change. Again,\n\u0001 stochastically speci\ufb01es \u001e\nSpecifying how <\n\n\u0001 can change over time requires making an assumption about the nature\nof the context. In particular, novelty plays a critical role in model evolution. In general,\n\n/ . It is easiest (and perhaps more realistic)\n\n/ and \u001d\n\nis <\n\nthrough a normal distribution.\n\n\u0001\n\u0005\n\u0001\n\f\n\u001e\n\u0010\n'\n\u0001\n\f\n\u001e\n\u0010\n\u0001\n\u0004\n\u001d\n\u0001\n\f\n\n\u0001\n\u0001\n\u001d\n\u0001\n\f\n\u0001\n\u0001\n\b\n\u0001\n\u0001\n\u001d\n\u0001\n\f\u0011\u0013\u0012\u0005\u0014\n\ny\n\n\u0011\u0016\u0015\u0017\u0014\n\nx\n\n20\n\n10\n\n0\n\n\u221210\n\n\u221210\n\np(y; m )\n\np(x|y)\n\n0\n\n10\n\n\u0011\u0016\u0018\u0019\u0014\n\n15\n10\n5\n0\n\u22125\n0\n\n\u0011\u0016\u001a\u0017\u0014\n\u0002\u0001\n\u0003\u0005\u0004\u0007\u0006\n\n\b\n\t\n\n4\n\n3\n\n2\n\n1\n\n0\n0\n\n1\n\n35\n\n70\n\n4\n\n2\n\n3\n\n\u0002\u0001\n\u0003\u000b\u0006\r\f\u000f\u000e\u0010\u0004\n\nFigure 1: Adaptive factor analysis model. 
(a) 2-layer adaptive factor analysis model, as speci\ufb01ed by\n\nspace.\n\n, 17?\n\n: DGF\n\nAIHJ+.\u001d , K\n\n, are denoted as large circles.\n\n:\n. (c) Same sequence\nspace, ie\n\nEq. 1 & 2. (b) Sample sequence of \u001c\u001e\u001d data points generated with parameters: \u001f! \"\u001d$#\n'( *),+\u0017-.+0/ ,\n\u001d&% , '\n 54 , 176\n132\n 8+ , 9;:< 5=\u0017>\n @4 . 4 major shifts in A occurred (including initial ACB ), whose\nspace, '\n: DLF\n: DGF\nAIHNM , O\nprojections into D\nAIHQP , R\n'\u0017A\nspace and fall along the line'\nA\rHJS\u000bP . Small T denotes U&V projected into D\n'WU\nDLF\nviewed in U\nV , Z : D optimally projected into U\n: U\nV , R\n, Y : A\nX : major shifts in A\n, where [\nis the mean of the posterior distribution ofU given only the\n'\u0017^b9`_\n'&a\u0005_\nD\r ]\\\u0013'\n'&^I9`_\nobservation D and \ufb02at priors. (d) Scatter plot ofFdc\nDhg\u0017F . X :iJf\nF vs. Fdc\n \"+.M , Z :ijf\n\u001d\u001eP ,\nU`V3Sec\nAIf\nU&V3S\n:i\n% , dashed line denotes parity. Largeri\ncorresponds to greater reliance onD\nV rather than\n kl$#\nV , while the intermediate value of i\nfor inferring c\n% exactly balances top-down uncertainty\n kl\u0010#\nwith bottom-up uncertainty in the inference of c\nU\u0007V .\nallow for this by modeling continual small changes in <\n\nwe might expect small amounts of novelty, as models continually readjust, and we can\n\u0001 . However, in order to allow for\nthe possibility of macroscopic changes implied by substantial novelty (as reported by NE),\nwhich are of evident importance in many experiments, we must add a speci\ufb01c component\nto the model. The interaction between microscopic and macroscopic novelty is essentially\nthe interaction between ACh and NE. 
In all, assume that\n\n k\u001d\u0010#\n\n'onQp\n\n'\u0005q\n\n'0q\n\n'0q\n\n\u00017\u000f$\u0010\u000brks\u0015\u0001trvu\n\u0001xw\n\u0004\u0006\u0005\n\u00148\b}|\n\n(1)\n(2)\nis\n\n.\n\n\u0004\u0006\u0005\n\nis the estimate of <\n\n'\u0005q\n\u0014 (see Figure 1). We will see later that the binary u\nwith the initial value <\u007f~\nthe key to the model of NE; it comes from an assumption that there can occasionally (|\u007f\u0081\n/ )\nbe dramatic changes in a model that force its radical revision. m\nis another parameter; we\nassume it is known and \ufb01xed. Figure 1(b) & (c) shows a sample sequence of a particular\nsetting of the model: the output \u001e can be quite noisy, although there are clear underlying\nregularities in \u001d\nAt time \u0003 , consider the case that we can make the approximation that <;\u0082\n\u001f\u0013 \n\u0014 ,\nwhere \u0083\nis its variance (uncertainty), which is reported by\n<L\u0082\nACh. Here, the open circles indicate that this estimate is made before \u001e\nis observed. We\n\u0001 ; then go on to study learning.\n\ufb01rst consider how the ACh term in\ufb02uences inference about \u001d\n\u0014 , where\nFor inference, it can easily be shown that \u001d\nm$\u008a\u000fn\n\b\u0088\u0087\n\u0001N\u0089\n(3)\nwhence the effect of ACh is exactly as in our qualitative picture. The more uncertainty\nin determining \u0083\n(ie the larger \u0085\n\u0001 .\nExamples of just such effects can be found in Figure 1 (d).\ny . 
In this case,\nFor learning, start with the distribution of <\nwriting \u008e\n\u0010W\u0091\n\n\u0001 ), the smaller the role of the top-down expectation \u0083\n<G\u0082\n\u0010 and assume u\n\n\u0010 given \u001e\n\u000f8\u0010\nm9'\n\np , we get\n\n\u0001 and \u0085\n\n'\u0086\u0083\n\u0001\u008c\u008b\n\nm\u0086\u008aGn\n\n\u0005\u0084\u0083\n<C\u0082\n\n\u0001Q\u0089\n\n\u000f$\u0010\n\n\u000f$\u0010\n\n\u000f8\u0010\n\n\u000f8\u0010\n\n\u0014C\u0092\n\n\u000f8\u0010\n\n\u0001\u0084\u008d\n\n\u000f8\u0010\n\nm\u008fm\n\n'o\u0085\n\n'0\u0085\n\n\u0005\u0019\u0083\n\n\u0001\n\u0004\n\n\n\u001b\n'\n'\nE\n'\n[\n'\nB\n:\n'\n'\nB\n'\n'\nB\n:\nD\nD\nV\n[\nV\nV\nK\nf\nV\nf\nV\nc\nA\nf\nV\nU\nf\nV\n\u001e\n\u0001\n\u001f\n \n\u0005\nm\nm\nm\n\u001d\n\u0001\n\u0014\n\u001d\n\u0001\n\u001f\n \n\u0005\n<\n\u0001\n+\n#\n\u0014\n<\n\u0001\n\b\n<\n\u0001\ns\n\u0001\n\u001f\n \n\u0005\ny\n+\nz\n\u0014\nw\n\u0001\n\u001f\n \n\u0005\ny\n+\n{\n\u0014\nu\n\u0001\n\b\n/\n\u0014\n\b\n/\n0\nu\n\u0001\n\b\ny\n\u001f\n \n\u0005\ny\n+\n\u0080\n\u0001\nm\nm\n\u0001\n\u0001\n\u0082\n\u0001\n\u0001\n\u0082\n\u0001\n\u0001\n\u0001\n\u001f\n \n\u001d\n\u0001\nq\n+\n\u0001\n\u0083\nq\n\u000f\n+\n\u0001\nq\n+\n#\nr\n\u0085\n\u0082\nr\nm\nm\np\nm\nm\nm\n\u0083\n\u001d\n\u0001\n\b\n\u0083\nq\n+\n\u0087\nq\n+\n#\nr\n\u0085\n\u0082\n\u0083\n<\n\u0082\n\u0001\nr\nm\nm\np\n\u001e\n\u0082\n\u0001\n\u001d\n\u0010\n\b\n\b\nm\nm\nm\nm\n\u008a\nq\n+\n#\nr\nn\n\u0090\n\u0005\n<\n\u0010\n\f\n\u001e\n\u0010\n'\nu\n\u0010\n\b\ny\n\u0014\n\b\n \n\u0005\nm\nm\nm\n\u008a\n\u008e\n\u001e\nm\nm\nm\n\u008a\n\u008e\nm\nm\n/\n\u0091\nm\nm\nm\n\u008a\n\u008e\nm\nm\nm\n \n\u0005\ny\n~\nr\nq\n+\nz\n\u0014\n\fwith the obvious semantics for the product of two Gaussian distributions. 
This is almost ex-\n, and leads to standard results, such\n\u0003 , but ultimately reaching an asymptote\n\u0001 .\nz and the rate of new information from the \u001e\n\u0001 does not depend on the prediction\n/ , then the posterior distribution\n\nactly the standard form for a Kalman \ufb01lter update for <\nas variance of the estimate going initially like /\nwhich balances the rate of change from q\nImportantly, in this simple model, the uncertainty in <\nerrors \u001e\nHowever, if one takes into account the possibility that u\nfor <\n\n\u0001 , but rather changes as a function only of time.\n\nis the two-component mixture\n\nmG\u0083\n\n\u0091\u0001\n\n(4)\n\n\u0010\u0015\f\n\n\u0004\u0006\u0005\n\u000f8\u0010\n\n\u0010:\f\nm9'\n\n\u000f8\u0010\n\n\u00147r\n\n\u000f$\u0010\n\n\u0004\u0006\u0005\n\n\u0014C\u0092\n\n\u0010\u0015\f\n\n'0\u0085\n\n'o\u0085\n,(,\u0007,\n\n\u0010.u\n\n\u00017\u000f$\u0010\n\n\u00017\u000f8\u0010\u0016\u001f\n\n\u00017\u000f$\u0010\u007fr\n\n\u00017\u000f8\u0010\n\n'0\u0085\n\n\u00017\u000f8\u0010\n\n, ie \u001e\n\n/\u00110\n\u0001 , since each setting of the \u0003\n\n\u0010\u0017\u0091\n\u00143r\nAs \u0003\nincreases, the number of mixture components in the posterior distribution increases\nlength binary string u\nexponentially as \u0003\nis, barring\nprobability zero accidents, associated with a different component in the mixture. Thus, just\nas for switching state-space models,7 exact inference is impractical.\nOne possibility would be to use a variational approximations. 
7, 23 From the neural perspec-\ntive of the involvement of neuromodulators, we propose an approximate learning algorithm\nin which signals reporting uncertainty, corresponding to our conceptual roles for ACh and\nNE, control the interactions between the (approximate) distribution at \u0003\n\u00017\u000f$\u0010\n\u0014 ,\n/ , 2\n\b\u0006\u0005\n\u00017\u000f$\u0010\n\u00017\u000f8\u0010\b\u0007 , and bottom-up information relayed by the new obser-\nwhere \u0004\n,\u0007,(,\n\u0014 . To control the exponential expansion in the hidden space, we approximate\n\u0001\r\f\nvation,\u0090\n\u00017\u000f8\u0010\n\u00017\u000f8\u0010\n\u00017\u000f8\u0010\n\u0014 as a single Gaussian, <\nis our best\nthe posterior 2\n\u0089 .\n\u0003 , and \u0085\n\u00017\u000f$\u0010 after observing \u001e\n\u00017\u000f8\u0010 , corresponding to the ACh level, is the\n\u00017\u000f$\u0010\nestimate of <\nuncertainty in our estimate \u0083\n\u00017\u000f8\u0010 . In general, we might consider the NE level \t\n\u0001 as reporting\nthe posterior responsibility of the u6\u0001\u0015\b\n/ component of the equivalent mixture of equation 4.\nEven more straightforwardly, we can measure a Z-score, namely prediction error scaled by\n\u0089 , where \u0083\n\u00017\u000f8\u0010 and\nuncertainty in our estimates:\n\u0001 exceeds a threshold\nis unlikely to have come from an unmodi\ufb01ed version of the current com-\ny . 
Now the learning problem reduces to a\n(5)\n\nm\u001em\nvalue \n\nponent, we assume \u0083\nmodi\ufb01ed version of Kalman \ufb01lter:\n\u00017\u000f$\u0010\u000br\n\nnQp , assuming that u6\u0001\n\nprediction variance about <\n\nKalman gain\ncorrection variance\nestimated mean\n\n(6)\n(7)\n(8)\nThe difference from the conventional Kalman \ufb01lter is the additional component of the tran-\ny , \u000b\nif \u0083\nif \u0083\n\u0001 , which depends on \u0083\n\u0001\u001a\b\nsition noise variance, \u000b\n/ .\nCloser examination indicates that the ACh (\u0085\n\u0001 ) signals have the desired se-\nmantics. In the learning algorithm, large uncertainty about the mean estimate, \u0085\n\u0001 , results\n\u0010 . Large \u0085\nin large Kalman gain, \r\n\u0001 also weakens the\nin\ufb02uence of top-down information in inference as in equation 3. High NE levels also leads\nq\u0011\u0010\ny had\nto faster learning: large \t\n\u0001 been y ), ultimately resulting in a large Kalman gain and thus fast shifting of \u0083\n. High\nNE levels also enhances the dominance of bottom-up information in inference via its in-\n\u0001 . Note that this system predicts interesting\nteractions with ACh: large \t\nreciprocal relationships between ACh and NE: higher ACh leads to smaller normalized pre-\ndiction errors and therefore less active NE signalling, whereas greater NE would generally\nincrease estimator uncertainty and thus ACh level.\n\nu6\u0001 : \u000b\u001a\u0001\u001a\b\n\u0001 ) and NE (\t\n\u0001 , which causes a large shift in <\n\u0001 means \u0083\n/ , which causes \u000b\n\n\u0001 promotes large \u0085\n\n/ . Otherwise, \u0083\nr\f\u000b\nm\u0010\u0085\n\n0\u000b\u001e\ny . 
Whenever \t\n\n(rather than \u000b\n\nm\u008fm\nm;\u0083\n\nm\u0086\u0085\n\n\u00017\u000f8\u0010\n\n\u00017\u000f$\u0010\n\n\u00017\u000f8\u0010\n\n\u0001\u001a\b\n\nmG\u0083\n\n\u000f$\u0010\n\n\u000f8\u0010\n\n\u0001\u000f\u000e\n\n,\u0007,(, generated from a model (same pa-\nrameters as in Figure 1), and the estimated means using our approximate learning algo-\n\u0001 , although\n\nFigure 2(a) shows an example sequence of <\nrithm. The learning algorithm is clearly able to adjust to major changes in <\n\n+\n\u0001\n0\nm\nm\n<\n\u0010\n\b\n\u0010\n\u0090\n\u0005\n<\n\u001e\n\u0010\n\u0014\n\b\nu\n\u0010\n\b\ny\n\u0014\n\u0090\n\u0005\n<\n\u001e\n\u0010\n'\nu\n\u0010\n\b\ny\nu\n\u0010\n\b\n/\n\u0014\n\u0090\n\u0005\n<\n\u001e\n\u0010\n'\nu\n\u0010\n\b\n/\n\u0014\n\u0002\n \n\u0005\nm\nm\nm\n\u008a\n\u008e\n\u001e\nm\nm\nm\n\u008a\n\u008e\nm\nm\n/\n\u0091\nm\nm\nm\n\u008a\n\u008e\nm\nm\nm\n\u008b\n\u0087\n|\n\u0089\n \n\u0005\ny\n~\nr\nq\n+\nz\n|\n \n\u0005\ny\n~\nr\nq\n+\nz\nr\nq\n+\n{\n\u0014\n\u008d\n0\n+\nu\n\u0001\n0\n\u0090\n\u0005\n<\n\f\n\u0004\n\u001e\n\u0010\n'\n\u001e\n+\n'\n'\n\u001e\n\u0005\n\u001d\n\u001e\n\u0001\n\u0090\n\u0005\n<\n\f\n\u0004\n 
more subtle changes in μ_t can miss detection, such as the third large shift in μ_t.\n\nFigure 2: Approximate learning algorithm. (a) Observations x_t projected into y space, actual μ_t, and estimated means. General patterns of μ_t are captured by the estimates, though details may differ. (b) ACh and NE levels. The ACh level rises whenever a change is detected (NE level exceeds the threshold) and then smoothly falls. The NE level is a constant monitor of prediction error. (c) Mean summed square error as a function of the threshold. Error bars show standard errors of the means. The exact learning error is shown for comparison (lower line). Model parameters were the same as in Figure 1.\n\n
Figure 2(b)\nshows higher ACh (\u0085\nlevels both correspond to fast learning, ie fast shifting\nof \u0083\n\u0001 . However, whereas NE is a constant monitor of prediction errors and \ufb02uctuates ac-\ncordingly with every data point, ACh falls smoothly and predictably, and only depends on\nthe observations when global changes in the environment have been detected. Figure 2(b)\n\f , on the threshold value \n\nshows ladle-shaped dependence of estimation error, \f\n. For\nthe particular setting of model parameters used here, learning is optimal for \n around\n4 Differential Effects of Disrupting ACh and NE Signalling\nThe different roles of the NE (\t\n\u0001 and \u0085\ndifferent manipulations of \t\n\n\u0001 ) can be teased apart by disrupting each\nand observing the subsequent effects on learning in our model. We will examine several\nthat disrupt normal learning, and relate the results to\nimpairments observed in experimental manipulation of ACh or NE levels in animals. Of\ncourse, the complete experimental circumstances are far more complicated; we consider\nthe general nature of the effects.\n\n\u0001 ) and ACh (\u0085\n\n.\n\n/\u001c\u0014\n\ny\u0086y\n\n\u0019\u001a\u0017\u001b\u0014\n\n) and exact learning (\n\n\u0001 shifts. Mean error over\n\ny . An example is shown in\nFirst, we simulate depletion of cortical NE by setting \t\nFigure 3(a). By ruling out the possibility of u\n/ , the system is unable to cope with\ntrials (same\nabrupt, global changes in the world, ie when <\n, more than an order of magnitude larger than\nsetting as in Figure 2(c)) without NE is\n\u0003\u0018\u0017\n). This is consistent with the large\nfull approximate learning (\n, which effectively blocks the\nerrors of similar magnitude in Figure 2(c) for very large \n\nNE system from reporting global changes. However, as long as the underlying parameters\n\u0001 does not change greatly, the inference process functions normally, as\nremain the same, ie <\ny steps in Figure 3(a). 
These results are consistent with experimental\nobservations: NE-lesioned animals are impaired in learning changes in reinforcement\ncontingencies,26, 28 but have little difficulty doing previously learned discrimination tasks.21\nWe can also simulate depletion of cortical ACh by setting the ACh level to a small constant value.\nFigure 3(b) shows that severe damage is caused to the learning algorithm, but the inference symptoms\nare distinct from NE depletion. A permanently small ACh level corresponds to over-confidence\nin estimates of μ_t, thus making adaptation of that estimate slow, similar to NE depletion.\nHowever, because the NE system is still intact, the system is able to detect when x_t dramatically\ndiffers from the prediction (which is often, since the estimate of μ_t is slow to adapt and leaves little\nroom for variance), and thus to base inference of y_t directly on the bottom-up information\np(y_t | x_t). Thus, inference is less impaired than learning, which has also been observed in\n\nFigure 3: Disrupting NE and ACh signals. (a) NE signal set to 0. (b) ACh signal set to a small constant value. 
Learning of μ_t is poor in both manipulations, but inference in ACh-depletion is less impaired.\n\nACh-lesioned animals.31 Moreover, the system exhibits a peculiar hesitancy in inference,\nie constantly switching back and forth between relying on the top-down estimate of y_t, based\non the current estimate of μ_t, and the bottom-up estimate, based on the observation. This tendency is particularly\nsevere when the new μ_t is similar to the previous one, which can be thought of as a form of\ninterference. Interestingly, hippocampal cholinergic deafferentation in animals also brings\nabout a stronger susceptibility to interference compared with controls.10\nSaturation of ACh and NE is also easy to model, by setting the ACh and NE levels very high all the\ntime. The effects of these two manipulations are similar: both cause the estimation of μ_t\nand inference of y_t to be based strongly on the observation x_t (data not shown). The performance\ndecrements in the estimation of μ_t and inference about y_t are functions of the output\nnoise in our model, and do not worsen when there are global changes in contingencies.\nUnfortunately, directly relevant experimental data is scarce. Administration of\ncholinergic agonists in the cortex has failed to induce impairments in tasks with changing\ncontingencies, consistent with our predictions. However, to our knowledge, cholinergic\nand noradrenergic agonists have not yet been administered in combination with systematic\nmanipulation of variability in the predictive consequences of stimuli and so the validity of\nour predictions remains to be tested.\n\n5 Discussion\n\nWe have suggested that ACh and NE report expected and unexpected uncertainty in representational\nlearning and inference. 
As such, high levels of ACh and NE should both\ncorrespond to faster learning about the environment and enhancement of bottom-up processing\nin inference. However, whereas NE reports on dramatic changes, ACh has the\nsubtler role of reporting on uncertainties in internal estimates.\nWe formalized these ideas in an adaptive factor analysis model. The model is adaptive in\nthat the mean of the hidden variable is allowed to alter greatly from time to time, capturing\nthe idea of a generally stable context which occasionally undergoes large changes, leading\nto substantial novelty in inputs. As exact learning is intractable, we proposed an approximate\nlearning algorithm in which the roles for ACh and NE are clear, and demonstrated\nthat it performs learning and inference competently. Moreover, by disrupting one or both\nof the ACh and NE signalling systems, we showed that the two systems have interacting but\ndistinct patterns of malfunctioning that qualitatively resemble experimental results in animal\nstudies. There is no single collection of definitive experimental studies, and teasing\napart the effects of NE and ACh is tricky, since they appear to share many properties. Our\nmodel helps to explain why, and should also help with the design of experiments to clarify\nthe relationship.\n\nOf course, the adaptive factor analysis model is overly simple in many ways. In particular,\nit only considers one particular context, and so refers all the uncertainty to the parameters\nof that context. 
This is exactly the complement of our previous model,6,35 which referred\nall the uncertainty to the choice of context rather than the parameters within each context.\nThe main conceptual difference is that the idea that ACh reports on the latter form of contextual\nuncertainty sits ill with the data on how uncertainty boosts learning; this fits better\nwithin the present model. Given multiple contexts, which could formally be handled within\nthe framework of a mixture model, the tricky issue is to decide whether the parameters of\nthe current context have changed, or a new (or pre-existing) context has taken over. Exploring\nthis is important work for the future. More generally, a thoroughly hierarchical and\nnon-linear model is clearly required, at a minimum, as a way of addressing some of the\ncomplexities of cortical inference.\n\nAcknowledgement\n\nWe are very grateful to Zoubin Ghahramani and Maneesh Sahani for helpful discussions.\nFunding was from the Gatsby Charitable Foundation and the NSF.\n\nReferences\n[1] Aston-Jones, G, Rajkowski, J, & Cohen, J (1999) Biol Psychiatry 46:1309-1320.\n[2] Carli, M, Robbins, TW, Evenden, JL, & Everitt, BJ (1983) Behav Brain Res 9:361-80.\n[3] Dalley, JW et al. 
(2001) J Neurosci 21:4908-4914.\n[4] Daw, ND, Kakade, S, & Dayan, P (2001) Neural Networks 15:603-616.\n[5] Dayan, P, Kakade, S, & Montague, PR (2000) In NIPS 2000:451-457.\n[6] Dayan, P & Yu, A (2002) In NIPS 2002.\n[7] Ghahramani, Z & Hinton, G (2000) Neural Computation 12:831-64.\n[8] Gil, Z, Connors, BW, & Amitai, Y (1997) Neuron 19:679-86.\n[9] Gu, Q (2002) Neuroscience 111:815-835.\n[10] Hasselmo, ME (1995) Behavioural Brain Research 67:1-27.\n[11] Hasselmo, ME, Wyble, BP, & Wallenstein, GV (1996) Hippocampus 6:693-708.\n[12] Hasselmo, ME & Cekic, M (1996) Behavioural Brain Research 79:153-161.\n[13] Hasselmo, ME et al. (1997) J Neurophysiology 78:393-408.\n[14] Holland, PC & Gallagher, M (1999) Trends In Cognitive Sciences 3:65-73.\n[15] Hsieh, CY, Cruikshank, SJ, & Metherate, R (2000) Brain Research 880:51-64.\n[16] Kakade, S & Dayan, P (2002) Psychological Review 109:533-544.\n[17] Kimura, F, Fukuda, M, & Tsumoto, T (1999) Eur. Jour. of Neurosci. 11:3597-3609.\n[18] Kobayashi, M et al. (1999) European Journal of Neuroscience 12:264-272.\n[19] Mason, ST & Iversen, SD (1978) Brain Res 150:135-48.\n[20] McCormick, DA (1989) Trends Neurosci 12:215-221.\n[21] McGaughy, J, Sandstrom, M, et al. (1997) Behav Neurosci 111:646-52.\n[22] McGaughy, J & Sarter, M (1998) Behav Neurosci 112:1519-25.\n[23] Minka, TP (2001) A Family of Algorithms for Approximate Bayesian Inference. 
PhD, MIT.\n[24] Pearce, JM & Hall, G (1980) Psychological Review 87:532-552.\n[25] Rajkowski, J, Kubiak, P, & Aston-Jones, G (1994) Brain Res Bull 35:607-16.\n[26] Robbins, TW (1984) Psychological Medicine 14:13-21.\n[27] Robbins, TW, Everitt, BJ, & Cole, BJ (1985) Physiological Psychology 13:127-150.\n[28] Sara, SJ, Vankov, A, & Herve, A (1994) Brain Res Bull 35:457-65.\n[29] Sara, SJ (1998) Comptes Rendus de l\u2019Academie des Sciences Serie III 321:193-198.\n[30] Sarter, M & Bruno, JP (1997) Brain Research Reviews 23:28-46.\n[31] Sarter, M, Holley, LA, & Matell, M (2000) In SFN 2000 abstracts.\n[32] Sullivan, RM (2001) Integrative Physiological and Behavioral Science 36:293-307.\n[33] Usher, M, et al. (1999) Science 5401:549-554.\n[34] Vankov, A, Herve-Minvielle, A, & Sara, SJ (1995) Eur J Neurosci 109:903-911.\n[35] Yu, A & Dayan, P (2002) Neural Networks 15:719-730.\n\n\f", "award": [], "sourceid": 2246, "authors": [{"given_name": "Peter", "family_name": "Dayan", "institution": null}, {"given_name": "Angela", "family_name": "Yu", "institution": null}]}