{"title": "ACh, Uncertainty, and Cortical Inference", "book": "Advances in Neural Information Processing Systems", "page_first": 189, "page_last": 196, "abstract": null, "full_text": "ACh, Uncertainty, and Cortical Inference\n\nPeter Dayan\n\nAngela Yu\n\nGatsby Computational Neuroscience Unit\n\n17 Queen Square, London, England, WC1N 3AR.\n\ndayan@gatsby.ucl.ac.uk\n\nferaina@gatsby.ucl.ac.uk\n\nAbstract\n\nAcetylcholine (ACh) has been implicated in a wide variety of\ntasks involving attentional processes and plasticity. Following\nextensive animal studies, it has previously been suggested that\nACh reports on uncertainty and controls hippocampal, cortical and\ncortico-amygdalar plasticity. We extend this view and consider\nits effects on cortical representational inference, arguing that ACh\ncontrols the balance between bottom-up inference, influenced by\ninput stimuli, and top-down inference, influenced by contextual\ninformation. We illustrate our proposal using a hierarchical hidden\nMarkov model.\n\n1 Introduction\nThe individual and joint computational roles of neuromodulators such as\ndopamine, serotonin, norepinephrine and acetylcholine are currently the focus of\nintensive study [5, 7, 9-11, 16, 27]. A rich understanding of the effects of neuromodulators\non the dynamics of networks has come about through work in invertebrate\nsystems [21]. Further, some general computational ideas have been advanced, such as\nthat they change the signal-to-noise ratios of cells. However, more recent studies,\nparticularly those focusing on dopamine [26], have concentrated on specific\ncomputational tasks.\nACh was one of the first neuromodulators to be attributed a specific role. 
Hasselmo and colleagues [10, 11], in their seminal work, proposed that cholinergic (and,\nin their later work, also GABAergic [12]) modulation controls read-in to and read-out\nfrom recurrent, attractor-like memories, such as area CA3 of the hippocampus.\nSuch memories fail in a characteristic manner if the recurrent connections are\noperational during storage, thus forcing new input patterns to be mapped to existing\nmemories. Not only would these new patterns lose their specific identity, but,\nworse, through standard synaptic plasticity, the size of the basin of attraction of\nthe offending memory would actually be increased, making similar problems more\nlikely. Hasselmo et al thus suggested, and collected theoretical and experimental\nevidence in favor of, the notion that ACh (from the septum) should control the\nsuppression and plasticity of specific sets of inputs to CA3 neurons. During read-in,\nhigh levels of ACh would suppress the recurrent synapses, but make them readily\nplastic, so that new memories would be stored without being pattern-completed.\nThen, during read-out, low levels of ACh would boost the impact of the recurrent\nweights (and reduce their plasticity), allowing auto-association to occur.\nThe ACh signal to the hippocampus can be characterized as reporting the\nunfamiliarity of the input with which its release is associated. 
This is analogous to its\ncharacterization as reporting the uncertainty associated with predictions in theories\nof attentional influences over learning in classical conditioning [4]. In an extensive\nseries of investigations in rats, Holland and his colleagues [14, 15] have shown that\na cholinergic projection from the nucleus basalis to the (parietal) cortex is\nimportant when animals have to devote more learning (which, in conditioning, is\nessentially synonymous with paying incremental attention) to stimuli about whose\nconsequences the animal is uncertain [20]. We have [4] interpreted this in the\nstatistical terms of a Kalman filter, arguing that the ACh signal reported this uncertainty,\nthus changing plasticity appropriately. Note, however, that unlike the case of the\nhippocampus, the mechanism of action of ACh in conditioning is not well\nunderstood.\nIn this paper, we take the idea that ACh reports on uncertainty one step farther.\nThere is a wealth of analysis-by-synthesis unsupervised learning models of\ncortical processing [1, 3, 8, 13, 17, 19, 23]. In these, top-down connections instantiate a generative\nmodel of sensory input; and bottom-up connections instantiate a recognition model,\nwhich is the statistical inverse of the generative model, and maps inputs into\ncategories established in the generative model. These models, at least in principle,\npermit stimuli to be processed according both to bottom-up input and top-down\nexpectations, the latter being formed based on temporal context or information from\nother modalities. Top-down expectations can resolve bottom-up ambiguities,\npermitting better processing. However, in the face of contextual uncertainty, top-down\ninformation is useless. 
We propose that ACh reports on top-down uncertainty,\nand, as in the case of area CA3, differentially modulates the strength of synaptic\nconnections: comparatively weakening those associated with the top-down\ngenerative model, and enhancing those associated with bottom-up, stimulus-bound\ninformation [2]. Note that this interpretation is broadly consistent with existent\nelectrophysiology data, and documented effects on stimulus processing of drugs that\neither enhance (eg cholinesterase inhibitors) or suppress (eg scopolamine) the\naction of ACh [6, 25, 28].\nThere is one further wrinkle. In exact bottom-up, top-down, inference using a\ngenerative model, top-down contextual uncertainty does not play a simple role.\nRather, all possible contexts are treated simultaneously according to the individual\nposterior probabilities that they currently pertain. Given the neurobiologically\nlikely scenario in which one set of units has to be used to represent all possible\ncontexts, this exact inferential solution is not possible. Rather, we propose that a\nsingle context is represented in the activities of high level (presumably pre-frontal)\ncortical units, and uncertainty associated with this context is represented by ACh.\nThis cholinergic signal then controls the balance between bottom-up and top-down\ninfluences over inference.\nIn the next section, we describe the simple hierarchical generative model that we\nuse to illustrate our proposal. The ACh-based recognition model is introduced in\nsection 3 and discussed in section 4.\n\n2 Generative and Recognition Models\n\nFigure 1A shows a very simple case of a hierarchical generative model. 
The generative\nmodel is a form of hidden Markov model (HMM), with a discrete hidden\nstate $z_t$, which will capture the idea of a persistent temporal context, and a\ntwo-dimensional, real-valued, output $x_t$. Crucially, there is an extra $y_t$\nlayer, between $z_t$ and $x_t$. The state $y_t$ is stochastically determined from $z_t$, and controls which of a\nset of 2d Gaussians (centered at the corners of the unit square) is used to generate\n$x_t$. In this austere case, $y_t$ is the model\u2019s representation of $x_t$, and the key inference\nproblem will be to determine the distribution over which $y_t$ generated $x_t$, given\nthe past experience $D_{t-1} = \\{x_1, \\ldots, x_{t-1}\\}$ and $x_t$ itself.\n\nFigure 1: Generative model. A) Three-layer model $z \\in \\{1,2,3,4\\} \\to y \\in \\{1,2,3,4\\} \\to x \\in \\mathbb{R}^2$ with\ndynamics ($T$) in the $z$ layer ($P[z_t = z_{t-1}]$ = [...]), a probabilistic mapping ($O$) from $z \\to y$\n($P[y_t = z_t \\mid z_t]$ = [...]), and a Gaussian model $p[x \\mid y]$ with means at the corners of the unit\nsquare and standard deviation [...] in each direction. The model is rotationally invariant;\nonly some of the links are shown for convenience. B) Sample sequence showing the slow\ndynamics in $z$; the stochastic mapping into $y$ and the substantial overlap in $x$ (different\nsymbols show samples from the different Gaussians shown in A).\n\nFigure 1B shows an example of a sequence of 400 steps generated from the model.\nThe state in the $z$ layer stays the same for an average of about 30 timesteps; and\nthen switches to one of the other states, chosen equally at random. The transition\nmatrix is $T_{z_{t-1} z_t}$. The state in the $y$ layer is more weakly determined by the state\nin the $z$ layer, with a probability of only 3/4 that $y_t = z_t$. The stochastic transition\nfrom $z$ to $y$ is governed by the transition matrix $O_{z_t y_t}$. Finally, $x_t$ is generated as a\nGaussian about a mean specified by $y_t$. The standard deviation of these Gaussians\n([...] in each direction) is sufficiently large that the densities overlap substantially.\nThe naive solution to inferring $y_t$ is to use only the likelihood term (ie only the\nprobabilities $p[x_t \\mid y_t]$). The performance of this is likely to be poor, since the\nGaussians in $x$ for the different values of $y$ overlap so much. However, and this is why\nit is a paradigmatic case for our proposal, contextual information, in this case past\nexperience, can help to determine $y_t$. We show how the putative effect of ACh in\ncontrolling the balance between bottom-up and top-down inference in this model\ncan be used to build a good approximate inference model.\nIn order to evaluate our approximate model, we need to understand optimal\ninference in this case. Figure 2A shows the standard HMM inference model, which\ncalculates the exact posteriors $P[z_t \\mid D_t]$ and $P[y_t \\mid D_t]$. 
This is equivalent to just the\nforward part of the forwards-backwards algorithm [22] (since we are not presently\ninterested in learning the parameters of the model). The adaptation to include the\n$y$ layer is straightforward. Figures 3A;D;E show various aspects of exact inference\nfor a particular run. The histograms in figure 3A show that $P[y_t \\mid D_t]$ captures quite\nwell the actual states $y_t^*$ that generated the data. The upper plot shows the\nposterior probabilities of the actual states in the sequence \u2013 these should be, and are,\nusually high; the lower histogram shows the posterior probability of the other possible\nstates; these should be, and are, usually low. Figure 3D shows the actual state\nsequence $z_t^*$; figure 3E shows the states that are individually most likely at each time\nstep (note that this is not the maximum likelihood state sequence, as found by the\nViterbi algorithm, for instance).\n\nFigure 2: Recognition models. A) Exact recognition model. $P[z_{t-1} \\mid D_{t-1}]$ is propagated to\nprovide the prior $P[z_t \\mid D_{t-1}]$ (shown by the lengths of the thick vertical bars) and thus the\nprior $P[y_t \\mid D_{t-1}]$. This is combined with the likelihood term from the data $x_t$ to give the true\n$P[y_t \\mid D_t]$. B) Bottom-up recognition model uses only a generic prior over $y_t$ (which conveys no\ninformation), and so the likelihood term dominates. C) ACh model. A single estimated\nstate $\\tilde{z}_{t-1}$ is used, in conjunction with its uncertainty $\\gamma_{t-1}$, reported by cholinergic activity, to\nproduce an approximate prior $\\tilde{P}[z_t \\mid \\tilde{z}_{t-1}, \\gamma_{t-1}]$ over $z_t$ (which is a mixture of a delta function and\na uniform), and thus an approximate prior over $y_t$. This is combined with the likelihood to\ngive an approximate $\\tilde{P}[y_t \\mid D_t]$, and a new cholinergic signal $\\gamma_t$ is calculated.\n\nFigure 3: Exact and approximate recognition. A) Histograms of the exact posterior\ndistribution $P[y \\mid D]$ over the actual state $y^*$ (upper) and the other possible states $y \\neq y^*$\n(lower). This shows the quality of exact representational inference. B;C) Comparison\nof the purely bottom up $P^b[y_t \\mid x_t]$ (B) and the ACh-based approximation $\\tilde{P}[y_t \\mid D_t]$ (C) with\nthe true $P[y_t \\mid D_t]$ across all values of $y$. The ACh-based approximation is substantially more\naccurate. D) Actual $z_t$. E) Highest probability $z$ state from the exact posterior distribution.\nF) Single $\\tilde{z}$ state in the ACh model.\n\nFigure 2B shows a purely bottom up model that only uses the likelihood terms\nto infer the distribution over $y_t$. This has $P^b[y_t \\mid x_t] = p[x_t \\mid y_t]/Z$ where $Z$ is a\nnormalization factor. Figure 3B shows the representational performance of this\nmodel, through a scatter-plot of $P^b[y_t \\mid x_t]$ against the exact posterior $P[y_t \\mid D_t]$. If\nbottom-up inference was correct, then all the points would lie on the line of\nequality \u2013 the bow-shape shows that purely bottom-up inference is relatively poor.\nFigure 4C shows this in a different way, indicating the difference between the average\nsummed log probabilities of the actual states under the bottom up model and those\nunder the true posterior. The larger and more negative the difference, the worse\nthe approximate inference. Averaging over 1000 runs, the difference is [...] log\nunits (compared with a total log likelihood under the exact model of [...]).\n\n3 ACh Inference Model\n\nFigure 2C shows the ACh-based approximate inference model. The information\nabout the context comes in the form of two quantities: $\\tilde{z}_{t-1}$, the approximated\ncontextual state having seen $D_{t-1}$, and $\\gamma_{t-1}$, which is the measure of uncertainty in that\ncontextual state. 
The idea is that $\\gamma_{t-1}$ is reported by ACh, and is used to control\n(indicated by the filled-in ellipse) the extent to which top-down information based\non $\\tilde{z}_{t-1}$ is used to influence inference about $y_t$. If we were given the full exact\nposterior distribution $P[z_{t-1}, y_{t-1} \\mid D_{t-1}]$, then one natural definition for this ACh\nsignal would be the uncertainty in the most likely contextual state\n\n    $\\gamma_{t-1} = 1 - \\max_z P[z_{t-1} = z \\mid D_{t-1}]$    (1)\n\nFigure 4A shows the resulting ACh signal for the case of figure 3. As expected,\nACh is generally high at times when the true state $z^*$ is changing, and decreases\nduring the periods that $z^*$ is constant. During times of change, top-down\ninformation is confusing or potentially incorrect, and so bottom-up information should\ndominate. This is just the putative inferential effect of ACh.\nHowever, the ACh signal of figure 4A was calculated assuming knowledge of the\ntrue posterior, which is unreasonable. The model of figure 2C includes the key\napproximation that the only other information from $D_{t-1}$ about the state of $z_{t-1}$ is in\nthe single choice of context variable $\\tilde{z}_{t-1}$. The full approximate inference algorithm\nbecomes\n\n    $\\tilde{P}[z_{t-1} = z \\mid \\tilde{z}_{t-1}, \\gamma_{t-1}] = \\gamma_{t-1}/n_z + (1 - \\gamma_{t-1})\\,\\delta_{z \\tilde{z}_{t-1}}$    approximation    (2)\n    $\\tilde{P}[z_t \\mid \\tilde{z}_{t-1}, \\gamma_{t-1}] = \\sum_{z_{t-1}} \\tilde{P}[z_{t-1} \\mid \\tilde{z}_{t-1}, \\gamma_{t-1}]\\, T_{z_{t-1} z_t}$    prior over $z_t$    (3)\n    $\\tilde{P}[y_t, z_t \\mid \\tilde{z}_{t-1}, \\gamma_{t-1}] = \\tilde{P}[z_t \\mid \\tilde{z}_{t-1}, \\gamma_{t-1}]\\, O_{z_t y_t}$    propagation to $y_t$    (4)\n    $\\tilde{P}[z_t, y_t \\mid D_t] \\propto \\tilde{P}[y_t, z_t \\mid \\tilde{z}_{t-1}, \\gamma_{t-1}]\\, p[x_t \\mid y_t]$    conditioning    (5)\n    $\\tilde{P}[y_t \\mid D_t] = \\sum_{z_t} \\tilde{P}[z_t, y_t \\mid D_t]$    marginalization    (6)\n    $\\tilde{P}[z_t \\mid D_t] = \\sum_{y_t} \\tilde{P}[z_t, y_t \\mid D_t]$    marginalization    (7)\n    $\\tilde{z}_t = \\arg\\max_z \\tilde{P}[z_t = z \\mid D_t]$    contextual inference    (8)\n    $\\gamma_t = 1 - \\max_z \\tilde{P}[z_t = z \\mid D_t]$    ACh level    (9)\n\nwhere $\\tilde{z}_{t-1}$, $\\gamma_{t-1}$ are used as approximate sufficient statistics for $D_{t-1}$, the number\nof $z$ states is $n_z$ (here 4), $\\delta_{ij}$ is the Kronecker delta, and the constant of\nproportionality in equation 5 normalizes the full conditional distribution. The last\ntwo lines show the information that is propagated to the next time step; equation 6\nshows the representational answer from the model, the distribution over $y_t$ given\n$D_t$. These computations are all local and straightforward, except for the\nrepresentation and normalization of the joint distribution over $z_t$ and $y_t$, a point to which\nwe return later. Crucially, ACh exerts its influence through equation 2. 
If $\\gamma_{t-1}$ is\nhigh, then the input stimulus-controlled likelihood term dominates in the\nconditioning process (equation 5); if $\\gamma_{t-1}$ is low, then temporal context ($\\tilde{z}_{t-1}$) and the\nlikelihood terms balance.\nOne potentially dangerous aspect of this inference procedure is that it might get\nunreasonably committed to a single state $\\tilde{z}_{t-1} = \\tilde{z}_t = \\ldots$ because it does not\nrepresent explicitly the probability accorded to the other possible values of $z_{t-1}$ given\n$D_{t-1}$. A natural way to avoid this is to bound the ACh level from below by a\nconstant, $\\gamma_{\\min}$, making approximate inference slightly more stimulus-bound than exact\ninference. This approximation should add robustness. In practice, rather than use\nequation 9, we use\n\n    $\\gamma_t = \\gamma_{\\min} + (1 - \\gamma_{\\min})\\,(1 - \\max_z \\tilde{P}[z_t = z \\mid D_t])$    (10)\n\nFigure 4: ACh model. A) ACh level from the exact posterior for one run. B) ACh level\nin the approximate model in the same run. Note the coarse similarity between A and\nB. C) Solid: the mean extra representational cost for the true state $y_t^*$ over that in the exact\nposterior using the ACh model as a function of the minimum allowed ACh level $\\gamma_{\\min}$. Dashed:\nthe same quantity for the pure bottom-up model (which is equivalent to the approximate\nmodel for $\\gamma_{\\min} = 1$). Errorbars (which are almost invisible) show standard errors of the means\nover 1000 trials.\n\nFigure 4B shows the approximate ACh level for the same case as figure 4A, using\n$\\gamma_{\\min} = 0.1$. Although the detailed value of this signal is clearly different from that\narising from an exact knowledge of the posterior probabilities (in figure 4A), the\ngross movements are quite similar. Note the effect of $\\gamma_{\\min}$ in preventing the ACh level\nfrom dropping to 0. Figure 3C shows that the ACh-based approximate posterior\nvalues $\\tilde{P}[y_t \\mid D_t]$ are much closer to the true values than for the purely bottom-up\nmodel, particularly for values of $P[y_t \\mid D_t]$ near 0 and 1, where most data lie.\nFigure 3F shows that inference about $z$ is noisy, but the pattern of true values $z^*$ is\ncertainly visible. Figure 4C shows the effect of changing $\\gamma_{\\min}$ on the quality of\ninference about the true states $y_t^*$. This shows differences between approximate and\nexact log probabilities of the true states $y_t^*$, averaged over 1000 cases. If $\\gamma_{\\min} = 1$, then\ninference is completely stimulus-bound, just like the purely bottom-up model;\nvalues of $\\gamma_{\\min}$ less than [...] appear to do well for this and other settings of the parameters\nof the problem. An upper bound on the performance of approximate inference can\nbe calculated in three steps by: i) using the exact posterior to work out $\\tilde{z}_t$ and $\\gamma_t$,\nii) using these values to approximate the distribution over the context as in equation 2, and iii) using this\napproximate distribution in equation 4 and the remaining equations. The average\nresulting cost (ie the average resulting difference from the log probability under\nexact inference) is [...] log units. Thus, the ACh-based approximation performs\nwell, and much better than purely bottom-up inference.\n\n4 Discussion\n\nWe have suggested that one of the roles of ACh in cortical processing is to report\ncontextual uncertainty in order to control the balance between stimulus-bound,\nbottom-up, processing, and contextually-bound, top-down processing. We used\nthe example of a hierarchical HMM in which representational inference for a middle\nlayer should correctly reflect such a balance, and showed that a simple model\nof the drive and effects of ACh leads to competent inference.\nThis model is clearly overly simple. In particular, it uses a localist representation\nfor the state $z$, and so exact inference would be feasible. In a more realistic case,\ndistributed representations would be used at multiple levels in the hierarchy, and\nso only one single context could be entertained at once. 
Then, it would also not be\npossible to represent the degree of uncertainty using the level of activities of the\nunits representing the context, at least given a population-coded representation. It\nwould also be necessary to modify the steps in equations 4 and 5, since it would\nbe hard to represent the joint uncertainty over representations at multiple levels\nin the hierarchy. Nevertheless, our model shows the feasibility of using an ACh\nsignal in helping propagate and use approximate information over time.\nSince it is straightforward to administer cholinergic agonists and antagonists, there\nare many ways to test aspects of this proposal. We plan to start by using the\nparadigm of Ress et al [24], which uses fMRI techniques to study bottom-up and\ntop-down influences on the detection of simple visual targets. Preliminary\nsimulation studies indicate that a hidden Markov model under controllable cholinergic\nmodulation can capture several aspects of existent data on animal signal detection\ntasks [18].\n\nAcknowledgements\nWe are very grateful to Michael Hasselmo, David Heeger, Sham Kakade and\nSzabolcs K\u00e1li for helpful discussions. Funding was from the Gatsby Charitable\nFoundation and the NSF. Reference [28] is an extended version of this paper.\n\nReferences\n[1] Carpenter, GA & Grossberg, S, editors (1991) Pattern Recognition by Self-Organizing Neural Networks. Cambridge, MA: MIT Press.\n\n[2] Dayan, P (1999) Recurrent sampling models for the Helmholtz machine. Neural Computation 11:653-677.\n\n[3] Dayan, P, Hinton, GE, Neal, RM & Zemel, RS (1995) The Helmholtz machine. Neural Computation 7:889-904.\n\n[4] Dayan, P, Kakade, S & Montague, PR (2000) Learning and selective attention. Nature Neuroscience 3:1218-1223.\n\n[5] Doya, K (1999) Metalearning, neuromodulation and emotion. The 13th Toyota Conference on Affective Minds, 46-47.\n\n[6] Everitt, BJ & Robbins, TW (1997) Central cholinergic systems and cognition. Annual Review of Psychology 48:649-684.\n\n[7] Fellous, J-M & Linster, C (1998) Computational models of neuromodulation. Neural Computation 10:771-805.\n\n[8] Grenander, U (1976-1981) Lectures in Pattern Theory I, II and III: Pattern Analysis, Pattern Synthesis and Regular Structures. Berlin: Springer-Verlag.\n\n[9] Hasselmo, ME (1995) Neuromodulation and cortical function: Modeling the physiological basis of behavior. Behavioural Brain Research 67:1-27.\n\n[10] Hasselmo, M (1999) Neuromodulation: acetylcholine and memory consolidation. Trends in Cognitive Sciences 3:351-359.\n\n[11] Hasselmo, ME & Bower, JM (1993) Acetylcholine and memory. Trends in Neurosciences 16:218-222.\n\n[12] Hasselmo, ME, Wyble, BP & Wallenstein, GV (1996) Encoding and retrieval of episodic memories: Role of cholinergic and GABAergic modulation in the hippocampus. Hippocampus 6:693-708.\n\n[13] Hinton, GE & Ghahramani, Z (1997) Generative models for discovering sparse distributed representations. Philosophical Transactions of the Royal Society of London B352:1177-1190.\n\n[14] Holland, PC (1997) Brain mechanisms for changes in processing of conditioned stimuli in Pavlovian conditioning: Implications for behavior theory. Animal Learning & Behavior 25:373-399.\n\n[15] Holland, PC & Gallagher, M (1999) Amygdala circuitry in attentional and representational processes. Trends in Cognitive Sciences 3:65-73.\n\n[16] Kakade, S & Dayan, P (2000) Dopamine bonuses. In TK Leen, TG Dietterich & V Tresp, editors, NIPS 2000.\n\n[17] MacKay, DM (1956) The epistemological problem for automata. In CE Shannon & J McCarthy, editors, Automata Studies. Princeton, NJ: Princeton University Press, 235-251.\n\n[18] McGaughy, J, Kaiser, T & Sarter, M (1996) 
Behavioral vigilance following infusions of 192 IgG-saporin into the basal forebrain: selectivity of the behavioral impairment and relation to cortical AChE-positive fiber density. Behavioral Neuroscience 110:247-265.\n\n[19] Mumford, D (1994) Neuronal architectures for pattern-theoretic problems. In C Koch & J Davis, editors, Large-Scale Theories of the Cortex. Cambridge, MA: MIT Press, 125-152.\n\n[20] Pearce, JM & Hall, G (1980) A model for Pavlovian learning: Variation in the effectiveness of conditioned but not unconditioned stimuli. Psychological Review 87:532-552.\n\n[21] Pfl\u00fcger, HJ (1999) Neuromodulation during motor development and behavior. Current Opinion in Neurobiology 9:683-689.\n\n[22] Rabiner, LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77:257-286.\n\n[23] Rao, RPN & Ballard, DH (1997) Dynamic model of visual recognition predicts neural response properties in the visual cortex. Neural Computation 9:721-763.\n\n[24] Ress, D, Backus, BT & Heeger, DJ (2000) Activity in primary visual cortex predicts performance in a visual detection task. Nature Neuroscience 3:940-945.\n\n[25] Sarter, M & Bruno, JP (1997) Cognitive functions of cortical acetylcholine: Toward a unifying hypothesis. Brain Research Reviews 23:28-46.\n\n[26] Schultz, W (1998) Predictive reward signal of dopamine neurons. Journal of Neurophysiology 80:1-27.\n\n[27] Schultz, W, Dayan, P & Montague, PR (1997) A neural substrate of prediction and reward. Science 275:1593-1599.\n\n[28] Yu, A & Dayan, P (2002) Acetylcholine in cortical inference. Submitted to Neural Networks.\n", "award": [], "sourceid": 2103, "authors": [{"given_name": "Peter", "family_name": "Dayan", "institution": null}, {"given_name": "Angela", "family_name": "Yu", "institution": null}]}