{"title": "Probabilistic Interpretation of Population Codes", "book": "Advances in Neural Information Processing Systems", "page_first": 676, "page_last": 684, "abstract": null, "full_text": "Probabilistic Interpretation of Population Codes \n\nRichard S. Zemel \nzemel@u.arizona.edu \n\nPeter Dayan \ndayan@ai.mit.edu \n\nAlexandre Pouget \nalex@salk.edu \n\nAbstract \n\nWe present a theoretical framework for population codes which generalizes naturally to the important case where the population provides information about a whole probability distribution over an underlying quantity rather than just a single value. We use the framework to analyze two existing models, and to suggest and evaluate a third model for encoding such probability distributions. \n\n1 Introduction \n\nPopulation codes, where information is represented in the activities of whole populations of units, are ubiquitous in the brain. There has been substantial work on how animals should and/or actually do extract information about the underlying encoded quantity.5,3,11,9,12 With the exception of Anderson,1 this work has concentrated on the case of extracting a single value for this quantity. We study ways of characterizing the joint activity of a population as coding a whole probability distribution over the underlying quantity. 
\n\nTwo examples motivate this paper: place cells in the hippocampus of freely moving rats that fire when the animal is at a particular part of an environment,8 and cells in area MT of monkeys firing to a random moving dot stimulus.7 Treating the activity of such populations of cells as reporting a single value of their underlying variables is inadequate (a) if there is insufficient information to be sure (e.g. if a rat can be uncertain as to whether it is in place XA or XB, then perhaps place cells for both locations should fire); or (b) if multiple values underlie the input, as in the whole distribution of moving random dots in the motion display. Our aim is to capture the computational power of representing a probability distribution over the underlying parameters.6 \n\nRSZ is at University of Arizona, Tucson, AZ 85721; PD is at MIT, Cambridge, MA 02139; AP is at Georgetown University, Washington, DC 20007. This work was funded by McDonnell-Pew, NIH, AFOSR and startup funds from all three institutions. \n\nIn this paper, we provide a general statistical framework for population codes, use it to understand existing methods for coding probability distributions and also to generate a novel method. We evaluate the methods on some example tasks. \n\n2 Population Code Interpretations \n\nThe starting point for almost all work on neural population codes is the neurophysiological finding that many neurons respond to particular variable(s) underlying a stimulus according to a unimodal tuning function such as a Gaussian. This characterizes cells near the sensory periphery and also cells that report the results of more complex processing, including receiving information from groups of cells that themselves have these tuning properties (in MT, for instance). 
Following Zemel & Hinton's13 analysis, we distinguish two spaces: the explicit space, which consists of the activities r = {r_i} of the cells in the population, and a (typically low dimensional) implicit space, which contains the underlying information X that the population encodes and to which the cells are tuned. All processing on the basis of the activities r has to be referred to the implicit space, but the implicit space itself plays no explicit role in determining the activities. \n\nFigure 1 illustrates our framework. At the top are the measured activities of a population of cells. There are two key operations. Encoding: What is the relationship between the activities r of the cells and the underlying quantity in the world X that is represented? Decoding: What information about the quantity X can be extracted from the activities? Since neurons are generally noisy, it is often convenient to characterize encoding (operations A and B) in a probabilistic way, by specifying P[r|X]. The simplest models make a further assumption of conditional independence of the different units given the underlying quantity, P[r|X] = ∏_i P[r_i|X], although others characterize the degree of correlation between the units. If the encoding model is true, then a Bayesian decoding model specifies that the information r carries about X can be characterized precisely as: P[X|r] ∝ P[r|X]P[X], where P[X] is the prior distribution over X and the constant of proportionality is set so that ∫ P[X|r] dX = 1. Note that starting with a deterministic quantity X in the world, encoding it in the firing rates r, and decoding it (operation C) results in a probability distribution over X. This uncertainty arises from the stochasticity represented by P[r|X]. Given a loss function, we could then go on to extract a single value from this distribution (operation D). 
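As a concrete illustration of this decoding step, the posterior P[X|r] can be computed on a discrete grid for the independent-Poisson case discussed below. This is a minimal sketch, not the paper's implementation; the peak rate of 20 spikes and the grid resolution are illustrative assumptions, while the 50 units, the range [-10, 10], and sigma = 0.3 follow the paper's comparison setup.

```python
import numpy as np

# Grid over the implicit space and a population of Gaussian tuning curves.
x = np.linspace(-10, 10, 401)
centers = np.linspace(-10, 10, 50)
sigma = 0.3
f = 20.0 * np.exp(-(x[None, :] - centers[:, None])**2 / (2 * sigma**2))  # f_i(x)

rng = np.random.default_rng(0)
x_true = 2.0
idx = np.argmin(np.abs(x - x_true))
r = rng.poisson(f[:, idx])           # noisy activities r_i ~ Poisson(f_i(x_true))

# Bayesian decoding with a flat prior: P[x|r] ∝ prod_i P[r_i|x]
log_post = (r[:, None] * np.log(f + 1e-12) - f).sum(axis=0)
post = np.exp(log_post - log_post.max())
post /= post.sum()                   # normalize so the posterior sums to 1

x_hat = x[np.argmax(post)]           # a single value (operation D, 0-1 loss)
```

Even though the encoded X here is a deterministic single value, the decoded object is a full distribution over X, whose width reflects only the Poisson noise in r.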
\n\nWe attack the common assumption that X is a single value of some variable x, e.g. the single position of a rat in an environment, or the single coherent direction of motion of a set of dots in a direction discrimination task. This does not capture the subtleties of certain experiments, such as those in which rats can be made to be uncertain about their position, or in which one direction of motion predominates yet there are several simultaneous motion directions.7 Here, the natural characterization of X is actually a whole probability distribution P[x|ω] over the value of the variable x (perhaps plus extra information about the number of dots), where ω represents all the available information. We can now cast two existing classes of proposals for population codes in terms of this framework. \n\nThe Poisson Model \n\nUnder the Poisson encoding model, the quantity X encoded is indeed one particular value, which we will call x, and the activities of the individual units are independent, \n\nFigure 1: Left: encoding maps X from the world through tuning functions (A) into mean activities (B), leading to Top: observed activities r. We assume complete knowledge of the variables governing systematic changes to the activities of the cells. Here X is a single value x* in the space of underlying variables. Right: decoding extracts P[X|r] (C); a single value can be picked (D) from this distribution given a loss function. \n\nwith the terms P[r_i|x] = e^{-f_i(x)} f_i(x)^{r_i} / r_i!. The activity r_i could, for example, be the number of spikes the cell emits in a fixed time interval following the stimulus onset. A typical form for the tuning function f_i(x) is Gaussian. [...] We approximate P[x|ω] by a piece-wise constant histogram that takes the values φ_j in (x_j, x_{j+1}], and f_i(x) by a piece-wise constant histogram that takes the values f_ij in (x_j, x_{j+1}]. 
Generally, the maximum a posteriori estimate for {φ_j} can be shown to be derived by maximizing \n\nL = Σ_i r_i log( Σ_j φ_j f_ij ) - (1/2ε) Σ_j (φ_{j+1} - φ_j)²,   (4) \n\nwhere ε is the variance of a smoothness prior. We use a form of EM to maximize the likelihood and adopt the crude approximation of averaging neighboring values of φ_j on successive iterations. By comparison with the linear decoding of the KDE method, Equation 4 offers a non-linear way of combining a set of activities {r_i} to give a probability distribution pr(x) over the underlying variable x. The computational complexities of Equation 4 are irrelevant, since decoding is only an implicit operation that the system need never actually perform. \n\nTable 1: A summary of the key operations, with respect to the framework, of the interpretation methods compared here. h[] is a rounding operator to ensure integer firing rates, and ψ_i(x) = N(x_i, σ) are the kernel functions for the KDE method. \n\nEncode ⟨r_i⟩: \nExtended Poisson: ⟨r_i⟩ = h[ ∫ P[x|ω] f_i(x) dx ], with f_i(x) = R_max N(x_i, σ). \nKDE (Projection): ⟨r_i⟩ = h[ R_max ∫ P[x|ω] f_i(x) dx ], with f_i(x) = Σ_j A_ij ψ_j(x) and A_ij = ∫ ψ_i(x) ψ_j(x) dx. \nKDE (EM): ⟨r_i⟩ = h[ R_max r_i* ], with the r_i* chosen to maximize L. \n\nDecode pr(x): \nExtended Poisson: pr(x) chosen to maximize L, with r_i = ∫ pr(x) f_i(x) dx ≈ Σ_j φ_j f_ij. \nKDE (Projection): pr(x) = Σ_i r_i ψ_i(x). \nKDE (EM): pr(x) = Σ_i r_i' ψ_i(x), with r_i' = r_i / Σ_j r_j. \n\nLikelihood: \nExtended Poisson: L = log P[{φ_j}|{r_i}] ≈ Σ_i r_i log f_i. \nKDE (both versions): L = ∫ P[x|ω] log pr(x) dx. \n\nError: \nExtended Poisson: G = Σ_i r_i log( r_i / f_i ). \nKDE (Projection): E = ∫ [pr(x) - P[x|ω]]² dx. \nKDE (EM): G = ∫ P[x|ω] log( P[x|ω] / pr(x) ) dx. \n\n4 Comparing the Models \n\nWe illustrate the various models by showing the faithfulness with which they can represent two bimodal distributions. We used σ = 0.3 for the kernel functions (KDE) and the tuning functions (extended Poisson model) and used 50 units whose x_i were spaced evenly in the range x = [-10, 10]. 
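The encode/decode cycle being compared can be sketched numerically. This is a minimal illustration, not the authors' implementation: it encodes a bimodal target with the extended Poisson model's tuning-weighted integral and decodes with the normalized (EM-style) KDE linear read-out; R_max = 50 and the grid resolution are illustrative assumptions, while σ = 0.3, 50 units, and the range [-10, 10] follow the setup above.

```python
import numpy as np

x = np.linspace(-10, 10, 1001)
dx = x[1] - x[0]
centers = np.linspace(-10, 10, 50)
sigma = 0.3

def normal(x, mu, s):
    # Gaussian density N(mu, s) evaluated on x
    return np.exp(-(x - mu)**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

# Target: mixture of two broad Gaussians, 1/2 N[-2, 1] + 1/2 N[2, 1]
p = 0.5 * normal(x, -2.0, 1.0) + 0.5 * normal(x, 2.0, 1.0)

# Encoding: mean rate is the tuning-weighted integral of P[x|w]
R_max = 50.0
f = R_max * normal(x[None, :], centers[:, None], sigma)   # f_i(x)
rates = (p[None, :] * f).sum(axis=1) * dx                 # <r_i> = int P[x|w] f_i(x) dx
r = np.random.default_rng(1).poisson(rates)               # Poisson spiking

# Normalized KDE linear decode: pr(x) = sum_i r_i' psi_i(x)
psi = normal(x[None, :], centers[:, None], sigma)         # kernels psi_i(x)
r_norm = r / r.sum()
pr = (r_norm[:, None] * psi).sum(axis=0)

err = ((pr - p)**2).sum() * dx                            # E = int [pr - P]^2 dx
```

For a broad target like this one the squared-error E stays small; rerunning with the narrow mixture (component widths 0.2) shows the over-smoothing that the comparison below reports for the KDE decoders.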
Table 1 summarizes the three methods. \n\nFigure 2a shows the decoded version of a mixture of two broad Gaussians 1/2 N[-2, 1] + 1/2 N[2, 1]. Figure 2b shows the same for a mixture of two narrow Gaussians 1/2 N[-2, .2] + 1/2 N[2, .2]. All the models work well for representing the broad Gaussians; both forms of the KDE model have difficulty with the narrow Gaussians. The EM version of KDE puts all its weight on the nearest kernel functions, and so is too broad; the projection version 'rings' in its attempt to represent the narrow components of the distributions. The extended Poisson model reconstructs with greater fidelity. \n\nFigure 2: a) (upper) All three methods provide a good fit to the bimodal Gaussian distribution when its variance is sufficiently large (σ = 1.0). b) (lower) The KDE model has difficulty when σ = 0.2. \n\n5 Discussion \n\nInformally, we have examined the consequences of the seemingly obvious step of saying that if a rat, for instance, is uncertain about whether it is at one of two places, then place cells representing both places could be activated. The complications come because the structure of the interpretation changes; for instance, one can no longer think of maximum likelihood methods to extract a single value from the code directly. \n\nOne main fruit of our resulting framework is a method for encoding and decoding probability distributions that is the natural extension of the (provably inadequate) standard Poisson model for encoding and decoding single values. 
Cells have Poisson statistics about a mean determined by the integral of the whole probability distribution, weighted by the tuning function of the cell. We suggested a particular decoding model, based on an approximation to maximum likelihood decoding of a discretized version of the whole probability distribution, and showed that it reconstructs broad, narrow and multimodal distributions more accurately than either the standard Poisson model or the kernel density model. Stochasticity is built into our method, since the units are supposed to have Poisson statistics, and it is therefore also quite robust to noise. The decoding method is not biologically plausible, but provides a quantitative lower bound to the faithfulness with which a set of activities can code a distribution. \n\nStages of processing subsequent to a population code might either extract a single value from it to control behavior, or integrate it with information represented in other population codes to form a combined population code. Both operations must be performed through standard neural operations such as taking non-linear weighted sums and possibly products of the activities. We are interested in how much information is preserved by such operations, as measured against the non-biological standard of our decoding method. Modeling extraction requires modeling the loss function; there is some empirical evidence about this from a motion experiment in which electrical stimulation of MT cells was pitted against input from a moving stimulus.10 However, much work remains to be done. \n\nIntegrating two or more population codes to generate the output in the form of another population code was stressed by Hinton,6 who noted that it directly relates to the notion of generalized Hough transforms. 
We are presently studying how a system can learn to perform this combination, using the EM-based decoder to generate targets. One special concern for combination is how to understand noise. For instance, the visual system can be behaviorally extraordinarily sensitive, detecting just a handful of photons. However, the outputs of real cells at various stages in the system are apparently quite noisy, with Poisson statistics. If noise is added at every stage of processing and combination, then the final population code will not be very faithful to the input. There is much current research on the issue of the creation and elimination of noise in cortical synapses and neurons. \n\nA last issue that we have not treated here is certainty or magnitude. Hinton's6 idea of using the sum total activity of a population to code the certainty in the existence of the quantity they represent is attractive, provided that there is some independent way of knowing what the scale is for this total. We have used this scaling idea in both the KDE and the extended Poisson models. In fact, we can go one stage further, and interpret greater activity still as representing information about the existence of multiple objects or multiple motions. However, this treatment seems less appropriate for the place cell system; the rat is presumably always certain that it is somewhere. There it is plausible that the absolute level of activity could be coding something different, such as the familiarity of a location. \n\nAn entire collection of cells is a terrible thing to waste on representing just a single value of some quantity. Representing a whole probability distribution, at least with some fidelity, is not more difficult, provided that the interpretation of the encoding and decoding are clear. We suggest some steps in this direction. \n\nReferences \n\n[1] Anderson, CH (1994). International Journal of Modern Physics C, 5, 135-137. 
\n[2] Anderson, CH & Van Essen, DC (1994). In Computational Intelligence Imitating Life, 213-222. New York: IEEE Press. \n[3] Baldi, P & Heiligenberg, W (1988). Biological Cybernetics, 59, 313-318. \n[4] Dempster, AP, Laird, NM & Rubin, DB (1977). Journal of the Royal Statistical Society, Series B, 39, 1-38. \n[5] Georgopoulos, AP, Schwartz, AB & Kettner, RE (1986). Science, 233, 1416-1419. \n[6] Hinton, GE (1992). Scientific American, 267(3), 105-109. \n[7] Newsome, WT, Britten, KH & Movshon, JA (1989). Nature, 341, 52-54. \n[8] O'Keefe, J & Dostrovsky, J (1971). Brain Research, 34, 171-175. \n[9] Salinas, E & Abbott, LF (1994). Journal of Computational Neuroscience, 1, 89-107. \n[10] Salzman, CD & Newsome, WT (1994). Science, 264, 231-237. \n[11] Seung, HS & Sompolinsky, H (1993). Proceedings of the National Academy of Sciences, USA, 90, 10749-10753. \n[12] Snippe, HP (1996). Neural Computation, 8, 29-37. \n[13] Zemel, RS & Hinton, GE (1995). Neural Computation, 7, 549-564. \n", "award": [], "sourceid": 1324, "authors": [{"given_name": "Richard", "family_name": "Zemel", "institution": null}, {"given_name": "Peter", "family_name": "Dayan", "institution": null}, {"given_name": "Alexandre", "family_name": "Pouget", "institution": null}]}