{"title": "Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference", "book": "Advances in Neural Information Processing Systems", "page_first": 1304, "page_last": 1312, "abstract": null, "full_text": "Ef\ufb01cient coding provides a direct link between prior\n\nand likelihood in perceptual Bayesian inference\n\nXue-Xin Wei and Alan A. Stocker\u2217\nDepartments of Psychology and\n\nElectrical and Systems Engineering\n\nUniversity of Pennsylvania\n\nPhiladelphia, PA-19104, U.S.A.\n\nAbstract\n\nA common challenge for Bayesian models of perception is the fact that the two\nfundamental Bayesian components, the prior distribution and the likelihood func-\ntion, are formally unconstrained. Here we argue that a neural system that emulates\nBayesian inference is naturally constrained by the way it represents sensory infor-\nmation in populations of neurons. More speci\ufb01cally, we show that an ef\ufb01cient\ncoding principle creates a direct link between prior and likelihood based on the\nunderlying stimulus distribution. The resulting Bayesian estimates can show bi-\nases away from the peaks of the prior distribution, a behavior seemingly at odds\nwith the traditional view of Bayesian estimation, yet one that has been reported\nin human perception. We demonstrate that our framework correctly accounts for\nthe repulsive biases previously reported for the perception of visual orientation,\nand show that the predicted tuning characteristics of the model neurons match\nthe reported orientation tuning properties of neurons in primary visual cortex.\nOur results suggest that ef\ufb01cient coding is a promising hypothesis in constrain-\ning Bayesian models of perceptual inference.\n\n1 Motivation\n\nHuman perception is not perfect. Biases have been observed in a large number of perceptual tasks\nand modalities, of which the most salient ones constitute many well-known perceptual illusions. 
It has been suggested, however, that these biases do not reflect a failure of perception but rather an observer's attempt to optimally combine the inherently noisy and ambiguous sensory information with appropriate prior knowledge about the world [13, 4, 14]. This hypothesis, which we will refer to as the Bayesian hypothesis, has indeed proven quite successful in providing a normative explanation of perception at a qualitative and, more recently, quantitative level (see e.g. [15]). A major challenge in forming models based on the Bayesian hypothesis is the correct selection of two main components: the prior distribution (belief) and the likelihood function. This has encouraged some to criticize the Bayesian hypothesis altogether, claiming that arbitrary choices for these components always allow for unjustified post-hoc explanations of the data [1].

We do not share this criticism, referring to a number of successful attempts to constrain prior beliefs and likelihood functions on principled grounds. For example, prior beliefs have been defined as the relative distribution of the sensory variable in the environment in cases where these statistics are relatively easy to measure (e.g. local visual orientations [16]), or where it can be assumed that subjects have learned them over the course of the experiment (e.g. time perception [17]). Other studies have constrained the likelihood function according to known noise characteristics of neurons that are crucially involved in the specific perceptual process (e.g. motion-tuned neurons in visual cor-

*http://www.sas.upenn.edu/~astocker/lab

Figure 1: Encoding-decoding framework. A stimulus representing a sensory variable θ elicits a firing rate response R = {r1, r2, ..., rN} in a population of N neurons.
The perceptual task is to generate a good estimate θ̂(R) of the presented value of the sensory variable based on this population response. Our framework assumes that encoding is efficient, and decoding is Bayesian based on the likelihood p(R|θ), the prior p(θ), and a squared-error loss function.

tex [18]). However, we agree that finding appropriate constraints is generally difficult and that prior beliefs and likelihood functions have often been selected on the basis of mathematical convenience.

Here, we propose that the efficient coding hypothesis [19] offers a joint constraint on the prior and likelihood function in neural implementations of Bayesian inference. Efficient coding provides a normative description of how neurons encode sensory information, and suggests a direct link between measured perceptual discriminability, neural tuning characteristics, and environmental statistics [11]. We show how this link can be extended to a full Bayesian account of perception that includes perceptual biases. We validate our model framework against behavioral as well as neural data characterizing the perception of visual orientation. We demonstrate that we can account not only for the reported perceptual biases away from the cardinal orientations, but also for the specific response characteristics of orientation-tuned neurons in primary visual cortex. Our work is a novel proposal of how two important normative hypotheses in perception science, namely efficient (en)coding and Bayesian decoding, might be linked.

2 Encoding-decoding framework

We consider perception as an inference process that takes place along the simplified neural encoding-decoding cascade illustrated in Fig.
1 (see footnote 1).

2.1 Efficient encoding

Efficient encoding proposes that the tuning characteristics of a neural population are adapted to the prior distribution p(θ) of the sensory variable such that the population optimally represents the sensory variable [19]. Different definitions of "optimally" are possible, and may lead to different results. Here, we assume an efficient representation that maximizes the mutual information between the sensory variable and the population response. With this definition and an upper limit on the total firing activity, the square-root of the Fisher information must be proportional to the prior distribution [12, 21].

In order to constrain the tuning curves of individual neurons in the population we also impose a homogeneity constraint, requiring that there exists a one-to-one mapping F(θ) that transforms the physical space with units θ to a homogeneous space with units θ̃ = F(θ) in which the stimulus distribution becomes uniform. This defines the mapping as

F(\theta) = \int_{-\infty}^{\theta} p(\chi)\, d\chi , \qquad (1)

which is the cumulative of the prior distribution p(θ). We then assume a neural population with identical tuning curves that evenly tiles the stimulus range in this homogeneous space. The population provides an efficient representation of the sensory variable θ according to the above constraints [11]. The tuning curves in the physical space are obtained by applying the inverse mapping F⁻¹(θ̃). Fig. 2

Footnote 1: In the context of this paper, we consider 'inferring', 'decoding', and 'estimating' as synonymous.

[Fig. 1 schematic: world → (efficient encoding) → neural representation → (Bayesian decoding) → percept]

Figure 2: Efficient encoding constrains the likelihood function. a) Prior distribution p(θ) derived from stimulus statistics.
b) Efficient coding defines the shape of the tuning curves in the physical space by transforming a set of homogeneous neurons using a mapping F⁻¹ that is the inverse of the cumulative of the prior p(θ) (see Eq. (1)). c) As a result, the likelihood shape is constrained by the prior distribution, showing heavier tails on the side of lower prior density. d) Fisher information, discrimination threshold, and average firing rates are all uniform in the homogeneous space.

illustrates the applied efficient encoding scheme, the mapping, and the concept of the homogeneous space for the example of a symmetric, exponentially decaying prior distribution p(θ). The key idea here is that by assuming efficient encoding, the prior (i.e. the stimulus distribution in the world) directly constrains the likelihood function. In particular, the shape of the likelihood is determined by the cumulative distribution of the prior. As a result, the likelihood is generally asymmetric, as shown in Fig. 2, exhibiting heavier tails on the side of the prior with lower density.

2.2 Bayesian decoding

Let us consider a population of N sensory neurons that efficiently represents a stimulus variable θ as described above. A stimulus θ0 elicits a specific population response that is characterized by the vector R = [r1, r2, ..., rN] where ri is the spike count of the ith neuron over a given time window τ. Under the assumption that the variability in the individual firing rates is governed by a Poisson process, we can write the likelihood function over θ as

p(R|\theta) = \prod_{i=1}^{N} \frac{(\tau f_i(\theta))^{r_i}}{r_i!} \, e^{-\tau f_i(\theta)} , \qquad (2)

with fi(θ) describing the tuning curve of neuron i.
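The encoding cascade of Eqs. (1) and (2) can be sketched numerically. This is a minimal sketch: the stimulus grid, the exponentially decaying prior (the example of Fig. 2), the Gaussian bump shape used in place of a generic homogeneous tuning curve, and all rate parameters are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Physical stimulus grid and a symmetric, exponentially decaying prior.
theta = np.linspace(-3.0, 3.0, 601)
dtheta = theta[1] - theta[0]
prior = np.exp(-np.abs(theta))
prior /= prior.sum() * dtheta

# Eq. (1): F is the cumulative of the prior; it maps theta to the
# homogeneous space, where the stimulus distribution is uniform on [0, 1].
F = np.cumsum(prior) * dtheta
F /= F[-1]

# Identical bumps evenly tiling the homogeneous space; seen through F they
# become the (generally asymmetric) tuning curves f_i(theta) in physical space.
n_neurons, width, peak_rate = 20, 0.05, 50.0
centers = (np.arange(n_neurons) + 0.5) / n_neurons
f = peak_rate * np.exp(-0.5 * ((F[None, :] - centers[:, None]) / width) ** 2)

def log_likelihood(R, tau=0.2):
    """Log of Eq. (2) as a function of theta; the r_i! term does not depend
    on theta and is therefore dropped."""
    rates = tau * f  # expected spike counts, neurons x stimulus values
    return (R[:, None] * np.log(rates + 1e-12) - rates).sum(axis=0)
```

Evaluating `log_likelihood` on spike counts drawn for a stimulus in a low prior-density region makes the heavier tail toward even lower density directly visible.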
We then define a Bayesian decoder θ̂_LSE as the estimator that minimizes the expected squared error between the estimate and the true stimulus value, thus

\hat{\theta}_{LSE}(R) = \frac{\int \theta \, p(R|\theta)\, p(\theta)\, d\theta}{\int p(R|\theta)\, p(\theta)\, d\theta} , \qquad (3)

where we use Bayes' rule to appropriately combine the sensory evidence with the stimulus prior p(θ).

3 Bayesian estimates can be biased away from prior peaks

Bayesian models of perception typically predict perceptual biases toward the peaks of the prior density, a characteristic often considered a hallmark of Bayesian inference. This originates from the

Figure 3: Bayesian estimates biased away from the prior. a) If the likelihood function is symmetric, then the estimate (posterior mean) is, on average, shifted away from the actual value of the sensory variable θ0 towards the prior peak. b) Efficient encoding typically leads to an asymmetric likelihood function whose normalized mean is away from the peak of the prior (relative to θ0). The estimate is determined by a combination of prior attraction and shifted likelihood mean, and can exhibit an overall repulsive bias. c) If p′(θ0) < 0 and the likelihood is relatively narrow, then (1/p(θ)²)′ > 0 (blue line) and the estimate is biased away from the prior peak (see Eq. (6)).

common approach of choosing a parametric description of the likelihood function that is computationally convenient (e.g. Gaussian). As a consequence, likelihood functions are typically assumed to be symmetric (but see [23, 24]), leaving the bias of the Bayesian estimator to be mainly determined by the shape of the prior density, i.e.
leading to biases toward the peak of the prior (Fig. 3a).

In our model framework, the shape of the likelihood function is constrained by the stimulus prior via efficient neural encoding, and is generally not symmetric for non-flat priors. It has a heavier tail on the side with lower prior density (Fig. 3b). The intuition is that due to the efficient allocation of neural resources, the side with smaller prior density will be encoded less accurately, leading to a broader likelihood function on that side. The likelihood asymmetry pulls the Bayes' least-squares estimate away from the peak of the prior, while at the same time the prior pulls it toward its peak. Thus, the resulting estimation bias is the combination of these two counteracting forces, and both are determined by the prior!

3.1 General derivation of the estimation bias

In the following, we formally derive the mean estimation bias b(θ) of the proposed encoding-decoding framework. Specifically, we study the conditions for which the bias is repulsive, i.e. away from the peak of the prior density.

We first rewrite the estimator θ̂_LSE (3) by replacing θ with the inverse of its mapping to the homogeneous space, i.e., θ = F⁻¹(θ̃). The motivation for this is that the likelihood in the homogeneous space is symmetric (Fig. 2).
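Before the formal derivation, the claimed repulsion can be checked numerically: for a monotonically decreasing prior, averaging F⁻¹ under a symmetric, narrow distribution in the homogeneous space yields an expected estimate above θ0, i.e. away from the prior peak. This is a minimal sketch; the exponential prior, the grid, and the Gaussian stand-in for the averaged likelihood L are illustrative assumptions.

```python
import numpy as np

# Monotonically decreasing prior p(theta) on [0, 6]; its peak is at theta = 0.
theta = np.linspace(0.0, 6.0, 4001)
dtheta = theta[1] - theta[0]
prior = np.exp(-theta)
prior /= prior.sum() * dtheta

# Eq. (1): cumulative of the prior, mapping to the homogeneous space [0, 1].
F = np.cumsum(prior) * dtheta
F /= F[-1]

def expected_estimate(theta0, sigma=0.02):
    """Average F^{-1} under a symmetric, narrow L centered at F(theta0),
    mimicking Eq. (5); a narrow Gaussian L is an illustrative choice."""
    t = np.linspace(0.0, 1.0, 4001)
    t0 = np.interp(theta0, theta, F)
    L = np.exp(-0.5 * ((t - t0) / sigma) ** 2)
    L /= L.sum()
    return (np.interp(t, F, theta) * L).sum()

theta0 = 2.0
bias = expected_estimate(theta0) - theta0
# For this prior (1/p(theta)^2)' > 0 everywhere, so Eq. (6) predicts a
# positive (repulsive) bias that shrinks as the likelihood narrows.
```

Because F⁻¹ is convex when the prior is decreasing, the positive bias follows from Jensen's inequality, which is exactly the mechanism the derivation below makes precise.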
Given a value θ0 and the elicited population response R, we can write the estimator as

\hat{\theta}_{LSE}(R) = \frac{\int \theta \, p(R|\theta)\, p(\theta)\, d\theta}{\int p(R|\theta)\, p(\theta)\, d\theta} = \frac{\int F^{-1}(\tilde{\theta})\, p(R|F^{-1}(\tilde{\theta}))\, p(F^{-1}(\tilde{\theta}))\, dF^{-1}(\tilde{\theta})}{\int p(R|F^{-1}(\tilde{\theta}))\, p(F^{-1}(\tilde{\theta}))\, dF^{-1}(\tilde{\theta})} .

Calculating the derivative of the inverse function and noting that F is the cumulative of the prior density, we get

dF^{-1}(\tilde{\theta}) = (F^{-1}(\tilde{\theta}))' \, d\tilde{\theta} = \frac{1}{F'(\theta)} \, d\tilde{\theta} = \frac{1}{p(\theta)} \, d\tilde{\theta} = \frac{1}{p(F^{-1}(\tilde{\theta}))} \, d\tilde{\theta} .

Hence, we can simplify θ̂_LSE(R) as

\hat{\theta}_{LSE}(R) = \frac{\int F^{-1}(\tilde{\theta})\, p(R|F^{-1}(\tilde{\theta}))\, d\tilde{\theta}}{\int p(R|F^{-1}(\tilde{\theta}))\, d\tilde{\theta}} .

With

K(R, \tilde{\theta}) = \frac{p(R|F^{-1}(\tilde{\theta}))}{\int p(R|F^{-1}(\tilde{\theta}))\, d\tilde{\theta}}

we can further simplify the notation and get

\hat{\theta}_{LSE}(R) = \int F^{-1}(\tilde{\theta})\, K(R, \tilde{\theta})\, d\tilde{\theta} . \qquad (4)

In order to get the expected value of the estimate, ⟨θ̂_LSE⟩(θ̃0), we marginalize (4) over the population response space S,

\langle \hat{\theta}_{LSE} \rangle (\tilde{\theta}_0) = \int_S \int p(R)\, F^{-1}(\tilde{\theta})\, K(R, \tilde{\theta})\, d\tilde{\theta}\, dR = \int F^{-1}(\tilde{\theta}) \Big( \int_S p(R)\, K(R, \tilde{\theta})\, dR \Big) d\tilde{\theta} = \int F^{-1}(\tilde{\theta})\, L(\tilde{\theta})\, d\tilde{\theta} ,

where we define

L(\tilde{\theta}) = \int_S p(R)\, K(R, \tilde{\theta})\, dR .

It follows that \int L(\tilde{\theta})\, d\tilde{\theta} = 1. Due to the symmetry in this space, it can be shown that L(θ̃) is symmetric around the true stimulus value θ̃0. Intuitively, L(θ̃) can be thought of as the normalized average likelihood in the homogeneous space. We can then compute the expected bias at θ0 as

b(\theta_0) = \int F^{-1}(\tilde{\theta})\, L(\tilde{\theta})\, d\tilde{\theta} - F^{-1}(\tilde{\theta}_0) . \qquad (5)

This expression is general, where F⁻¹(θ̃) is defined as the inverse of the cumulative of an arbitrary prior density p(θ) (see Eq. (1)) and the dispersion of L(θ̃) is determined by the internal noise level. Assuming the prior density to be smooth, we expand F⁻¹ in a neighborhood (θ̃0 − h, θ̃0 + h) that is larger than the support of the likelihood function. Using Taylor's theorem with mean-value forms of the remainder, we get

F^{-1}(\tilde{\theta}) = F^{-1}(\tilde{\theta}_0) + F^{-1}(\tilde{\theta}_0)' (\tilde{\theta} - \tilde{\theta}_0) + \frac{1}{2} F^{-1}(\tilde{\theta}_x)'' (\tilde{\theta} - \tilde{\theta}_0)^2 ,

with θ̃x lying between θ̃0 and θ̃. By applying this expression to (5), and noting that the linear term integrates to zero because L(θ̃) is symmetric around θ̃0, we find

b(\theta_0) = \frac{1}{2} \int_{\tilde{\theta}_0 - h}^{\tilde{\theta}_0 + h} F^{-1}(\tilde{\theta}_x)'' (\tilde{\theta} - \tilde{\theta}_0)^2 L(\tilde{\theta})\, d\tilde{\theta} = \frac{1}{2} \int_{\tilde{\theta}_0 - h}^{\tilde{\theta}_0 + h} \Big( -\frac{p'(\theta_x)}{p(\theta_x)^3} \Big) (\tilde{\theta} - \tilde{\theta}_0)^2 L(\tilde{\theta})\, d\tilde{\theta} = \frac{1}{4} \int_{\tilde{\theta}_0 - h}^{\tilde{\theta}_0 + h} \Big( \frac{1}{p(\theta_x)^2} \Big)' (\tilde{\theta} - \tilde{\theta}_0)^2 L(\tilde{\theta})\, d\tilde{\theta} .

In general, there is no simple rule to judge the sign of b(θ0). However, if the prior is monotonic on the interval F⁻¹((θ̃0 − h, θ̃0 + h)), then the sign of (1/p(θx)²)′ is always the same as the sign of (1/p(θ0)²)′. Also, if the likelihood is sufficiently narrow we can approximate (1/p(θx)²)′ by (1/p(θ0)²)′, and therefore approximate the bias as

b(\theta_0) \approx C \Big( \frac{1}{p(\theta_0)^2} \Big)' , \qquad (6)

where C is a positive constant.

The result is quite surprising because it states that as long as the prior is monotonic over the support of the likelihood function, the expected estimation bias is always away from the peaks of the prior!

3.2 Internal (neural) versus external (stimulus) noise

The above derivation of the estimation bias is based on the assumption that all uncertainty about the sensory variable is caused by neural response variability. This level of internal noise depends on the response magnitude, and thus can be modulated e.g. by changing stimulus contrast. This contrast-controlled noise modulation is commonly exploited in perceptual studies (e.g. [18]). Internal noise will always lead to repulsive biases in our framework if the prior is monotonic. If internal noise is low, the likelihood is narrow and thus the bias is small. Increasing internal noise leads to increasingly
Increasing internal noise leads to increasingly\n\n5\n\n\flarger biases up to the point where the likelihood becomes wide enough such that monotonicity of\nthe prior over the support of the likelihood is potentially violated.\nStimulus noise is another way to modulate the noise level in perception (e.g. random-dot motion\nstimuli). Such external noise, however, has a different effect on the shape of the likelihood function\nas compared to internal noise. It modi\ufb01es the likelihood function (2) by convolving it with the noise\nkernel. External noise is frequently chosen as additive and symmetric (e.g. zero-mean Gaussian). It\nis straightforward to prove that such symmetric external noise does not lead to a change in the mean\nof the likelihood, and thus does not alter the repulsive effect induced by its asymmetry. However, by\nincreasing the overall width of the likelihood, the attractive in\ufb02uence of the prior increases, resulting\nin an estimate that is closer to the prior peak than without external noise2.\n\n4 Perception of visual orientation\n\nWe tested our framework by modelling the perception of visual orientation. 
Our choice was based on the fact that i) we have good estimates of the prior distribution of local orientations in natural images, ii) the tuning characteristics of orientation-selective neurons in visual cortex are well studied (monkey/cat), and iii) biases in perceived stimulus orientation have been well characterized. We start by creating an efficient neural population based on measured prior distributions of local visual orientation, and then compare the resulting tuning characteristics of the population and the predicted perceptual biases with reported data in the literature.

4.1 Efficient neural model population for visual orientation

Previous studies measured the statistics of local orientation in large sets of natural images and consistently found that the orientation distribution is multimodal, peaking at the two cardinal orientations as shown in Fig. 4a [16, 20]. We assumed that the visual system's prior belief over orientation p(θ) follows this distribution and approximate it formally as

p(\theta) \propto 2 - |\sin(\theta)| \quad \text{(black line in Fig. 4b)} . \qquad (7)

Based on this prior distribution we defined an efficient neural representation for orientation. We assumed a population of model neurons (N = 30) with tuning curves that follow a von Mises distribution in the homogeneous space, on top of a constant spontaneous firing rate (5 Hz). We then applied the inverse transformation F⁻¹(θ̃) to all these tuning curves to get the corresponding tuning curves in the physical space (Fig. 4b, red curves), where F(θ) is the cumulative of the prior (7).
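The construction of this population can be sketched as follows. This is a minimal sketch with two labeled assumptions: we write the prior of Eq. (7) with a doubled angle so that it peaks at both cardinal orientations (an assumption about the intended parameterization), and we only derive preferred orientations and local spacing rather than full von Mises tuning curves; N and all widths are illustrative.

```python
import numpy as np

# Orientation grid in degrees; cardinal orientations at 0 and +-90 deg.
deg = np.linspace(-90.0, 90.0, 1801)
ddeg = deg[1] - deg[0]

# Eq. (7)-style prior, written with a doubled angle (an assumption) so that
# it peaks at both cardinals and dips at the obliques (+-45 deg).
prior = 2.0 - np.abs(np.sin(np.deg2rad(2.0 * deg)))
prior /= prior.sum() * ddeg

# Eq. (1): cumulative of the prior, mapping to the homogeneous space [0, 1].
F = np.cumsum(prior) * ddeg
F /= F[-1]

# Evenly spaced preferred values in the homogeneous space map through F^{-1}
# to preferred orientations whose local density follows the prior: more
# neurons near the cardinals, fewer near the obliques.
n_neurons = 30
preferred = np.interp((np.arange(n_neurons) + 0.5) / n_neurons, F, deg)

# Local spacing (~ inverse prior density) also sets the tuning width in
# physical space: neurons near the obliques are more broadly tuned.
spacing = np.gradient(preferred)
```

The same inverse mapping applied to von Mises bumps in the homogeneous space would yield the asymmetric physical-space tuning curves shown in Fig. 4b.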
The concentration parameter for the von Mises tuning curves was set to κ ≈ 1.6 in the homogeneous space in order to match the measured average tuning width (∼32 deg) of neurons in area V1 of the macaque [9].

4.2 Predicted tuning characteristics of neurons in primary visual cortex

The orientation tuning characteristics of our model population match neurophysiological data of neurons in primary visual cortex (V1) well. Efficient encoding predicts that the distribution of the neurons' preferred orientations follows the prior, with more neurons tuned to cardinal than to oblique orientations by a factor of approximately 1.5. A similar ratio has been found for neurons in area V1 of monkey/cat [9, 10]. Also, the tuning widths of the model neurons vary between 25–42 deg depending on their preferred tuning (see Fig. 4c), matching the measured tuning-width ratio of 0.6 between neurons tuned to the cardinal versus oblique orientations [9].

An important prediction of our model is that most of the tuning curves should be asymmetric. Such asymmetries have indeed been reported for the orientation tuning of neurons in area V1 [6, 7, 8]. We computed the asymmetry index for our model population as defined in previous studies [6, 7], and plotted it as a function of the preferred tuning of each neuron (Fig. 4d). The overall asymmetry index in our model population is 1.24 ± 0.11, which approximately matches the measured values for neurons in area V1 of the cat (1.26 ± 0.06) [6]. The model also predicts that neurons tuned to the cardinal and oblique orientations should show more symmetric tuning than those tuned to orientations in between (Fig. 4d). Finally,

Footnote 2: Note that these predictions are likely to change if the external noise is not symmetric.

Figure 4: Tuning characteristics of model neurons. a) Distribution of local orientations in natural images, replotted from [16].
b) Prior used in the model (black) and predicted tuning curves according to efficient coding (red). c) Tuning width as a function of preferred orientation. d) Tuning curves of cardinal and oblique neurons are more symmetric than those tuned to orientations in between. e) Both narrowly and broadly tuned neurons show less asymmetry than neurons with tuning widths in between.

neurons with tuning widths at the lower and upper end of the range are predicted to exhibit less asymmetry than those neurons whose widths lie in between these extremes (illustrated in Fig. 4e). These last two predictions have not been tested yet.

4.3 Predicted perceptual biases

Our model framework also provides specific predictions for the expected perceptual biases. Humans show systematic biases in the perceived orientation of visual stimuli such as e.g. arrays of Gabor patches (Fig. 5a,d). Two types of biases can be distinguished. First, perceived orientations show an absolute bias away from the cardinal orientations, and thus away from the peaks of the orientation prior [2, 3]. We refer to these biases as absolute because they are typically measured by adjusting a noise-free reference until it matched the orientation of the test stimulus. Interestingly, these repulsive absolute biases grow as the external stimulus noise decreases (see Fig. 5b). Second, the relative bias between the perceived overall orientations of a high-noise and a low-noise stimulus is toward the cardinal orientations, as shown in Fig. 5c, and thus toward the peaks of the prior distribution [3, 16].

The predicted perceptual biases of our model are shown in Fig. 5e,f. We computed the likelihood function according to (2) and used the prior in (7). External noise was modeled by convolving the stimulus likelihood function with a Gaussian (different widths for different noise levels).
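The opposing effects of internal and external noise can be illustrated in a reduced one-dimensional setting. This is a sketch, not the full orientation model: a monotonic exponential prior stands in for one flank of the orientation prior, the internal likelihood is symmetric in the homogeneous space, and all widths are illustrative assumptions.

```python
import numpy as np

# Monotonically decreasing prior on [0, 6] (peak at 0), a stand-in for one
# flank of the orientation prior between a cardinal and an oblique.
theta = np.linspace(0.0, 6.0, 4001)
dtheta = theta[1] - theta[0]
prior = np.exp(-theta)
prior /= prior.sum() * dtheta
F = np.cumsum(prior) * dtheta
F /= F[-1]                          # Eq. (1)

def posterior_mean(theta0, sigma_internal=0.05, sigma_external=0.0):
    """Posterior mean for a likelihood that is symmetric in the homogeneous
    space (internal noise), optionally convolved with a Gaussian kernel in
    the physical space (external noise, Section 3.2)."""
    t0 = np.interp(theta0, theta, F)
    lik = np.exp(-0.5 * ((F - t0) / sigma_internal) ** 2)
    if sigma_external > 0.0:
        x = np.arange(-4 * sigma_external, 4 * sigma_external + dtheta, dtheta)
        kernel = np.exp(-0.5 * (x / sigma_external) ** 2)
        lik = np.convolve(lik, kernel / kernel.sum(), mode="same")
    post = lik * prior
    post /= post.sum()
    return (theta * post).sum()

theta0 = 2.0
clean = posterior_mean(theta0)                       # internal noise only
noisy = posterior_mean(theta0, sigma_external=0.5)   # added external noise
# clean > theta0: the asymmetric likelihood produces a repulsive bias;
# noisy < clean: external noise widens the likelihood and strengthens the
# attraction toward the prior peak at theta = 0.
```

This reproduces the qualitative pattern in the text: the repulsive bias is strongest with little external noise, and adding external noise pulls the estimate back toward the prior peak.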
The predictions match both the reported absolute biases away from the cardinal orientations and the relative biases toward them. Note that our model framework correctly accounts for the fact that less external noise leads to larger absolute biases (see also the discussion in Section 3.2).

5 Discussion

We have presented a modeling framework for perception that combines efficient (en)coding and Bayesian decoding. Efficient coding imposes constraints on the tuning characteristics of a population of neurons according to the stimulus distribution (prior). It thus establishes a direct link between prior and likelihood, and provides clear constraints on the latter for a Bayesian observer model of perception. We have shown that the resulting likelihoods are in general asymmetric, with

Figure 5: Biases in perceived orientation: human data vs. model prediction. a,d) Low- and high-noise orientation stimuli of the type used in [3, 16]. b) Humans show absolute biases in perceived orientation that are away from the cardinal orientations. Data replotted from [2] (pink squares) and [3] (green (black) triangles: bias for low (high) external noise). c) Relative bias between stimuli with different external noise levels (high minus low). Data replotted from [3] (blue triangles) and [16] (red circles). e,f) Model predictions for absolute and relative bias.

heavier tails away from the prior peaks. We demonstrated that such asymmetric likelihoods can lead to the counter-intuitive prediction that a Bayesian estimator is biased away from the peaks of the prior distribution.
Interestingly, such repulsive biases have been reported for human perception of visual orientation, yet a principled and consistent explanation of their existence has been missing so far. Here, we suggest that these counter-intuitive biases directly follow from the asymmetries in the likelihood function induced by efficient neural encoding of the stimulus. The good match between our model predictions and the measured perceptual biases and orientation tuning characteristics of neurons in primary visual cortex provides further support for our framework.

Previous work has suggested that there might be a link between stimulus statistics, neuronal tuning characteristics, and perceptual behavior based on efficient coding principles, yet none of these studies has recognized the importance of the resulting likelihood asymmetries [16, 11]. We have demonstrated here that such asymmetries can be crucial in explaining perceptual data, even though the resulting estimates appear "anti-Bayesian" at first sight (see also models of sensory adaptation [23]).

Note that we do not provide a neural implementation of the Bayesian inference step. However, we and others have proposed various neural decoding schemes that can approximate Bayes' least-squares estimation using efficient coding [26, 25, 22]. It is also worth pointing out that our estimator is set to minimize the total squared error, and that other choices of the loss function (e.g. a MAP estimator) could lead to different predictions. Our framework is general and should be directly applicable to other modalities. In particular, it might provide a new explanation for perceptual biases that are hard to reconcile with traditional Bayesian approaches [5].

Acknowledgments

We thank M. Jogan and A. Tank for helpful comments on the manuscript.
This work was partially supported by grant ONR N000141110744.

References

[1] M. Jones and B. C. Love. Bayesian fundamentalism or enlightenment? On the explanatory status and theoretical contributions of Bayesian models of cognition. Behavioral and Brain Sciences, 34:169–231, 2011.

[2] D. P. Andrews. Perception of contours in the central fovea. Nature, 205:1218–1220, 1965.

[3] A. Tomassini, M. J. Morgan, and J. A. Solomon. Orientation uncertainty reduces perceived obliquity. Vision Res, 50:541–547, 2010.

[4] W. S. Geisler and D. Kersten. Illusions, perception and Bayes. Nature Neuroscience, 5(6):508–510, 2002.

[5] M. O. Ernst. Perceptual learning: inverting the size-weight illusion. Current Biology, 19:R23–R25, 2009.

[6] G. H. Henry, B. Dreher, and P. O. Bishop. Orientation specificity of cells in cat striate cortex. J Neurophysiol, 37(6):1394–1409, 1974.

[7] D. Rose and C. Blakemore. An analysis of orientation selectivity in the cat's visual cortex. Exp Brain Res, 20(1):1–17, 1974.

[8] N. V. Swindale. Orientation tuning curves: empirical description and estimation of parameters. Biol Cybern, 78(1):45–56, 1998.

[9] R. L. De Valois, E. W. Yund, and N. Hepler. The orientation and direction selectivity of cells in macaque visual cortex. Vision Res, 22:531–544, 1982.

[10] B. Li, M. R. Peterson, and R. D. Freeman. The oblique effect: a neural basis in the visual cortex. J Neurophysiol, 90:204–217, 2003.

[11] D. Ganguli and E. P. Simoncelli. Implicit encoding of prior probabilities in optimal neural populations. In Adv. Neural Information Processing Systems (NIPS 23), vol. 23:658–666, 2011.

[12] M. D. McDonnell and N. G. Stocks.
Maximally informative stimuli and tuning curves for sigmoidal rate-coding neurons and populations. Phys Rev Lett, 101(5):058103, 2008.

[13] H. Helmholtz. Treatise on Physiological Optics (transl.). Thoemmes Press, Bristol, U.K., 2000. Original publication 1867.

[14] Y. Weiss, E. Simoncelli, and E. Adelson. Motion illusions as optimal percepts. Nature Neuroscience, 5(6):598–604, June 2002.

[15] D. C. Knill and W. Richards, editors. Perception as Bayesian Inference. Cambridge University Press, 1996.

[16] A. R. Girshick, M. S. Landy, and E. P. Simoncelli. Cardinal rules: visual orientation perception reflects knowledge of environmental statistics. Nat Neurosci, 14(7):926–932, July 2011.

[17] M. Jazayeri and M. N. Shadlen. Temporal context calibrates interval timing. Nature Neuroscience, 13(8):914–916, 2010.

[18] A. A. Stocker and E. P. Simoncelli. Noise characteristics and prior expectations in human visual speed perception. Nature Neuroscience, pages 578–585, April 2006.

[19] H. B. Barlow. Possible principles underlying the transformation of sensory messages. In W. A. Rosenblith, editor, Sensory Communication, pages 217–234. MIT Press, Cambridge, MA, 1961.

[20] D. M. Coppola, H. R. Purves, A. N. McCoy, and D. Purves. The distribution of oriented contours in the real world. Proc Natl Acad Sci U S A, 95(7):4002–4006, 1998.

[21] N. Brunel and J.-P. Nadal. Mutual information, Fisher information and population coding. Neural Computation, 10(7):1731–1757, 1998.

[22] X.-X. Wei and A. A. Stocker. Bayesian inference with efficient neural population codes. In Lecture Notes in Computer Science, Artificial Neural Networks and Machine Learning (ICANN 2012), Lausanne, Switzerland, volume 7552, pages 523–530, 2012.

[23] A. A. Stocker and E. P. Simoncelli. Sensory adaptation within a Bayesian framework for perception. In Y. Weiss, B. Schölkopf, and J.
Platt, editors, Advances in Neural Information Processing Systems 18, pages 1291–1298. MIT Press, Cambridge, MA, 2006. Oral presentation.

[24] D. C. Knill. Robust cue integration: A Bayesian model and evidence from cue-conflict studies with stereoscopic and figure cues to slant. Journal of Vision, 7(7):1–24, 2007.

[25] D. Ganguli. Efficient coding and Bayesian inference with neural populations. PhD thesis, Center for Neural Science, New York University, New York, NY, September 2012.

[26] B. Fischer. Bayesian estimates from heterogeneous population codes. In Proc. IEEE Intl. Joint Conf. on Neural Networks. IEEE, 2010.
", "award": [], "sourceid": 4489, "authors": [{"given_name": "Xue-xin", "family_name": "Wei", "institution": null}, {"given_name": "Alan", "family_name": "Stocker", "institution": null}]}