{"title": "Optimal integration of visual speed across different spatiotemporal frequency channels", "book": "Advances in Neural Information Processing Systems", "page_first": 3201, "page_last": 3209, "abstract": "How does the human visual system compute the speed of a coherent motion stimulus that contains motion energy in different spatiotemporal frequency bands? Here we propose that perceived speed is the result of optimal integration of speed information from independent spatiotemporal frequency tuned channels. We formalize this hypothesis with a Bayesian observer model that treats the channel activity as independent cues, which are optimally combined with a prior expectation for slow speeds. We test the model against behavioral data from a 2AFC speed discrimination task with which we measured subjects' perceived speed of drifting sinusoidal gratings with different contrasts and spatial frequencies, and of various combinations of these single gratings. We find that perceived speed of the combined stimuli is independent of the relative phase of the underlying grating components, and that the perceptual biases and discrimination thresholds are always smaller for the combined stimuli, supporting the cue combination hypothesis. The proposed Bayesian model fits the data well, accounting for perceptual biases and thresholds of both simple and combined stimuli. Fits are improved if we assume that the channel responses are subject to divisive normalization, which is in line with physiological evidence.  Our results provide an important step toward a more complete model of visual motion perception that can predict perceived speeds for stimuli of arbitrary spatial structure.", "full_text": "Optimal integration of visual speed across different spatiotemporal frequency channels\n\nMatja\u017e Jogan and Alan A. 
Stocker\n\nDepartment of Psychology\nUniversity of Pennsylvania\nPhiladelphia, PA 19104\n{mjogan,astocker}@sas.upenn.edu\n\nAbstract\n\nHow do humans perceive the speed of a coherent motion stimulus that contains motion energy in multiple spatiotemporal frequency bands? Here we tested the idea that perceived speed is the result of an integration process that optimally combines speed information across independent spatiotemporal frequency channels. We formalized this hypothesis with a Bayesian observer model that combines the likelihood functions provided by the individual channel responses (cues). We experimentally validated the model with a 2AFC speed discrimination experiment that measured subjects\u2019 perceived speed of drifting sinusoidal gratings with different contrasts and spatial frequencies, and of various combinations of these single gratings. We found that the perceived speeds of the combined stimuli are independent of the relative phase of the underlying grating components. The results also show that the discrimination thresholds are smaller for the combined stimuli than for the individual grating components, supporting the cue combination hypothesis. The proposed Bayesian model fits the data well, accounting for the full psychometric functions of both simple and combined stimuli. Fits are improved if we assume that the channel responses are subject to divisive normalization. Our results provide an important step toward a more complete model of visual motion perception that can predict perceived speeds for coherent motion stimuli of arbitrary spatial structure.\n\n1 Introduction\n\nLow contrast stimuli are perceived to move slower than high contrast ones [17]. This effect can be explained with a Bayesian observer model that assumes a prior distribution with a peak at slow speeds [18, 8, 15]. 
This assumption has been verified by reconstructing subjects\u2019 individual prior distributions from psychophysical data [16]. Based on a noisy sensory measurement $m$ of the true stimulus speed $s$, the Bayesian observer model computes the posterior probability\n\n$$p(s|m) = \\frac{p(m|s)\\,p(s)}{p(m)} \\tag{1}$$\n\nby multiplying the likelihood function $p(m|s)$ with the probability $p(s)$ representing the observer\u2019s prior expectation. If the measurement is unreliable (e.g. if stimulus contrast is low), the likelihood function is broad and the posterior probability distribution is shifted toward the peak of the prior, resulting in a perceived speed that is biased toward slow speeds. While this model is able to account for changes in perceived speed as a function of different internal noise levels (modulated by stimulus contrast), it cannot predict the influence of other factors known to modulate perceived speed, such as the spatial frequency of the stimulus [14, 10, 2].\n\nFigure 1: a) A natural stimulus in motion exhibits a rich spatiotemporal frequency spectrum that determines how humans perceive its speed $s$. b) Spatiotemporal energy diagram for motion in a given direction (i.e. speed) showing individual spatiotemporal frequency channels (white circles). A stimulus that contains spatial frequencies of 0.5 c/deg and 1.5 c/deg and moves with a speed of 2 deg/s will trigger responses $\\vec{r} = \\{r_1, r_2\\}$ in two corresponding channels (red circles). The uncertainty about $s$ given the response vector $\\vec{r}$ is expressed in the joint likelihood function $p(\\vec{r}|s)$.\n\nIn this paper we make a step toward a more general observer model of visual speed perception that, in the long term, will allow us to predict perceived speed for arbitrarily complex stimuli (Fig. 1a). Inspired by physiological and psychophysical evidence we present an extension of the standard Bayesian model (Eq. 
1), which decomposes complex motion stimuli into simpler components processed in separate spatiotemporal frequency channels. Based on the motion energy model [1, 12], we assume that each channel is sensitive to a narrow spatiotemporal frequency band. The observed speed of a stimulus is then a result of combining the sensory evidence provided by these individual channels with a prior expectation for slow speeds. Optimal integration of different sources of sensory evidence has been well documented in cue-combination experiments using cues of different modalities (see e.g. [4, 7]). Here we employ an analogous approach by treating the responses of individual spatiotemporal frequency channels as independent cues about a stimulus\u2019 motion.\nWe validated the model against the data of a series of psychophysical experiments in which we measured how humans\u2019 speed percept of coherent motion depends on the stimulus energy in different spatial frequency bands. Stimuli consisted of drifting sinusoidal gratings at two different spatial frequencies and contrasts, and various combinations of these single gratings. For a given stimulus speed $s$, single gratings target only one channel while the combined stimuli target multiple channels. A joint fit to the psychometric functions of all conditions demonstrates that our new model captures human behavior well, both in terms of perceptual biases and discrimination thresholds.\n\n2 Bayesian model\n\nTo define the new model, we start with the stimulus. We consider $s$ to be the speed of locally coherent and translational stimulus motion (Fig. 1a). This motion can be represented by its power spectrum in spatiotemporal frequency space. For a given motion direction the energy lies in a two-dimensional plane spanned by a temporal frequency axis $\\omega_t$ and a spatial frequency axis $\\omega_s$ and is constrained to coordinates that satisfy $s = \\omega_t/\\omega_s$ (Fig. 1b; red dashed line). 
According to the motion energy model, we assume that the visual system contains motion units that are tuned to specific locations in this plane [1, 12]. A coherent motion stimulus with speed $s$ and multiple spatial frequencies $\\omega_s$ will therefore drive only those units whose tuning curves are centered at coordinates $(\\omega_s, \\omega_s s)$.\nWe formulate our Bayesian observer model in terms of $k$ spatiotemporal frequency channels, each tuned to a narrow spatiotemporal frequency band (Fig. 1b). A moving stimulus will elicit a total response $\\vec{r} = [r_1, r_2, \\ldots, r_k]$ from these channels. The response of each channel provides a likelihood function $p(r_i|s)$. Assuming independent channel noise, we can formulate the posterior probability of a Bayesian observer model that performs optimal integration as\n\n$$p(s|\\vec{r}) \\propto p(s) \\prod_i p(r_i|s) . \\tag{2}$$\n\nFigure 2: Bayesian observer model of speed perception with multiple spatiotemporal channels. A moving stimulus with speed $s$ is decomposed and processed in separate channels that are sensitive to energy in specific spatiotemporal frequency bands. Based on the channel response $r_i$ we formulate a likelihood function $p(r_i|s)$ for each channel. The posterior distribution $p(s|\\vec{r})$ is defined by the combination of the likelihoods with a prior distribution $p(s)$. Here we assume perceived speed $\\hat{s}$ to be the mode of the posterior. We consider a model with and without response normalization across channels (red dashed line).\n\nWe rely on the results of Stocker and Simoncelli [16] for the characterization of the likelihood functions and the speed prior. Likelihoods are assumed to be Gaussians when considered in a transformed logarithmic speed space of the form $s = \\log(1 + s_{\\mathrm{linear}}/s_0)$, where $s_0$ is a small constant [9]. 
If we assume that each channel represents a large number of similarly tuned neurons with Poisson firing statistics, then the average channel likelihood is centered on the value of $s$ for which the activity in the channel peaks, and the width of the likelihood $\\sigma_i$ is inversely proportional to the square-root of the channel\u2019s response [11]. Also based on [16] we locally approximate the logarithm of the speed prior as linear, thus $\\log(p(s)) = as + b$.\nFor reasons of simplicity and without loss of generality, we focus on the case where the stimulus activates two channels with responses $\\vec{r} = [r_i], i \\in \\{1, 2\\}$. Given our assumptions, the likelihoods are normal distributions with mean $\\mu(r_i)$ and standard deviation $\\sigma_i \\propto 1/\\sqrt{r_i}$. The posterior (2) can therefore be written as\n\n$$p(s|\\vec{r}) \\propto \\exp\\left[ -\\frac{(s - \\mu(r_1))^2}{2\\sigma_1^2} - \\frac{(s - \\mu(r_2))^2}{2\\sigma_2^2} + as + b \\right] . \\tag{3}$$\n\nWe assume that the model observer\u2019s speed percept $\\hat{s}$ reflects the value of $s$ that maximizes the posterior. Thus, maximizing the exponent in Eq. 3 leads to\n\n$$\\hat{s} = \\frac{\\sigma_2^2}{\\sigma_1^2 + \\sigma_2^2}\\,\\mu(r_1) + \\frac{\\sigma_1^2}{\\sigma_1^2 + \\sigma_2^2}\\,\\mu(r_2) + a\\,\\frac{\\sigma_1^2 \\sigma_2^2}{\\sigma_1^2 + \\sigma_2^2} . \\tag{4}$$\n\nA full probabilistic account over many trials (observations) requires the characterization of the full distribution of the estimates $p(\\hat{s}|s)$. 
Assuming that $E\\langle\\mu(r_i)|s\\rangle$ approximates the stimulus speed $s$, the expected value of $\\hat{s}$ is\n\n$$\\begin{aligned} E\\langle\\hat{s}|s\\rangle &= \\frac{\\sigma_2^2}{\\sigma_1^2 + \\sigma_2^2}\\,E\\langle\\mu(r_1)|s\\rangle + \\frac{\\sigma_1^2}{\\sigma_1^2 + \\sigma_2^2}\\,E\\langle\\mu(r_2)|s\\rangle + a\\,\\frac{\\sigma_1^2 \\sigma_2^2}{\\sigma_1^2 + \\sigma_2^2} \\\\\\\\ &= \\frac{\\sigma_2^2}{\\sigma_1^2 + \\sigma_2^2}\\,s + \\frac{\\sigma_1^2}{\\sigma_1^2 + \\sigma_2^2}\\,s + a\\,\\frac{\\sigma_1^2 \\sigma_2^2}{\\sigma_1^2 + \\sigma_2^2} = s + a\\,\\frac{\\sigma_1^2 \\sigma_2^2}{\\sigma_1^2 + \\sigma_2^2} . \\end{aligned} \\tag{5}$$\n\nFollowing the approximation in [16], the variance of the estimates $\\hat{s}$ is\n\n$$\\begin{aligned} \\mathrm{var}\\langle\\hat{s}|s\\rangle &\\approx \\left(\\frac{\\sigma_2^2}{\\sigma_1^2 + \\sigma_2^2}\\right)^2 \\mathrm{var}\\langle\\mu(r_1)|s\\rangle + \\left(\\frac{\\sigma_1^2}{\\sigma_1^2 + \\sigma_2^2}\\right)^2 \\mathrm{var}\\langle\\mu(r_2)|s\\rangle \\\\\\\\ &= \\left(\\frac{\\sigma_2^2}{\\sigma_1^2 + \\sigma_2^2}\\right)^2 \\sigma_1^2 + \\left(\\frac{\\sigma_1^2}{\\sigma_1^2 + \\sigma_2^2}\\right)^2 \\sigma_2^2 = \\frac{\\sigma_1^2 \\sigma_2^2}{\\sigma_1^2 + \\sigma_2^2} . \\end{aligned} \\tag{6}$$\n\nThe noisy observer\u2019s percept is fully determined by Eqs. (5) and (6). By a similar derivation it is also easy to show that for a single active channel the distribution has mean $E\\langle\\hat{s}|s\\rangle = s + a\\sigma_1^2$ and variance $\\mathrm{var}\\langle\\hat{s}|s\\rangle = \\sigma_1^2$.\nThe model makes the following predictions: First, the variance of the speed estimates (i.e., percepts) for stimuli that activate both channels is always smaller than the variances of estimates that are based on each of the channel responses alone ($\\sigma_1^2$ and $\\sigma_2^2$). This improved reliability is a hallmark of optimal cue combination as has been demonstrated for cross-modal integration [4, 7]. 
Second, because of the slow speed prior, $a$ is negative, and perceived speeds are biased more strongly toward slower speeds the larger the sensory uncertainty. As a result, the perceived speed of combined stimuli that activate both channels is always faster than the percepts based on each of the individual channel responses alone. Finally, the model predicts that the perceived speed of a combined stimulus solely depends on the responses of the channels to its constituent components, and is therefore independent of the relative phase of the components we combined [5].\n\n2.1 Response normalization\n\nSo far we assumed that the channels do not interact, i.e., their responses are independent of the number of active channels and the overall activity in the system. Here we extend our proposal with the additional hypothesis that channels interact via divisive normalization. Divisive normalization [6] has been considered one of the canonical neural computations responsible for, e.g., contrast gain control, efficient coding, attention or surround suppression [13] (see [3] for a comprehensive review). Here we assume that the response of an individual channel $r_i$ is normalized such that its normalized response $r_i^*$ is given by\n\n$$r_i^* = r_i \\, \\frac{r_i^n}{\\sum_j r_j^n} . \\tag{7}$$\n\nNormalization typically increases the contrast (i.e., the relative difference) between the individual channel responses for increasing values of the exponent $n$. For large $n$ it typically acts like a winner-takes-all mechanism. Note that normalization affects only the responses $r_i$, thus modulating the width of the individual likelihood functions. The integration based on the normalized responses $r_i^*$ remains optimal (see Fig. 2). By explicitly modeling the encoding of visual motion in spatiotemporal frequency channels, we already extended the Bayesian model of speed perception toward a more physiological interpretation. 
Response normalization is one more step in this direction.\n\n3 Results\n\nIn the second part of this paper we test the validity of our model with and without channel normalization against data from a psychophysical two alternative forced choice (2AFC) speed discrimination experiment.\n\n3.1 Speed discrimination experiment\n\nSeven subjects performed a 2AFC visual speed discrimination task. In each trial, subjects were presented for 1250 ms with a reference and a test stimulus on either side of a fixation mark (eccentricity 6 deg, size 4 deg). Both stimuli were drifting gratings, both drifting either leftwards or rightwards at different speeds. Motion directions and the order of the gratings were randomly selected for each trial. After stimulus presentation, a brief flash appeared on the left or right side of the fixation mark and subjects had to answer whether the grating that was presented on the indicated side was moving faster or slower than the grating on the other side. This procedure was chosen in order to prevent potential decision biases.\n\nFigure 3: Single frequency gratings were combined in either a \u201cpeaks-add\u201d or a \u201cpeaks-subtract\u201d phase configuration (0 deg and 60 deg phase, respectively) [5]. The red bar indicates that the two configurations have different overall contrast levels even though they are composed of the same frequencies. We used these two phase-combinations to test whether the channel hypothesis is valid or not.\n\nThe stimulus test set comprised 10 stimuli. Four of these stimuli were simple sinewave gratings of a single spatial frequency, either $\\omega_s = 0.5$ or $3\\omega_s = 1.5$ c/deg. The low frequency test stimulus had a contrast of 22.5%, while the three higher frequency stimuli had contrasts 7.5, 22.5 and 67.5%, respectively. The other six stimuli were pair-wise combinations of the single frequency gratings (Fig. 
3), combined in either a \u201cpeaks-add\u201d or a \u201cpeaks-subtract\u201d phase configuration [5] (i.e. 0 deg and 60 deg phase). All test stimuli were drifting at a speed of 2 deg/s. The reference stimulus was a broadband stimulus whose speed was regulated by an adaptive staircase procedure. Each of the 10 stimulus conditions was run for 190 trials. Data from all seven subjects were combined.\nThe simple stimuli were designed to target individual spatiotemporal frequency channels while the combined stimuli were meant to target two channels simultaneously. The two phase configurations (peaks-add and peaks-subtract) were used to test the multiple channel hypothesis: if combined stimuli are decomposed and processed in separate channels, their perceived speeds should be independent of the phase configuration. In particular, the difference in overall contrast of the two configurations should not affect perceived speed (Fig. 3).\nMatching speeds (PSEs) and relative discrimination thresholds (Weber-fraction) were extracted from a maximum-likelihood fit of each of the 10 psychometric functions with a cumulative Gaussian. Fig. 4a,b shows the extracted discrimination thresholds and the relative matching speed, respectively. The data faithfully reproduce the general prediction of the Bayesian model for speed perception [16] that perceived speed decreases with increasing uncertainty, which can be seen from the inverse relationship between matching speeds and discrimination thresholds for each of the different test stimuli. We found no significant difference in perceived speeds and thresholds between the combined grating stimuli in \u201cpeaks-add\u201d and \u201cpeaks-subtract\u201d configuration (Fig. 
4a,b; right), despite the fact that the effective contrast of the two configurations differs significantly (by 30, 22 and 11% for the {22.5, 7.5}, {22.5, 22.5} and {22.5, 67.5}% contrast conditions, respectively). This suggests that the perceived speed of combined stimuli is independent of the relative phase between the individual stimulus components, and therefore that the components are processed in independent channels.\n\nFigure 4: Data and model fits for speed discrimination task: a) relative discrimination thresholds (Weber-fraction) and b) matching speeds (PSEs). Error bars represent the 95% confidence interval from 100 bootstrapped samples of the data. For the single frequency gratings, the perceived speed increases with contrast as predicted by the standard Bayesian model. For the combined stimuli, there is no significant difference (based on 95% confidence intervals) in perceived speeds between the combined grating stimuli in \u201cpeaks-add\u201d and \u201cpeaks-subtract\u201d configuration. The Bayesian model with normalized responses (red line) better accounts for the data than the model without interaction between the channels (blue line).\n\n3.2 Model fits\n\nIn order to fit the model observer to the data, we assumed that on every trial of the 2AFC task, the observer first makes individual estimates of the test and the reference speeds $[\\hat{s}_t, \\hat{s}_r]$ according to the corresponding distributions $p(\\hat{s}|s)$ (see Section 2), and then, based on these estimates, decides which stimulus is faster. 
According to signal detection theory, the resulting psychometric function is described by the cumulative probability distribution\n\n$$P(\\hat{s}_r > \\hat{s}_t) = \\int_0^{\\infty} p(\\hat{s}_r|s_r) \\int_0^{\\hat{s}_r} p(\\hat{s}_t|s_t)\\, d\\hat{s}_t\\, d\\hat{s}_r \\tag{8}$$\n\nwhere $p(\\hat{s}_r|s_r)$ and $p(\\hat{s}_t|s_t)$ are the distributions of speed estimates for the reference and the test stimulus according to our Bayesian observer model. The model without normalization has six parameters: four channel responses $r_i$, one for each simple stimulus, reflecting the individual likelihood widths, the reference response $r_{ref}$ and the local slope of the prior $a$.$^1$ The model with normalization has two additional parameters $n_1$ and $n_2$, reflecting the exponents of the normalization in each of the two channels (Eq. 7).\n\n$^1$ Alternatively, channel responses as function of contrast could be modeled according to a contrast response function $r_i = M + R_{max} \\frac{c^2}{c_{50}^2 + c^2}$, where $M$ is the baseline response, $R_{max}$ the maximal response, and $c_{50}$ is the semi saturation contrast level.\n\nFigure 5: Psychometric curves for the ten testing conditions in Figure 4 (upper left to lower right corner): Gaussian fits (black curves) to the psychometric data (circles) are compared to the fits of the Bayesian channel model (blue curves) and the Bayesian channel model with normalized responses (red curves). Histograms reflect the distributions of trials for the average subject.\n\nThe model, with and without response normalization, was fit simultaneously to the psychometric functions of all 10 test conditions using the cumulative probability distribution (Eq. 8) and a maximum-likelihood optimization procedure. Figure 5 shows the fitted psychometric functions for both models as well as a generic cumulative Gaussian fit to the data. 
From these fits we extracted the matching speeds (PSEs) and relative discrimination thresholds (Weber-fractions) shown in Fig. 4.\nIn general, the Bayesian model is quite well supported by the data. In particular, the data reflect the inverse relationship between relative matching speeds and discrimination thresholds predicted by the slow-speed prior of the model. The model with response normalization, however, better captures subjects\u2019 percepts, in particular in conditions where very low contrast stimuli were combined. This is evident from a visual comparison of the full psychometric functions (Fig. 5) as well as the extracted discrimination thresholds and matching speeds (Fig. 4). This impression is supported by a log-likelihood ratio in favor of the model with normalized responses. Computing the Akaike Information Criterion (AIC) furthermore reveals that this advantage is not due to the larger number of free parameters of the normalization model, with an advantage of $\\Delta$AIC = 127 (with significance p = 10e-28) in favor of the latter. Further support of the normalized model comes from the fitted parameter values: for the model with no normalization, the response level of the highest contrast stimulus $r_4$ was not well constrained$^2$ ($r_1$=6.18, $r_2$=5.50, $r_3$=8.69, $r_4$=6e+07, $r_{ref}$=11.66, $a$=-1.83), while the fit to the normalized model led to more reasonable parameter values ($r_1$=10.33, $r_2$=9.96, $r_3$=11.99, $r_4$=37.73, $r_{ref}$=13.44, $n_1$=2e-16, $n_2$=6.8, $a$=-3.39). In particular, the fitted prior slope parameter is in good agreement with values from a previous study [16]. 
Note that the exponent $n_1$ is not well-constrained because the stimulus set only included one contrast level for the low-frequency channel.\nThe results suggest that the perceived speed of a combined stimulus can be accurately described as an optimal combination of sensory information provided by individual spatiotemporal frequency channels that interact via response normalization.\n\n$^2$ The fit essentially assumed $\\sigma_4 = 0$.\n\n4 Discussion\n\nWe have shown that human visual speed perception can be accurately described by a Bayesian observer model that optimally combines sensory information from independent channels, each sensitive to motion energies in a specific spatiotemporal frequency band. Our model extends the previously proposed Bayesian model of speed perception [16]. It no longer assumes a single likelihood function affected by stimulus contrast but rather considers the combination of likelihood functions based on the motion energies in different spatiotemporal frequency channels. This allows the model to account for stimuli with more complex spatial structures.\nWe tested our model against data from a 2AFC speed discrimination experiment. Stimuli consisted of drifting sinewave gratings at different spatial frequencies and combinations thereof. Subjects\u2019 perceived speeds of the combined stimuli were independent of the phase configuration of the constituent sinewave gratings even though different phases resulted in different overall contrast values. This supports the hypothesis that perceived speed is processed across multiple spatiotemporal frequency channels (Graham and Nachmias used a similar approach to demonstrate the existence of individual spatial frequency channels [5]). 
The proposed observer model provided a good fit to the data, but the fit was improved when the channel responses were assumed to be subject to normalization by the overall channel response. Considering that divisive normalization is arguably a ubiquitous process in neural representations, we see this result as a consequence of our attempt to formulate Bayesian observer models at a level that is closer to a physiological description. Note that we consider the integration of the sensory information still optimal albeit based on the normalized responses $r_i^*$. Future experiments that will test more stimulus combinations will help to further improve the characterization of the channel responses and interactions.\nAlthough we did not discuss alternative models, it is apparent that the presented data eliminate some obvious candidates. For example, both a winner-take-all model that only uses the sensory information from the most reliable channel and an averaging model that equally weighs each channel\u2019s response independent of its reliability would make predictions that significantly diverge from the data. Neither model would predict a decrease in sensory uncertainty for the combined stimuli, which is a key feature of optimal cue-combination. This decrease is reflected in the measured decrease in discrimination thresholds for the combined stimuli when the thresholds for both individual gratings were approximately the same (Fig. 4a). Note that because of the slow speed prior, a Bayesian model predicts that perceived speed is inversely related to the discrimination threshold, a prediction that is well supported by our data. 
The fitted model parameters are also in agreement with previous accounts of the estimated shape of the speed prior: the slope of the linear approximation of the log-prior probability density is negative and comparable to previously reported values [16].\nIn this paper we focused on speed perception. However, there is substantial evidence that the visual system in general decomposes complex stimuli into their simpler constituents. The problem of how the scattered information is then integrated into a coherent percept poses many interesting questions with regard to the optimality of this integration across modalities [4, 7]. Our study generalizes cue-integration to the pooling of information within a single perceptual modality. Here we provide a behavioral account for both discrimination thresholds and matching speeds by directly estimating the parameters of the likelihoods and the speed prior from psychophysical data.\nFinally, the fact that the Bayesian model can account for the perception of both simple and complex stimuli speaks for its generality. In the long term, the goal is to be able to predict the perceived motion for an arbitrarily complex natural stimulus, and we believe the proposed model is a step in this direction.\n\nAcknowledgments\n\nThis work was supported by the Office of Naval Research (grant N000141110744).\n\nReferences\n\n[1] E. H. Adelson and J. R. Bergen. Spatiotemporal energy models for the perception of motion. Journal of the Optical Society of America A Optics and image science, 2(2):284\u201399, 1985.\n\n[2] K. R. Brooks, T. Morris, and P. Thompson. Contrast and stimulus complexity moderate the relationship between spatial frequency and perceived speed: Implications for MT models of speed perception. Journal of Vision, 11(14), 2011.\n\n[3] M. Carandini and D. J. Heeger. Normalization as a canonical neural computation. Nature Reviews Neuroscience, 13(1):51\u201362, 2012.\n\n[4] M. O. Ernst and M. S. Banks. 
Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415(6870):429\u201333, 2002.\n\n[5] N. Graham and J. Nachmias. Detection of grating patterns containing two spatial frequencies: a comparison of single-channel and multiple-channel models. Vision Research, pages 251\u2013259, 1971.\n\n[6] D. J. Heeger. Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2):181\u2013197, 1992.\n\n[7] J. M. Hillis, S. J. Watt, M. S. Landy, and M. S. Banks. Slant from texture and disparity cues: Optimal cue combination. Journal of Vision, 4(12):967\u2013992, 2004.\n\n[8] F. H\u00fcrlimann, D. C. Kiper, and M. Carandini. Testing the Bayesian model of perceived speed. Vision Research, 42:2253\u20132257, 2002.\n\n[9] H. Nover, C. H. Anderson, and G. C. DeAngelis. A logarithmic, scale-invariant representation of speed in macaque middle temporal area accounts for speed discrimination performance. J. Neurosci, 25:10049\u201360, 2005.\n\n[10] N. J. Priebe and S. G. Lisberger. Estimating target speed from the population response in visual area MT. Journal of Neuroscience, 24(8):1907\u20131916, 2004.\n\n[11] T. D. Sanger. Probability density estimation for the interpretation of neural population codes. J. Neurophysiology, 76(4):2790\u201393, 1996.\n\n[12] E. P. Simoncelli and D. J. Heeger. A model of neuronal responses in visual area MT. Vision Research, 38(5):743\u2013761, 1998.\n\n[13] E. P. Simoncelli and O. Schwartz. Modeling surround suppression in V1 neurons with a statistically-derived normalization model. Advances in Neural Information Processing Systems (NIPS), 11, 1999.\n\n[14] A. T. Smith and G. K. Edgar. Perceived speed and direction of complex gratings and plaids. Journal of the Optical Society of America A Optics and image science, 8(7):1161\u20131171, 1991.\n\n[15] A. A. Stocker. Analog integrated 2-D optical flow sensor. 
Analog Integrated Circuits and Signal Processing, 46(2):121\u2013138, February 2006.\n\n[16] A. A. Stocker and E. P. Simoncelli. Noise characteristics and prior expectations in human visual speed perception. Nat Neurosci, 4(9):578\u201385, 2006.\n\n[17] L. S. Stone and P. Thompson. Human speed perception is contrast dependent. Vision Research, 32(8):1535\u20131549, 1992.\n\n[18] Y. Weiss, E. P. Simoncelli, and E. H. Adelson. Motion illusions as optimal percepts. Nature Neuroscience, 5(6):598\u2013604, 2002.", "award": [], "sourceid": 1489, "authors": [{"given_name": "Matjaz", "family_name": "Jogan", "institution": "University of Pennsylvania"}, {"given_name": "Alan", "family_name": "Stocker", "institution": "University of Pennsylvania"}]}