{"title": "Predicting response time and error rates in visual search", "book": "Advances in Neural Information Processing Systems", "page_first": 2699, "page_last": 2707, "abstract": "A model of human visual search is proposed. It predicts both response time (RT) and error rates (RT) as a function of image parameters such as target contrast and clutter. The model is an ideal observer, in that it optimizes the Bayes ratio of tar- get present vs target absent. The ratio is computed on the firing pattern of V1/V2 neurons, modeled by Poisson distributions. The optimal mechanism for integrat- ing information over time is shown to be a \u2018soft max\u2019 of diffusions, computed over the visual field by \u2018hypercolumns\u2019 of neurons that share the same receptive field and have different response properties to image features. An approximation of the optimal Bayesian observer, based on integrating local decisions, rather than diffusions, is also derived; it is shown experimentally to produce very similar pre- dictions. A psychophyisics experiment is proposed that may discriminate between which mechanism is used in the human brain.", "full_text": "Predicting response time and error rates in visual\n\nsearch\n\nBo Chen\nCaltech\n\nbchen3@caltech.edu\n\nVidhya Navalpakkam\n\nYahoo! Research\n\nnvidhya@yahoo-inc.com\n\nPietro Perona\n\nCaltech\n\nperona@caltech.edu\n\nAbstract\n\nA model of human visual search is proposed. It predicts both response time (RT)\nand error rates (RT) as a function of image parameters such as target contrast and\nclutter. The model is an ideal observer, in that it optimizes the Bayes ratio of target\npresent vs target absent. The ratio is computed on the \ufb01ring pattern of V1/V2 neu-\nrons, modeled by Poisson distributions. 
The optimal mechanism for integrating\ninformation over time is shown to be a \u2018soft max\u2019 of diffusions, computed over\nthe visual field by \u2018hypercolumns\u2019 of neurons that share the same receptive field\nand have different response properties to image features. An approximation of the\noptimal Bayesian observer, based on integrating local decisions, rather than diffu-\nsions, is also derived; it is shown experimentally to produce very similar predic-\ntions to the optimal observer in common psychophysics conditions. A psychophysics\nexperiment is proposed that may discriminate which mechanism is\nused in the human brain.\n\nFigure 1: Visual search. (A) Clutter and camouflage make visual search difficult. (B,C) Psychologists and\nneuroscientists build synthetic displays to study visual search. In (B) the target \u2018pops out\u2019 (\u2206\u03b8 = 45\u00b0), while\nin (C) the target requires more time to be detected (\u2206\u03b8 = 10\u00b0) [1].\n\n1 Introduction\n\nAnimals and humans often use vision to find things: mushrooms in the woods, keys on a desk, a\npredator hiding in tall grass. Visual search is challenging because the location of the object that\none is looking for is not known in advance, and surrounding clutter may generate false alarms. The\nthree ecologically relevant performance parameters of visual search are the two error rates (ER):\nfalse alarms (FA) and false rejects (FR), and response time (RT). The design of a visual system is\ncrucial in obtaining low ER and RT. These parameters may be traded off by manipulating suitable\nthresholds [2, 3, 4].\nPsychologists and physiologists have long been interested in understanding the performance and the\nmechanisms of visual search. 
In order to approach this difficult problem they present human sub-\njects with synthetic stimuli composed of a variable number of \u2018items\u2019 which may include a \u2018target\u2019\nand multiple \u2018distractors\u2019 (see Fig. 1). By varying the number of items one may vary the amount of\nclutter; by designing different target-distractor pairs one may probe different visual cues (contrast,\norientation, color, motion); and by varying the visual distinctiveness of the target vis-a-vis the dis-\ntractors one may study the effect of the signal-to-noise ratio (SNR). Several studies since the 1980s have\ninvestigated how RT and ER are affected by the complexity of the stimulus (number of distractors),\nand by target-distractor discriminability with different visual cues. One early observation is that\nwhen the target and distractor features are widely separated in feature space (e.g., red target among\ngreen distractors), the target \u2018pops out\u2019. In these situations the ER is nearly zero, and the slope of\nRT vs. setsize is flat, i.e., RT to find the target is independent of the number of items in the display [1].\nDecreasing the discriminability between the target and distractor increases error rates, and increases\nthe slope of RT vs. setsize [5]. Moreover, it was found that the RT for displays with no target is\nlonger than when the target is present (see review in [6]). Recent studies investigated the shape of\nRT distributions in visual search [7, 8].\nNeurophysiologically plausible models have been recently proposed to predict RTs in visual discrim-\nination tasks [9] and various other 2AFC tasks [10] at a single spatial location in the visual field.\nThey are based on sequential tests of statistical hypotheses (target present vs target absent) [11] com-\nputed on the response of stimulus-tuned neurons [2, 3]. 
We do not yet have satisfactory models for\nexplaining RTs in visual search, which is a harder problem as it involves integrating information across several\nlocations in the visual field, as well as over time. Existing models predicting RT in visual search are\neither qualitative (e.g. [12]) or descriptive (e.g., the drift-diffusion model [13, 14, 15]), and do not\nattempt to predict experimental results with new set sizes, target and distractor settings.\nWe propose a Bayesian model of visual search that predicts both ER and RT. Our study makes a\nnumber of contributions. First, while visual search has been modeled using signal-detection theory\nto predict ER [16], our model is based on neuron-like mechanisms and predicts both ER and RT.\nSecond, our model is an optimal observer, given a physiologically plausible front-end of the visual\nsystem. Third, our model shows that in visual search the optimal computation is not a diffusion, as\none might believe by analogy with single-location discrimination models [17, 18]; rather, it is a \u2018soft-\nmax\u2019 nonlinear combination of locally-computed diffusions. Fourth, we study a physiologically\nparsimonious approximation to the optimal observer; we show that it is almost optimal when the\ncharacteristics of the task are known in advance and held constant, and we explore whether there are\npsychophysical experiments that could discriminate between the two models.\nOur model is based on a number of simplifying assumptions. First, we assume that stimulus items\nare centered on cortical hypercolumns [19] and that at locations where there is no item neuronal firing is\nnegligible. Second, retinal and cortical magnification [19] are ignored, since psychophysicists have\ndeveloped displays that sidestep this issue (by placing items on a constant-eccentricity ring as shown\nin Fig. 1). Third, we do not account for overt and covert attentional shifts. 
Overt attentional shifts\nare manifested by saccades (eye motions), which happen every 200 ms or so. Since the post-decision\nmotor response to a stimulus by pressing a button takes about 250-300 ms, one does not need to\nworry about eye motions when response times are shorter than 500 ms. For longer RTs, one may\nenforce eye fixation at the center of the display so as to prevent overt attentional shifts. Furthermore,\nour model explains serial search without the need to invoke covert attentional shifts [20], which are\ndifficult to prove neurophysiologically.\n\n2 Target discrimination at a single location with Poisson neurons\n\nWe first consider probabilistic reasoning at one location, where two possible stimuli may appear.\nThe stimuli differ in one respect, e.g. they have different orientations \u03b8(1) and \u03b8(2). We will call\nthem distractor (D) and target (T), also labeled C = 1 and C = 2 (call c \u2208 {1, 2} the generic value\nof C). Based on the response of N neurons (a hypercolumn) we will decide whether the stimulus\nwas a target or a distractor. Crucially, a decision should be reached as soon as possible, i.e. as soon\nas there is sufficient evidence for T or D [11].\nGiven the evidence T (defined further below in terms of the neurons\u2019 activity) we wish to decide\nwhether the stimulus was of type 1 or 2. We may do so when the probability P (C = 1|T ) of the\nstimulus being of type 1 given the observations in T exceeds a given threshold T1 (e.g. T1 = 0.99).\nWe may instead decide in favor of C = 2, e.g. when P (C = 1|T ) < T2 (e.g. T2 = 0.01). If\n\nFigure 2: (Left three panels) Model of a hypercolumn in V1/V2 cortex composed of four orientation-tuned\nneurons (our simulations use 32). The left panel shows the neurons\u2019 tuning curve \u03bb(\u03b8) representing the expected\nPoisson firing rate when the stimulus has orientation \u03b8. 
The middle plot shows the expected firing rate of the\npopulation of neurons for two stimuli whose orientation is indicated with a red (distractor) and green (target)\nvertical line. The third plot shows the step-change in the value of the diffusion when an action potential is\nregistered from a given neuron. (Right panel) Diagram of the decision models. (A) One-location Bayesian\nobserver. The action potentials of a hypercolumn of neurons (top) are integrated in time to produce a diffusion.\nWhen the diffusion reaches either an upper bound T1 or a lower bound T0 the decision is taken that either\nthe target is present (1) or the target is absent (0). (B\u2013D) Multi-location ideal Bayesian observer. (B) While\nnot a diffusion, it may be seen as a \u2018soft maximum\u2019 combination of local diffusions: the local diffusions are\nfirst exponentiated, then averaged; the log of the result is compared to two thresholds to reach a decision. (C)\nThe \u2018Max approximation\u2019 is a simplified approximation of the ideal observer, where the maximum of local\ndiffusions replaces a soft-maximum. (D) Equivalently, in the Max approximation decisions are reached locally\nand combined by logical operators. The white AND in a dark field indicates inverted AND of multiple inverted\ninputs.\n\nP (C = 1|T ) \u2208 (T2, T1) we will wait for more evidence. Thus, we need to compute P (C = 1|T ):\n\nPr(C = 1|T ) = 1 / (1 + P (C = 2|T ) / P (C = 1|T )) = 1 / (1 + R(T ) P (C = 2) / P (C = 1)),\nwhere R(T ) = P (T |C = 2) / P (T |C = 1) = [P (C = 2|T ) / P (C = 1|T )] \u00b7 [P (C = 1) / P (C = 2)]   (1)\n\nwhere P (C = 1) = 1 \u2212 P (C = 2) is the prior probability of C = 1. Thus, it is equivalent to take\ndecisions by thresholding log R(T )1; we will elaborate on this in Sec. 
3.\nWe will model the firing rate of the neurons with a Poisson pdf: the number n of action potentials\nthat will be observed during one second is distributed as P (n|\u03bb) = \u03bb^n e^{\u2212\u03bb}/n!. The constant \u03bb is\nthe expectation of the number of action potentials per second. Each neuron i \u2208 {1, . . . , N} is tuned\nto a different orientation \u03b8i; for the sake of simplicity we will assume that the width of the tuning\ncurve is the same for all neurons; i.e. each neuron i will respond to stimulus c with expectation\n\u03bb^i_c = f (|\u03b8(c) \u2212 \u03b8i|) (in spikes per second), which is determined by the distance between the neuron\u2019s\npreferred orientation \u03b8i and the stimulus orientation \u03b8(c).\nLet Ti = {t^i_k} be the set of action potentials from neuron i produced starting at t = 0 and until\nthe end of the observation period t = T . Indicate with T = {tk} = \u222a_i Ti the complete set of\naction potentials from all neurons (where the tk are sorted). We will indicate with i(k) the index of\nthe neuron that fired the action potential at time tk. Call Ik = (tk, tk+1) the intervals of time in\nbetween action potentials, where I0 = (0, t1). These intervals are open, i.e. they do not contain the\nboundaries, hence they do not contain the action potentials.\nThe signal coming from the neurons is thus a concatenation of \u2018spikes\u2019 and \u2018intervals\u2019, and the\ninterval (0, T ) may be viewed as the union of instants tk and open intervals (tk, tk+1), i.e. (0, T ) =\nI0 \u222a {t1} \u222a I1 \u222a {t2} \u222a \u00b7\u00b7\u00b7\nSince the spike trains Ti and T are Poisson processes, once we condition on the class of the stimulus\nthe spike times are independent. This implies that: P (T |C = c) = \u03a0_k P (Ik|C = c) P (tk|C = c).\nThis may be proven by dividing up (0, T ) into smaller and smaller intervals and taking the limit for\n\n1We use base 10 for all our logarithms and exponentials, i.e. 
log(x) \u2261 log10(x) and exp(x) \u2261 10^x.\n\nthe size of the intervals going to zero. The intervals containing action potentials converge to the tk\nand the intervals not containing action potentials may be merged into the intervals Ik.\nLet\u2019s analyze separately the log likelihood ratio for the intervals and for the spikes.\n\nDiffusion drift during the intervals. During the intervals no neuron spiked. The ratio therefore\nis computed as a function of the Poissons P (n = 0|\u03bb) when the spike count n is zero. The Poisson\nexpectation has to be multiplied by the time-length of the interval; call \u2206tk = tk+1 \u2212 tk the length\nof the interval Ik. Assuming that the neurons i = 1, . . . 
, N are independent, we obtain:\n\nlog R(Ik) = log [P (n = 0|C = 2, t \u2208 Ik) / P (n = 0|C = 1, t \u2208 Ik)] = log [\u03a0^N_{i=1} P (n = 0|\u03bb^i_2 \u2206tk) / \u03a0^N_{i=1} P (n = 0|\u03bb^i_1 \u2206tk)] = \u2206tk \u2211^N_{i=1} (\u03bb^i_1 \u2212 \u03bb^i_2)   (2)\n\nThus, during the time-intervals where no action potential is observed, the diffusion drifts linearly\nwith a slope equal to the sum over all neurons of the difference between the expected firing rate with\nstimulus 1 and the expected firing rate with stimulus 2.\nNotice that if there are neurons that fire equally well to targets and distractors, and if the population\nof neurons is large and made of neurons whose tuning curve\u2019s shape is identical and whose preferred\norientation \u03b8i is regularly spaced, then \u2211_i \u03bb^i_1 \u2248 \u2211_i \u03bb^i_2; thus the diffusion has drift with slope close\nto zero and the drift term may be ignored. In this case intervals carry no information.\n\nDiffusion jump at the action potentials. If the neurons are uncorrelated, then the probability of\ntwo or more action potentials happening at the same time is zero. Thus, at any time tk there is only\none action potential from one neuron. We can compute the likelihood ratio by taking a limit for the\nlength \u03b4t of the interval t \u2208 (tk \u2212 \u03b4t/2, tk + \u03b4t/2) going to zero. As seen in the previous section,\nthe contribution from the neurons that did not register a spike is \u03b4t(\u03bb^i_1 \u2212 \u03bb^i_2) and goes to zero as\n\u03b4t \u2192 0. 
Thus we are only left with the contribution of the neuron i(k) whose spike happened at\ntime tk.\n\nlog R(tk) = lim_{\u03b4t\u21920} log [P (n = 1|\u03bb^{i(k)}_2 \u03b4t) / P (n = 1|\u03bb^{i(k)}_1 \u03b4t)] = lim_{\u03b4t\u21920} log [(\u03bb^{i(k)}_2 \u03b4t)^1 e^{\u2212\u03bb^{i(k)}_2 \u03b4t} / ((\u03bb^{i(k)}_1 \u03b4t)^1 e^{\u2212\u03bb^{i(k)}_1 \u03b4t})] = log (\u03bb^{i(k)}_2 / \u03bb^{i(k)}_1)   (3)\n\nAs a result, at each action potential tk the diffusion jumps by an amount that is the log of the ratio\nof the expected firing rate of the neuron i(k)\u2019s response to target vs distractor. Thus:\n\n1. Neurons that are equally tuned to target and distractor, whether they respond much or not,\nwill not contribute to the diffusion, while neurons whose response is very different for\ntarget and distractor will contribute substantially to the diffusion.\n\n2. A larger number of neurons will produce more action potentials and thus a faster action-potential-driven drift in the diffusion.\n\nDiffusion overall. Given the analysis presented above:\n\nlog R(T ) = \u2211_k \u2206tk \u2211_i (\u03bb^i_1 \u2212 \u03bb^i_2) + \u2211_k log (\u03bb^{i(k)}_2 / \u03bb^{i(k)}_1) = |T | \u2211_i (\u03bb^i_1 \u2212 \u03bb^i_2) + \u2211_k log (\u03bb^{i(k)}_2 / \u03bb^{i(k)}_1)   (4)\n\nIgnoring diffusion during the intervals, the diffusion at a single location where the stimulus is of\ntype c can be described as:\n\nlog R(T ) \u223c \u2211^N_{i=1} (log \u03bb^i_2/\u03bb^i_1) Poiss(\u03bb^i_c |T |)   (5)\n\nE[log R(T )] = a_c |T |,  V[log R(T )] = b^2_c |T |   (6)\n\nwhere Poiss(\u03bb) denotes a Poisson distributed variable with mean \u03bb, a_c \u2261 \u2211^N_{i=1} (log \u03bb^i_2/\u03bb^i_1) \u03bb^i_c and\nb^2_c \u2261 \u2211^N_{i=1} (log \u03bb^i_2/\u03bb^i_1)^2 \u03bb^i_c. The mean and variance of the diffusion grow linearly with time.\n\nFigure 3: (A) Diffusions realized at 10 spatial locations when the target is absent (black). The ideal\nobserver Bayes ratio is shown in green, the max-model approximation is shown in red. Thresholds\n\u03981 = \u22122, \u03982 = 2 are shown, which produce 1% error rates in the ideal observer. (B) Target present\ncase. Notice that the decision takes longer when the target is absent (see also Fig. 4). (C) Error rates\nvs. number of items and (D) vs. target contrast when decision thresholds are held constant. Decision\nthresholds were chosen to obtain 5% error rates in the condition M = 10, \u2206\u03b8 = \u03c0/6. As we change\ntarget contrast and the number of targets the optimal observer has constant error rates, while the\nMax approximation produces variable error rates. Testing human subjects with a mix of stimuli\nwith different values of M and \u2206\u03b8 may prevent them from adjusting decision thresholds between\nstimuli; thus, one would expect constant error rates if the visual system uses the ideal observer and\nvariable error rates if it uses the Max approximation.\n\n3 Visual search: detection across M locations with Poisson neurons\n\nWe now consider the case with M locations with N Poisson neurons each. At each location we\nmay either have a target T or a distractor D. In the whole display we have two hypotheses: no target\n(C = 1) or one target at a location l (C = 2). The second hypothesis may be broken up into the\ntarget being at any of the M locations l. Because of this, the numerator of the likelihood ratio is the sum\nof M terms. The Bayesian observer must integrate the action potentials from each unit to a central\nunit that computes the posterior beliefs. 
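Before moving to multiple locations, the single-location diffusion of Eqn. 2-6 is easy to simulate. Below is a minimal Python sketch, not the authors' code: the bell-shaped tuning curve and the parameter values (N = 32 neurons, rates between 1 and 10 spikes/s, contrast \u03c0/6) are illustrative choices loosely following the defaults stated in Sec. 5, and spikes are drawn in thin time bins as an approximation to a Poisson process.

```python
import math
import random

def tuning(theta_pref, theta_stim, lam_max=10.0, lam_min=1.0, width=math.pi / 8):
    # Illustrative bell-shaped tuning curve: expected firing rate in spikes/s.
    d = abs(theta_stim - theta_pref)
    return lam_min + (lam_max - lam_min) * math.exp(-0.5 * (d / width) ** 2)

def simulate_local_diffusion(target_present, T=2.0, dt=0.001, N=32, seed=0):
    # Accumulate log R(T) as in Eqn. 4: linear drift between spikes (Eqn. 2)
    # plus a jump of log10(lam2 / lam1) at each spike (Eqn. 3); base-10 logs
    # per the paper's footnote.
    rng = random.Random(seed)
    prefs = [i * math.pi / N for i in range(N)]
    theta_D = math.pi / 2                    # distractor orientation (C = 1)
    theta_T = math.pi / 2 + math.pi / 6      # target orientation (C = 2)
    lam1 = [tuning(p, theta_D) for p in prefs]
    lam2 = [tuning(p, theta_T) for p in prefs]
    lam_true = lam2 if target_present else lam1
    log_R = 0.0
    for _ in range(int(T / dt)):
        for l_true, l1, l2 in zip(lam_true, lam1, lam2):
            if rng.random() < l_true * dt:   # thin-bin Poisson spike
                log_R += math.log10(l2 / l1)  # jump (Eqn. 3)
        log_R += dt * sum(a - b for a, b in zip(lam1, lam2))  # drift (Eqn. 2)
    return log_R
```

As Eqn. 6 predicts, the diffusion drifts upward when the target is present (a_2 > 0) and downward when it is absent, so thresholding it from above and below implements the sequential test of Sec. 2.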
The multi-location Bayesian observer may be computed by\nobserving that the target-present event is the union of the target-present events in each one of the\nlocations, while the target absent event implies that each location has no target. Thus, the likelihood\ncan be computed as the weighted sum of local likelihoods at each location in the display.\nWe assume that\n\n1. The likelihood at each location is independent from the rest when the stimulus type at that\nlocation is known; i.e. P (T |C^l, \u2200l) = \u03a0_l P (T^l|C^l).\n\n2. The target, if present, is equally likely to occur at any location in the display; i.e.\n\u2200l, P (C^l = 2, C^{\u00afl} = 1|C = 2) = 1/M.\n\nCalling l a location and \u00afl the complement of that location (i.e. all locations but l) we have:\n\nP (T |C = 1) = \u03a0^M_{l=1} P (T^l|C^l = 1)\n\nP (T |C = 2) = \u2211^M_{l=1} P (T |C^l = 2, C^{\u00afl} = 1) P (C^l = 2, C^{\u00afl} = 1|C = 2) = (1/M) \u2211^M_{l=1} R^l(T^l) \u03a0^M_{l=1} P (T^l|C^l = 1)\n\nlog Rtot(T ) = log [P (T |C = 2) / P (T |C = 1)] = log [(1/M) \u2211^M_{l=1} R^l(T^l)] = log [(1/M) \u2211^M_{l=1} exp(log R^l(T^l))]   (7)\n\nwhere R^l(T^l) = P (T^l|C^l = 2) / P (T^l|C^l = 1).\n\nEqn. 7 tells us two things:\n\n1. The process log Rtot is not a diffusion, in that log Rtot at time t + 1 can not be computed by incrementing its value at time t by a term that depends only on the interval (t, t + 1).\n\n2. 
The process log Rtot may be computed easily from the local diffusions log R^l(T^l) (in\nSec. 4 we find an approximation that has a natural neural implementation).\n\nNow that we know how to compute log R(T ) for the single and multi-location Bayesian observer, we\nmay take our decision by thresholding log R(T ) (Eqn. 1). Specifically, we choose separate thresh-\nolds for making the target absent and the target present decision, and adjust the thresholds based\non tolerance levels for the false positive rate (FPR) and the false negative rate (FNR). We keep\naccumulating evidence until either decision can be made.\nThe relationship between FPR, FNR and the two thresholds can be derived using an analysis similar\nto [11]. When log Rtot(T ) reaches the target present threshold (\u03982), with probability P (C = 2|T )\nthe target is present and with probability P (C = 1|T ) the target is absent, i.e. FPR = P (C = 1|T )\nand 1 \u2212 FNR = P (C = 2|T ). We have:\n\n\u03982 = log Rtot(T ) = log [P (C = 2|T ) / P (C = 1|T )] = log [(1 \u2212 FNR) / FPR]   (8)\n\nSimilarly, when log R(T ) reaches the target absent threshold (\u03981), we have:\n\n\u03981 = log Rtot(T ) = log [P (C = 2|T ) / P (C = 1|T )] = log [FPR / (1 \u2212 FNR)]   (9)\n\nTherefore, given desired FPR and FNR, we can analytically compute the upper and lower thresholds\nfor the Full Bayesian model using Eqn. 8 and 9.\n\n4 Max approximation\n\nAn alternative, more economical approach to the full Bayesian decision is to approximate the global\nbelief using the largest local diffusion and suppress the rest. This is because, in the limit where\n|T | is large, the diffusion at the location where the target is present will dominate over the other\ndiffusions and thus it is a good approximation of the sum in Eq. 7. We will call this approach \u201cmax\napproximation\u201d and also \u201cMax model\u201d. 
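The relation between the exact combination rule (Eqn. 7) and this shortcut can be checked numerically. A minimal Python sketch, not from the paper: the local diffusion values below are hypothetical, and logs are base 10 per the paper's footnote.

```python
import math

def log_R_tot(local_log_R):
    # Eqn. 7: the exact combination is a 'soft max' of the local diffusions.
    M = len(local_log_R)
    return math.log10(sum(10.0 ** r for r in local_log_R) / M)

def log_R_max(local_log_R):
    # Max approximation in the regime of Eqn. 11: largest local diffusion minus log M.
    return max(local_log_R) - math.log10(len(local_log_R))

# One dominant location (late in a target-present trial): the two agree closely.
dominant = [3.0, -1.0, -1.2, -0.8]
# Comparable local diffusions (early in a trial): the shortcut is biased downward.
comparable = [0.0, 0.0, 0.0, 0.0]
```

For `dominant` the two rules differ by less than 10^-3, while for `comparable` the soft max gives 0 and the shortcut gives \u2212log10(4) \u2248 \u22120.6; this is why the Max model's thresholds must be adjusted when t is small.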
In this scheme, at each location a diffusion based on the\nlocal Bayesian observer is computed. If any location \u2018detects\u2019 a target, then a target is declared.\nIf all locations detect a distractor, then the \u2018no target\u2019 condition is declared. This may not be the\noptimal method, but it has the advantage of requiring only two low-frequency communication lines\nbetween each location and the central decision unit. Equivalently, the max approximation can be\nimplemented by computing the maximum of the local diffusions and comparing it to an adjusted\nhigh and a low threshold for the target present/absent decision (see Fig. 2).\nMore specifically, let l\u2217 denote the location of maximum diffusion in the display, and log R^{l\u2217} denote\nthe maximum diffusion (i.e., log R^{l\u2217} = max^M_{l=1} log R^l(T^l)). From Eqn. 7 we know that the global\nlikelihood ratio is the average of the local likelihood ratios, and equivalently, the log likelihood ratio\nis the soft-max of the local diffusions:\n\nlog Rtot(T ) = log [(1/M) \u2211^M_{l=1} exp(log R^l(T^l))] = log R^{l\u2217} + log [(1/M)(1 + \u2211_{l\u2260l\u2217} exp(log R^l \u2212 log R^{l\u2217}))]   (10)\n\nTarget present \u2013 When the target is present in the display, if the target is different from the distrac-\ntor, the diffusion at the target\u2019s location will frequently become much higher than at other locations,\nand the terms corresponding to R^l/R^{l\u2217} may be safely ignored. Thus, the total log likelihood ratio may\nbe approximated by the maximum local diffusion minus a constant:\n\nlog Rtot \u2248 log R^{l\u2217} \u2212 log M   if R^l << R^{l\u2217}   (11)\n\nFigure 4: (A) Histogram of response-times (RT) when the target is present (green) and when the target is\nabsent (red) for M = 10 for different values of target contrast (\u2206\u03b8). Response times are longer when the\ncontrast is smaller (see Fig. 
1). Also, they are longer when the target is absent (see Fig. 3). Notice that the\nresponse times have a Gaussian-like distribution when time is plotted on a log scale, and the width of the\ndistribution does not change significantly as the difficulty of the task changes; thus, the mean and median\nresponse time are equivalently informative statistics of RT. (B) Mean RT as a function of the number M of\nitems for different values of target contrast; the curves appear linear as a function of log M [21]. Notice that the RT\nslope is almost zero (\u2018parallel search\u2019) when the target has high contrast, while when target contrast is low RT\nincreases significantly with M (\u2018serial search\u2019) [1]. The response times observed using the Max approximation\nare almost identical to those obtained with the ideal observer. (C) Error vs. RT tradeoff curves obtained\nby changing systematically the value of the decision threshold. The mean RT \u00b1\u03c3 is shown. Ideal Bayesian\nobserver (blue) and Max approximation (cyan) are almost identical, indicating that the Max approximation\u2019s\nperformance is almost as good as that of the optimal observer.\n\nFrom Eqn. 5 and 6 we know that the difference in diffusion value between the target location and\nthe distractor location grows linearly in time. Thus, the longer the process lasts, the better the\napproximation. Conversely, when t = |T | is small, the approximation is unreliable, and a different\napproximation term must be introduced (see supplementary material2 for derivation):\n\nlog Rtot \u2248 log R^{l\u2217} \u2212 [a_2 t + log((1/M) + ((M \u2212 1)/M) exp((a_1 \u2212 a_2 + (b^2_1 + b^2_2)/2) t))]   if R^l \u2248 R^{l\u2217}   (12)\n\nTarget absent \u2013 When the target is absent in the display, the value of all the local diffusions at\ntime t will be distributed according to the same density. According to Eqn. 
6, the standard deviation\ngrows as \u221at, hence the expected value of log R^{l\u2217} \u2212 log R^l is monotonically increasing. When this\nexpected difference is large enough, we can make the same approximation as Eqn. 11:\n\nlog Rtot \u2248 log R^{l\u2217} \u2212 log M   if R^l << R^{l\u2217}   (13)\n\nOn the other hand, when |T | is small, we resort to another approximation (see supplementary mate-\nrial for derivation):\n\nlog Rtot \u2248 log R^{l\u2217} \u2212 \u00b5M b_1 \u221at + (b^2_1 t)/2 + log [(exp(\u00b5M b_1 \u221at \u2212 (1/2) b^2_1 t) + M \u2212 1)/M]   if R^l \u2248 R^{l\u2217}   (14)\n\nwhere \u00b5M \u2261 M \u222b^\u221e_{\u2212\u221e} z \u03a6^{M\u22121}(z) N (z) dz, and N (z) and \u03a6(z) denote the pdf and cdf of the standard normal\ndistribution.\nSince the max diffusion does not represent the global log likelihood ratio, its thresholds can not be\ncomputed directly from the error rates. Nonetheless we can first compute analytically the thresholds\nfor the Bayesian observer (Eqn. 8 and 9), and adjust them based on the approximations stated above\n(Eqn. 11, 12, 13 and 14). Finally, we threshold the maximum local diffusion log R^{l\u2217} with respect to\nthe adjusted upper and lower threshold to make our decision.\n\n5 Experiments\n\nExperiment 1 - Overall model predictions. In this experiment we explore the model\u2019s prediction\nof response time over a series of interesting conditions. 
2http://vision.caltech.edu/\u02dcbchen3/nips2011/supplementary.pdf\n\nThe default parameters are the number of\nneurons per location N = 32, the tuning width of each neuron = \u03c0/8, and the maximum expected\nfiring rate (\u03bb = 10 action potentials per second) and minimum expected firing rate (\u03bb = 1 a.p./s)\nof a neuron, which reflect the signal-to-noise ratio of the neuron\u2019s tuning curves; the number of\nitems (locations) in the display is M = 10 and the stimulus contrast is \u2206\u03b8 = \u03c0/6. Both M and \u2206\u03b8\nrefer to the display, while the other parameters refer to the brain. We will focus on how predictions\nchange when the display parameters are changed over a set of discrete settings: M \u2208 {3, 10, 30}\nand \u2206\u03b8 \u2208 {\u03c0/18, \u03c0/6, \u03c0/2}. For each setting of the parameters, we simulate the Bayesian and the\nMax model for 1000 runs. The length of simulation is set to a large value (4 seconds) to make sure\nthat all decisions are made before the simulation terminates.\nWe are also interested in the trade-off between RT and ER \u03b7 for \u03b7 \u2208 {1%, 5%, 10%}. For each\n\u03b7 we search for the best pair of upper and lower thresholds that achieve FNR \u2248 FPR \u2248 \u03b7. 
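For the Full Bayesian observer no search is needed: Eqn. 8 and 9 give the thresholds in closed form. A minimal Python sketch, assuming that reading of the equations (the function name is ours; base-10 logs per the paper's footnote):

```python
import math

def full_bayes_thresholds(fpr, fnr):
    # Eqn. 8: upper (target-present) threshold on log10 R_tot.
    theta_2 = math.log10((1.0 - fnr) / fpr)
    # Eqn. 9: lower (target-absent) threshold.
    theta_1 = math.log10(fpr / (1.0 - fnr))
    return theta_1, theta_2

theta_1, theta_2 = full_bayes_thresholds(0.05, 0.05)  # eta = 5% in both directions
```

For \u03b7 = 5% this gives \u03982 = log10(0.95/0.05) \u2248 1.28, comfortably inside the search interval used for the Max model.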
We\nsearch over the interval [0, 3.5] for the optimal upper threshold and over [\u22123.5, 0] for the optimal\nlower threshold (an upper threshold of 3.5 corresponds to an FPR of 0.03%). The search is conducted\nexhaustively over an [80 \u00d7 80] discretization of the joint space of the thresholds. We record the\nresponse time distributions for all parameter settings and for all values of \u03b7 (Fig. 4).\n\nExperiment 2 - Conditions where Bayesian and Max models differ maximally. In this exper-\niment we test the robustness of the Bayesian and Max models with respect to a fixed threshold. For a\nBayesian observer, the thresholds yielding a given error rate can be computed exactly, independently\nof the display (Eqn. 9 and 8). On the contrary, in order for the Max model to achieve the equivalent\nperformance, its threshold must be adjusted differently depending on the number of items M and\nthe target contrast \u2206\u03b8 (Eqn. 11-14). As a result, if a constant threshold is used for all conditions,\nwe would expect the Bayesian observer ER to be roughly constant, whereas the Max model would\nhave considerable ER variability. The error rates are shown in Fig. 3 as we vary M and \u2206\u03b8. The\nthreshold is set as the optimal threshold that produces 5% error for the Bayesian observer at a single\nlocation M = 1 and with \u2206\u03b8 = \u03c0/18.\n\n6 Discussion and conclusions\n\nWe presented a Bayesian ideal observer model of visual search. To the best of our knowledge, this\nis the first model that can predict the statistics of both response times (RT) and error rates (ER)\npurely from physiologically relevant constants (number, tuning width, signal-to-noise ratio of cor-\ntical mechanisms) and from image parameters (target contrast and number of distractors). 
Neurons are modeled as Poisson units, and the model has only four free parameters: the number of neurons per hypercolumn, the tuning width of their response curve, and the maximum and minimum firing rates of each neuron. The model qualitatively predicts the main phenomena observed in visual search: serial vs. parallel search [1], the Gaussian-like shape of the response time histograms in log time [7], and the faster response times when the target is present [3]. The model is easily adaptable to predictions involving multiple targets, different image features, and conjunctions of features.
Unlike the case of binary detection/decision, the ideal observer may not be implemented by a diffusion. However, it may be implemented using a precisely defined 'soft-max' combination of diffusions, each of which is computed at a different location across the visual field. We discuss an approximation of the ideal observer, the Max model, which has two natural and simple implementations in neural hardware. The Max model is found experimentally to have performance very close to that of the ideal observer when the task parameters do not change.
We explored whether any combination of target contrast and number of distractors would produce significantly different predictions from the ideal observer vs. the Max model approximation, and found none in the case where the visual system can estimate decision thresholds in advance. However, our simulations predict different error rates when interleaving images containing diverse contrast levels and distractor numbers.

Acknowledgements: We thank the three anonymous referees for many insightful comments and suggestions; thanks to M. Shadlen for a tutorial discussion on the history of discrimination models. This research was supported by the California Institute of Technology.

References

[1] A.M. Treisman and G. Gelade. A feature-integration theory of attention.
Cognitive Psychology, 12(1):97-136, 1980.

[2] W.T. Newsome, K.H. Britten, and J.A. Movshon. Neuronal correlates of a perceptual decision. Nature, 341(6237):52-54, 1989.

[3] P. Verghese. Visual search and attention: A signal detection theory approach. Neuron, 31(4):523-535, 2001.

[4] V. Navalpakkam and L. Itti. Search goal tunes visual features optimally. Neuron, 53(4):605-617, 2007.

[5] J. Duncan and G.W. Humphreys. Visual search and stimulus similarity. Psychological Review, 96(3):433, 1989.

[6] J.M. Wolfe. Visual search. In H. Pashler, editor, Attention, pages 13-73. University College London Press, London, U.K., 1998.

[7] J.M. Wolfe, E.M. Palmer, and T.S. Horowitz. Reaction time distributions constrain models of visual search. Vision Research, 50(14):1304-1311, 2010.

[8] E.M. Palmer, T.S. Horowitz, A. Torralba, and J.M. Wolfe. What are the shapes of response time distributions in visual search? Journal of Experimental Psychology: Human Perception and Performance, 37(1):58, 2011.

[9] J.M. Beck, W.J. Ma, R. Kiani, T. Hanks, A.K. Churchland, J. Roitman, M.N. Shadlen, P.E. Latham, and A. Pouget. Probabilistic population codes for Bayesian decision making. Neuron, 60(6):1142-1152, 2008.

[10] R. Bogacz, E. Brown, J. Moehlis, P. Holmes, and J.D. Cohen. The physics of optimal decision making: A formal analysis of models of performance in two-alternative forced-choice tasks. Psychological Review, 113(4):700, 2006.

[11] A. Wald. Sequential tests of statistical hypotheses. The Annals of Mathematical Statistics, 16(2):117-186, 1945.

[12] M.M. Chun and J.M. Wolfe. Just say no: How are visual searches terminated when there is no target present? Cognitive Psychology, 30(1):39-78, 1996.

[13] R. Ratcliff. A theory of memory retrieval.
Psychological Review, 85(2):59-108, 1978.

[14] P.L. Smith and R. Ratcliff. Psychology and neurobiology of simple decisions. Trends in Neurosciences, 27(3):161-168, 2004.

[15] R. Ratcliff and G. McKoon. The diffusion decision model: Theory and data for two-choice decision tasks. Neural Computation, 20(4):873-922, 2008.

[16] D.G. Pelli. Uncertainty explains many aspects of visual contrast detection and discrimination. JOSA A, 2(9):1508-1531, 1985.

[17] R. Ratcliff. A theory of order relations in perceptual matching. Psychological Review, 88(6):552, 1981.

[18] J.I. Gold and M.N. Shadlen. The neural basis of decision making. Annual Review of Neuroscience, 30:535-574, 2007.

[19] R.L. De Valois and K.K. De Valois. Spatial Vision. Oxford University Press, USA, 1990.

[20] M.I. Posner, Y. Cohen, and R.D. Rafal. Neural systems control of spatial orienting. Philosophical Transactions of the Royal Society of London. B, Biological Sciences, 298(1089):187, 1982.

[21] W.E. Hick. On the rate of gain of information. Quarterly Journal of Experimental Psychology, 4(1):11-46, 1952.