{"title": "Experience-Guided Search: A Theory of Attentional Control", "book": "Advances in Neural Information Processing Systems", "page_first": 1033, "page_last": 1040, "abstract": null, "full_text": "Experience-Guided Search:\n\nA Theory of Attentional Control\n\nMichael C. Mozer\n\nDepartment of Computer Science and\n\nInstitute of Cognitive Science\n\nUniversity of Colorado\n\nmozer@colorado.edu\n\nDavid Baldwin\n\nDepartment of Computer Science\n\nIndiana University\n\nBloomington, IN 47405\n\nbaldwind@indiana.edu\n\nAbstract\n\nPeople perform a remarkable range of tasks that require search of the visual en-\nvironment for a target item among distractors. The Guided Search model (Wolfe,\n1994, 2007), or GS, is perhaps the best developed psychological account of hu-\nman visual search. To prioritize search, GS assigns saliency to locations in the\nvisual \ufb01eld. Saliency is a linear combination of activations from retinotopic maps\nrepresenting primitive visual features. GS includes heuristics for setting the gain\ncoef\ufb01cient associated with each map. Variants of GS have formalized the notion\nof optimization as a principle of attentional control (e.g., Baldwin & Mozer, 2006;\nCave, 1999; Navalpakkam & Itti, 2006; Rao et al., 2002), but every GS-like model\nmust be \u2018dumbed down\u2019 to match human data, e.g., by corrupting the saliency map\nwith noise and by imposing arbitrary restrictions on gain modulation. We propose\na principled probabilistic formulation of GS, called Experience-Guided Search\n(EGS), based on a generative model of the environment that makes three claims:\n(1) Feature detectors produce Poisson spike trains whose rates are conditioned on\nfeature type and whether the feature belongs to a target or distractor; (2) the en-\nvironment and/or task is nonstationary and can change over a sequence of trials;\nand (3) a prior speci\ufb01es that features are more likely to be present for targets than\nfor distractors. 
Through experience, EGS infers latent environment variables that\ndetermine the gains for guiding search. Control is thus cast as probabilistic infer-\nence, not optimization. We show that EGS can replicate a range of human data\nfrom visual search, including data that GS does not address.\n\n1 Introduction\n\nHuman visual activity often involves search. We search for our keys on a cluttered desk, a face in\na crowd, an exit sign on the highway, a key paragraph in a paper, our favorite brand of cereal at\nthe supermarket, etc. The \ufb02exibility of the human visual system stems from the endogenous (or\ninternal) control of attention, which allows for processing resources to be directed to task-relevant\nregions and objects in the visual \ufb01eld. How is attention directed based on an individual\u2019s goals? To\nwhat sort of features of the visual environment can attention be directed? These two questions are\ncentral to the study of how humans explore their environment.\n\nFigure 1: The architecture of Guided Search\n\nVisual search has traditionally been studied in the laboratory using cluttered stimulus displays con-\ntaining arti\ufb01cial objects. The objects are de\ufb01ned by a set of primitive visual features, such as\ncolor, shape, and size. For example, an experimental task might be to search for a red vertical line\nsegment\u2014the target\u2014among green verticals and red horizontals\u2014the distractors. Performance is\ntypically evaluated as the response latency to detect the presence or absence of a target with high\naccuracy. The ef\ufb01ciency of visual search is often characterized by the search slope\u2014the increase\nin response latency with each additional distractor in the display. 
An inef\ufb01cient search can often\nrequire an additional 25\u201335 ms/item (or more, if eye movements are required).\n\nMany computational models of visual search have been proposed to explain data from the burgeon-\ning experimental literature (e.g., Baldwin & Mozer, 2006; Cave, 1999; Itti & Koch, 2001; Mozer,\n1991; Navalpakkam & Itti, 2006; Sandon, 1990; Wolfe, 1994). Despite differences in their details,\nthey share central assumptions, perhaps most plainly emphasized by Wolfe\u2019s (1994) Guided Search\n2.0 (GS) model. We describe the central assumptions of GS, taking some liberty in ignoring details\nand complications of GS that obfuscate the similarities within this class of models and that are not\nessential for the purpose of this paper.1\nAs depicted in Figure 1, GS posits that primitive visual features are detected across the retina in parallel\nalong dimensions such as color and orientation, yielding a set of feature activity maps. Feature\nactivations are scalars in [0, 1]. The feature maps represent each dimension via a coarse coding.\nThat is, the maps for a particular dimension are highly overlapping and broadly tuned. For example,\ncolor might be represented by maps tuned to red, green, blue, and yellow; orientation might be\nrepresented by maps tuned to left, right, steep, and shallow-sloped edges. The feature activity maps\nare passed through a differencing mechanism that enhances local contrast and texture discontinuities,\nyielding a bottom-up activation.\n\nThe bottom-up activations from all feature maps are combined to form a saliency map in which\nactivation at a location indicates the priority of that location for the task at hand. Attention is directed\nto locations in order from most salient to least, and the object at each location is identi\ufb01ed. 
GS\nsupposes that response latency is linear in the number of locations that need to be searched before\na target is found.\n(The model includes rules for terminating search if no target is found after a\nreasonable amount of effort.)\n\nConsider the task of searching for a red vertical bar among green vertical bars and red horizontal\nbars. Ideally, attention should be drawn to red and vertical bars, not to green or horizontal bars. To\nallow for guidance of attention, GS posits that a weight or top-down gain is associated with each\nfeature map, and the contribution of a given feature map to the saliency map is scaled by the gain. It\nis the responsibility of control processes to determine gains that emphasize task-relevant features.\n\nAlthough gain modulation is a sensible means of implementing goal-directed action, it yields be-\nhavior that is more ef\ufb01cient than people appear to be. To elaborate, consider again the task of\nsearching for a red vertical. If the gains on the red and vertical maps are set to 1, and the gains on\ngreen and horizontal maps are set to 0, then a target (red vertical) will have two units of activation\nin the saliency map, whereas each distractor (red horizontal or green vertical) will have only one\nunit of activation. Because the target is the most salient item and GS assumes that response time is\nmonotonically related to the saliency ranking of the target, the target should be located quickly, in a\ntime independent of the number of distractors. In contrast, human response times increase linearly\nwith the number of distractors in conjunction search.\n\nTo reduce search ef\ufb01ciency, GS assumes noise corruption of the saliency map. In the case of GS, the\nsignal-to-noise ratio is roughly 2:1. 
Baldwin and Mozer (2006) also require noise corruption for the\nsame reason, although the corruption is to the low-level feature representation not the saliency map.\nAlthough Navalpakkam and Itti (2006) do not explicitly introduce noise in their model, they do so\nimplicitly via a selection rule that the probability of attending an item is proportional to its saliency.\n\nTo further reduce search ef\ufb01ciency, GS includes a complex set of rules that limit gain control. For ex-\nample, gain modulation is allowed for only one feature map per dimension. Other attentional models\nplace similar, somewhat arbitrary limitations on gain modulation. Baldwin and Mozer (2006) impose\nthe restriction \u2211_i |gi \u2212 1| < c, where gi is the gain of feature map i and c is a constant.\nNavalpakkam and Itti (2006) prefer the constraints \u2211_i gi = c and gi > 0.\n\n1Although Guided Search has undergone re\ufb01nement (Wolfe, 2007), the key claims summarized here are\nunchanged. Recent extensions to GS consider eye movements and acuity changes with retinal eccentricity.\n\nFinally, we mention one other key property the various models have in common. Gain tuning is\ncast as an optimization problem: the goal of the model is to adjust the gains so as to maximize the\ntarget saliency relative to distractor saliency for the task at hand. Baldwin and Mozer (2006) de\ufb01ne\nthe criterion in terms of the target saliency ranking. Navalpakkam and Itti (2006) use the expected\ntarget to distractor saliency ratio. Wolfe (1994) sets gains according to rules that he describes as\nperforming optimization.\n\n2 Experience-Guided Search\n\nThe model we introduce in this paper makes three contributions over the class of Guided Search\nmodels previously proposed. (1) GS uses noise or nondeterminism to match human data. 
In reality,\nnoise and nondeterminism serve to degrade the model\u2019s performance over what it could otherwise\nbe. In contrast, all components of our model are justi\ufb01ed on computational grounds, leading to a\nmore elegant, principled account. (2) GS imposes arbitrary limitations on gain modulation that also\nresult in the model performing worse than it otherwise could. Although limitations on gain mod-\nulation might be neurobiologically rationalized, a more elegant account would characterize these\nlimitations in terms of trade offs: constraints on gain modulation may limit performance, but they\nyield certain bene\ufb01ts. Our model offers such a trade-off account. (3) In GS, attentional control is\nachieved by tuning gains to optimize performance. In contrast, our model is designed to infer the\nstructure of its environment through experience, and gain modulation is a byproduct of this infer-\nence. Consequently, our model has no distinct control mechanism, leading to a novel perspective on\nexecutive control processes in the brain.\n\nOur approach begins with the premise that attention is fundamentally task based: a location in the\nvisual \ufb01eld is salient if a target is likely at that location. We de\ufb01ne saliency as the target probability,\nP (Tx = 1|Fx), where Fx is the local feature activity vector at retinal location x and Tx is a binary\nrandom variable indicating if location x contains a target. Torralba et al. (2006) and Zhang and\nCottrell (submitted) have also suggested that saliency should re\ufb02ect target probability, although they\npropose approaches to computing the target probability very different from ours. 
Our approach is to\ncompute the target probability using statistics obtained from recent experience performing the task.\nConsequently, we refer to our model as experience-guided search or EGS.\nTo expand P (Tx|Fx), we make the naive-Bayes assumption that the feature activities are indepen-\ndent of one another, yielding\n\nP (Tx|Fx, \u03c1) = [P (Tx) \u220f_i P (Fxi|Tx, \u03c1)] / [\u2211_{t=0}^{1} P (Tx = t) \u220f_i P (Fxi|Tx = t, \u03c1)],\n\n(1)\n\nwhere \u03c1 is a vector of parameters that characterize the current stimulus environment in the current\ntask, and Fxi encodes the activity of feature i. Consider Fxi to be a rate-coded representation of\na neural spike train. Speci\ufb01cally, Fxi denotes the count of the number of spikes that occurred in a\nwindow of n time intervals, where at most one spike can occur in each interval.\nWe propose a generative model of the environment in which Fxi is a binomial random variable,\nFxi|{Tx = t, \u03c1} \u223c Binomial(\u03c1it, n), where a spike rate \u03c1it is associated with feature i for target\n(t = 1) and nontarget (t = 0) items. As n becomes large\u2014i.e., the spike count is obtained over\na larger period of time\u2014the binomial is well approximated by a Gaussian: Fxi|{Tx = t, \u03c1} \u223c\nN (n\u03c1it, n\u03c1it(1 \u2212 \u03c1it)). Using the Gaussian approximation, Equation 1 can be rewritten in the form\nof a logistic function: P (Tx|Fx, \u03c1) = 1/(1 + e^{\u2212(rx + (n/2) sx)}), where\n\nrx = ln[P (Tx = 1) / P (Tx = 0)] \u2212 (1/2) \u2211_i ln[\u03c1i1(1 \u2212 \u03c1i1) / (\u03c1i0(1 \u2212 \u03c1i0))]\n\nand\n\nsx = \u2211_i \u2211_{t=0}^{1} [(1 \u2212 2t) / (\u03c1it(1 \u2212 \u03c1it))] (\u02dcfxi \u2212 \u03c1it)^2\n\n(2)\n\nand \u02dcfxi = fxi/n denotes the observed spike rate for a feature detector.\nBecause of the logistic relationship, P (Tx|Fx, \u03c1) is monotonic in rx + (n/2) sx. 
Consequently, if at-\ntentional priority is given to locations in order of their target probability, P (Tx|Fx, \u03c1), then it is\nequivalent to rank using rx + (n/2) sx. Further, if we assume that the target is equally likely in any\nlocation, then rx is constant across locations, and sx can substitute for P (Tx|Fx, \u03c1) as an equivalent\nmeasure of saliency.\nThis saliency measure, sx, makes intuitive sense. Saliency at a location increases if feature i\u2019s\nactivity is distant from the mean activity observed in the past for a distractor (\u03c1i0) and decreases if\nfeature i\u2019s activity is distant from the mean activity observed in the past for a target (\u03c1i1). These\nsaliency increases (decreases) are scaled by the variance of the distractor (target) activities, such that\nhigh-variance features have less impact on saliency.\nExpanding the numerator terms in the de\ufb01nition of sx (Equation 2), we observe that sx can be written\nas a linear combination of terms involving the feature activities, \u02dcfxi, and the squared activities, \u02dcfxi^2\n(along with a constant term that can be ignored for ranking by saliency). The saliency measure\nsx in EGS is thus quite similar to the saliency measure in GS, s^GS_x = \u2211_i gi \u02dcfxi. The differences\nare: \ufb01rst, EGS incorporates quadratic terms, and second, gain coef\ufb01cients of EGS are not free\nparameters but are derived from statistics of targets and distractors in the current task and stimulus\nenvironment. In this fact lies the virtue of EGS relative to GS: The control parameters are obtained\nnot by optimization, but are derived directly from statistics of the environment.\n\n2.1 Uncertainty in the Environment Statistics\n\nThe model parameters, \u03c1, could be maximum likelihood estimates obtained by observing target\nand distractor activations over a series of trials. That is, suppose that each item in the display is\nidenti\ufb01ed as a target or distractor. 
The set of activations of feature i at all locations containing a\ntarget could be used to estimate \u03c1i1, and likewise with locations containing a distractor to estimate\n\u03c1i0. Alternatively, one could adopt a Bayesian approach and treat \u03c1it as a random variable, whose\nuncertainty is reduced by the evidence obtained on each trial. Because feature spike rates lie in [0, 1],\nwe de\ufb01ne \u03c1it as a beta random variable, \u03c1it \u223c Beta(\u03b1it, \u03b2it).\nThis Bayesian approach also allows us to specify priors over \u03c1it in terms of imaginary counts, \u03b1^0_it\nand \u03b2^0_it. For example, in the absence of any task experience, a conservative assumption is that all\nfeature activations are predictive of a target, i.e., \u03c1i1 should be drawn from a distribution biased\ntoward 1, and \u03c1i0 should be drawn from a distribution biased toward 0.\nTo compute the target probabilities, we must marginalize over \u03c1, i.e., P (Tx|Fx) =\n\u222b_\u03c1 P (Tx|Fx, \u03c1)p(\u03c1)d\u03c1. Unfortunately, this integral is impossible to evaluate analytically. We in-\nstead compute the expectation of sx over \u03c1, \u00afsx \u2261 E\u03c1(sx), which has the solution\n\n\u00afsx = \u2211_i \u2211_{t=0}^{1} (1 \u2212 2t) [ ((\u03b1it + \u03b2it \u2212 1)(\u03b1it + \u03b2it \u2212 2) / ((\u03b1it \u2212 1)(\u03b2it \u2212 1))) \u02dcfxi^2 \u2212 (2(\u03b1it + \u03b2it \u2212 1) / (\u03b2it \u2212 1)) \u02dcfxi + \u03b1it / (\u03b2it \u2212 1) ]\n\n(3)\n\nNote that, like the expression for sx, \u00afsx is a weighted sum of linear and quadratic feature-activity\nterms. When \u03b1it and \u03b2it are large, the distribution of \u03c1it is sharply peaked, and \u00afsx approaches sx\nwith \u03c1it = \u03b1it/(\u03b1it + \u03b2it). When this condition is satis\ufb01ed, ranking by \u00afsx is equivalent to ranking\nby P (Tx|Fx). 
Although the equivalence is not guaranteed for smaller \u03b1it and \u03b2it, we have found\nthe equivalence to hold in empirical tests. Indeed, in our simulations, we \ufb01nd that de\ufb01ning saliency\nas either sx or \u00afsx yields similar results, reinforcing the robustness of our approach.\n\n2.2 Modeling the Stimulus Environment\n\nThe parameter vectors \u03b1 and \u03b2 maintain a model of the stimulus environment in the context of the\ncurrent task. Following each trial, these parameters must be updated to re\ufb02ect the statistics of the\ntrial. We assume that following a trial, each item in the display has been identi\ufb01ed as either a target\nor distractor. (All other adaptive attention models such as GS make this assumption.)\n\nConsider a location x that has been labeled as type t (1 for target, 0 for distractor), and some feature\ni at that location, Fxi. We earlier characterized Fxi as a binomial random variable re\ufb02ecting a\nspike count; that is, during n time intervals, fxi spikes are observed. Each time interval provides\nevidence as to the value of \u03c1it. Given prior distribution \u03c1it \u223c Beta(\u03b1it, \u03b2it), the posterior is \u03c1it|Fxi \u223c\nBeta(\u03b1it + fxi, \u03b2it + n \u2212 fxi). However, to limit the evidence provided from each item, we scale it\ndown by a factor of n. When all locations are considered, the resulting posterior is:\n\n\u03c1it|Fi \u223c Beta(\u03b1it + \u2211_{x\u2208\u03c7t} \u02dcfxi, \u03b2it + \u2211_{x\u2208\u03c7t} (1 \u2212 \u02dcfxi))\n\n(4)\n\nwhere Fi is feature map i and \u03c7t is the set of locations containing elements of type t.\nWith the approach we\u2019ve described, evidence concerning the value of \u03c1it accumulates over a se-\nquence of trials. However, if an environment is nonstationary, this build up of evidence is not\nadaptive. 
We thus consider a switching model of the environment that speci\ufb01es that, with probability \u03bb,\nthe environment changes and all evidence should be discarded. The consequence of this assumption\nis that the posterior distribution is a mixture of Equation 4 and the prior distribution, Beta(\u03b1^0_it, \u03b2^0_it).\nModeling the mixture distribution is problematic because the number of mixture components grows\nlinearly with the number of trials. We could approximate the mixture distribution by the beta dis-\ntribution that best approximates the mixture, in the sense of Kullback-Leibler divergence. However,\nwe chose to adopt a simpler, more intuitive solution: to interpolate between the two distributions.\nThe update rule we use is therefore\n\n\u03c1it|Fi \u223c Beta(\u03bb\u03b1^0_it + (1 \u2212 \u03bb)[\u03b1it + \u2211_{x\u2208\u03c7t} \u02dcfxi], \u03bb\u03b2^0_it + (1 \u2212 \u03bb)[\u03b2it + \u2211_{x\u2208\u03c7t} (1 \u2212 \u02dcfxi)]).\n\n(5)\n\n3 Simulation Methodology\n\nWe present a step-by-step description of how the model runs to simulate experimental subjects per-\nforming a visual search task. We start by generating a sequence of experimental trials with the\nproperties studied in an experiment. The model is initialized with \u03b1it = \u03b1^0_it and \u03b2it = \u03b2^0_it. On each\nsimulation trial, the following sequence occurs. (1) Feature extraction is performed on the display\nto obtain \ufb01ring rates, \u02dcfxi for each location x and feature type i. (2) Saliency, \u00afsx, is computed for\neach location according to Equation 3. (3) The saliency rank of each location is assessed, and the\nnumber of locations that need to be searched in order to identify the target is assumed to be equal to\nthe target rank. Response time should then be linear in target rank. (4) Following each trial, target\nand distractor statistics, \u03b1it and \u03b2it, are updated according to Equation 5.\n\nEGS has potentially many free parameters: {\u03b1^0_it} and {\u03b2^0_it}, and \u03bb. However, with no reason to\nbelieve that one feature behaves differently than another, we assign all the features the same priors.\nFurther, we impose symmetry such that \u03b1^0_i0 = \u03b2^0_j1 = \u03bd and \u03b1^0_i1 = \u03b2^0_j0 = \u00b5 for all i and j, reducing\nthe total number of free parameters to three.\n\nBecause we are focused on the issue of attentional control, we wanted to sidestep other issues, such\nas feature extraction. Consequently, EGS uses the front-end preprocessing of GS. GS takes as input\nan 8 \u00d7 8 array of locations, each of which can contain a single colored bar. As described earlier, GS\nanalyzes the input via four broadly tuned features for color, and four for orientation. After a local\ncontrast-enhancement operator, GS yields activation values in [0, 1] at each of 8 \u00d7 8 locations for\neach of eight feature dimensions. We treat the activation produced by GS for feature i at location\nx as the \ufb01ring rate \u02dcfxi needed to simulate EGS. Like GS, the response time of EGS is linear in the\ntarget ranking. A scaling factor is required to convert rank to response time; we chose 25 msec/item,\nwhich is a fourth free parameter of GS.\n\n4 Results\n\nWe simulated EGS on a series of tasks that Wolfe (1994) used to evaluate GS. Because GS is limited\nto processing displays containing colored, oriented lines, some of the tasks constructed by Wolfe did\nnot have an exact correspondence in the experimental literature. Rather, Wolfe, the leading expert\nin visual search, identi\ufb01ed key \ufb01ndings that he wanted GS to replicate. Because EGS shares front-\nend processing with GS, EGS is limited to the same set of tasks as GS. Consequently, we present a\ncomparison of GS and EGS.\n\nWe began by replicating Wolfe\u2019s results on GS. 
This replication was nontrivial, because GS contains\nmany parameters, rules, and special cases, and published descriptions of GS do not provide a crisp\nalgorithmic description of the model. To implement EGS, we simply removed much of the com-\nplexity of GS\u2014including the distinction between bottom-up and top-down weights, heuristics for\nsetting the weights, and the injection of high-amplitude noise into the saliency map\u2014and replaced\nit with Equations 3 and 5.\n\nEach simulation begins with a sequence of 100 practice trials, followed by a sequence of 1000 trials\nfor each blocked condition. Displays on each trial are generated according to the constraints of\nthe task with random variation with respect to unconstrained aspects of the task (e.g., locations of\ndisplay elements, distractor identities, etc.). In typical search tasks, the participant is asked to detect\nthe presence or absence of a target. We omit results for target-absent trials, since GS and EGS make\nidentical predictions for these trials.\n\nThe qualitative performance of EGS does not depend on its free parameters when two conditions\nare met: \u03bb > 0 and \u00b5 > \u03bd. The latter condition yields E[\u03c1i1] > E[\u03c1i0] for all i, and corresponds\nto the bias that features are more likely to be present for a target than for a distractor. This bias is\nrational in order to prevent cognition from suppressing information that could potentially be critical\nto behavior. All simulation results reported here used \u03bb = 0.3, \u00b5 = 25, and \u03bd = 10.\nFigure 2 shows simulation results on six sets of tasks, labeled A\u2013F. The \ufb01rst and third columns (thin\nlines) are data from our replication of GS; the second and fourth columns (thick lines) are data from\nour implementation of EGS. The key feature to note is that results from EGS are qualitatively and\nquantitatively similar to results from GS. 
As should become clear when we explain the individual\ntasks, EGS probably produces a better qualitative \ufb01t to the human data. (Unfortunately, it is not\nfeasible to place the human data side-by-side with the simulation results. Although the six sets of\ntasks were chosen by Wolfe to represent key experiments in the literature, most are abstractions\nof the original experimental tasks because the retina of GS\u2014and its descendent EGS\u2014is greatly\nsimpli\ufb01ed and cannot accommodate the stimulus arrays used in human studies. Thus, Wolfe never\nintended to quantitatively model speci\ufb01c experimental studies.)\n\nWe brie\ufb02y describe the six tasks. The \ufb01rst four involve displays of a homogeneous color, and search\nfor a target orientation among distractors of different orientations. Task A explores search for a\nvertical (de\ufb01ned as 0\u25e6) target among homogeneous distractors of a different orientation. The graph\nplots the slope of the line relating display size to response latency, as a function of the distractor\norientation. Search slopes become more ef\ufb01cient as the target-distractor similarity decreases. Task\nB explores search for a target among two types of distractors as a function of display size. The\ndistractors are 100\u25e6 apart, and the target is 40\u25e6 and 60\u25e6 from the distractors, but in one case the\ntarget differs from the distractors in that it is the only nearly vertical item, allowing pop out via the\nvertical feature detector. Note that pop out is not wired into EGS, but emerges because EGS identi\ufb01es\nvertical-feature activity as a reliable predictor of the target. Task C examines search ef\ufb01ciency for\na target among heterogeneous distractors, for two target orientations and two degrees of target-\ndistractor similarity. Search is more ef\ufb01cient when the target and distractors are dissimilar. (EGS\nobtains results better matched to the human data than GS.) 
Task D explores an asymmetry in search:\nit is more ef\ufb01cient to \ufb01nd a tilted bar among verticals than a vertical among tilted. This effect arises\nfrom the same mechanism that yielded ef\ufb01cient search in task B: a unique feature is highly activated\nwhen the target is tilted but not when it is vertical. And search is better guided to features that are\npresent than to features that are absent in EGS, due to the \u03c1 priors. Task E involves conjunction\nsearch. The target is a red vertical among green vertical and red tilted distractors. The red item\u2019s\ntilt can be either 90\u25e6 (i.e., horizontal) or 40\u25e6. Both distractor environments yield inef\ufb01cient search,\nbut\u2014consistent with human data\u2014conjunction searches can vary in their relative dif\ufb01culty.\nTask F examines search ef\ufb01ciency for a red vertical among red 60\u25e6 and yellow vertical distractors,\nas a function of the ratio of the two distractor types. The result shows that search can be guided:\nresponse times become faster as either the target color or target orientation becomes sparse, because\na relatively unique feature serves as a reliable cue to the target. Figure 3a depicts how EGS adapts\ndifferently for the extreme conditions in which the distractors are mostly vertical (dark bars) or\nmostly red (light bars). The bars represent E[\u03c1i0]; the lower the value, the more a feature is viewed as\nreliably discriminating targets and distractors. (E[\u03c1i1] is independent of the experimental condition.)\nWhen distractors are mostly vertical, the red feature is a better cue, and vice versa. 
The standard\nexplanation for this phenomenon in the psychological literature is that subjects operate in two stages,\n\ufb01rst \ufb01ltering out based on the more discriminative feature, and then serially searching the remaining\nitems. EGS provides a single-stage account that does not need to invoke specialized mechanisms for\nadaptation to the environment, because all attentional control is adaptation of this sort.\n\nFigure 2: Simulation results on six sets of tasks, labeled A\u2013F, for GS (thin lines, 1st and 3rd columns)\nand EGS (thick lines, 2nd and 4th columns). Simulation details are explained in the text.\n\nTo summarize, EGS predicts the key factors in visual search that determine search ef\ufb01ciency. Most\nef\ufb01cient search is for a target de\ufb01ned by the presence of a single categorical feature among homo-\ngeneous distractors that do not share the categorical feature. Least ef\ufb01cient search is for target and\ndistractors that share features (e.g., T among L\u2019s, or red verticals among red horizontals and green\nverticals) and/or when distractors are heterogeneous.\n\nWolfe, Cave, & Franzel (1989) conducted an experiment to demonstrate that people can bene\ufb01t\nfrom guidance. This experiment, which oddly has never been modeled by GS, involves search for a\nconjunction target de\ufb01ned by a triple of features, e.g., a big red vertical bar. The target might be pre-\nsented among heterogeneous distractors that share two features with it, such as a big red horizontal\nbar, or distractors that share only one feature with it, such as a small green vertical bar. Performance\nin these two conditions, denoted T3-D2 and T3-D1, respectively, is compared to performance in\na standard conjunction search task, denoted T2-D1, involving targets de\ufb01ned by two features and\nsharing one feature with each distractor. Wolfe et al. 
reasoned that if search can be guided, saliency\nat a location should be proportional to the number of target-relevant features at that location, and the\nratio of target to distractor salience should be x/y in condition Tx-Dy. Because x > y, the target is\nalways more salient than any distractor, but GS assumes less ef\ufb01cient search due to noise corruption\nof the saliency map, thereby predicting search slopes that are inversely related to x/y. The human\ndata show exactly this pattern, producing almost \ufb02at search slopes for T3-D1. EGS replicates the\nhuman data (Figure 3b) without employing GS\u2019s arbitrary assumption that prioritization is corrupted\nby noise. Instead, x/y re\ufb02ects the amount of evidence available on each trial about features that dis-\ncriminate targets from distractors. Essentially, EGS suggests that x/y determines the availability of\ndiscriminative statistics in the environment. Thus, the limitation is on learning, not on performance.\n\n[Figure 2 panels: (A) Vertical Bar Among Homogeneous Distractors; (B) Categorical Search; (C) Target\u2212Distractor Similarity; (D) Feature Search Asymmetry; (E) Conjunction Search, Varying Distractor Confusability; (F) Conjunction Search, Varying Distractor Ratio. Panel A plots RT slope (msec/item) against distractor orientation; panel F plots RT (msec) against the proportion of red distractors; panels B\u2013E plot RT (msec) against display size.]\n\nFigure 3: (a) Values of E[\u03c1i0] in task F. (b) EGS performance on the triple-conjunction task of Wolfe, Cave, & Franzel (1989)\n\n5 Discussion\n\nWe presented a model, EGS, that guides visual search via statistics collected over the course of\nexperience in a task environment. The primary contributions of EGS are as follows. First, EGS is a\nsigni\ufb01cantly more elegant and parsimonious theory than its predecessors. In contrast to EGS, GS is a\ncomplex model under the hood with many free parameters and heuristic assumptions. We and other\ngroups have spent many months reverse engineering GS to determine how exactly it works, because\npublished descriptions do not have the speci\ufb01city of an algorithm. Second, to explain human data,\nGS and its ancestors are \u201cretarded\u201d by injecting noise or arbitrarily limiting gains. Although it may\nultimately be determined that the brain suffers from these conditions, one would prefer theories\nthat cast performance of the brain as ideal or rational. EGS achieves this objective via explicit\nassumptions about the generative model of the environment embodied by cognition. In particular,\nthe dumbing-down of GS and its variants is replaced in EGS by the claim that environments are\nnonstationary. If the environment can change from one trial to the next, the cognitive system does\nwell not to turn up gains on one feature dimension at the expense of other feature dimensions. The\nresult is a sensible trade off: attentional control can be rapidly tuned as the task or environment\nchanges, but this \ufb02exibility restricts EGS\u2019s search ef\ufb01ciency when the task and environment remain\nconstant. 
Third, EGS suggests a novel perspective on attentional control, and executive control more\ngenerally. All other modern perspectives we are aware of treat control as optimization, whereas in\nEGS, control arises directly from statistical inference on the task environment. Our current research\nis exploring the implications of this intriguing perspective.\n\nAcknowledgments\n\nThis research was supported by NSF BCS 0339103 and NSF CSE-SMA 0509521. Support for the second\nauthor comes from an NSF Graduate Fellowship.\n\nReferences\nBaldwin, D., & Mozer, M. C. (2006). Controlling attention with noise: The cue-combination model of visual search. In R. Sun & N. Miyake (Eds.), Proc. of the 28th Ann. Conf. of the Cog. Sci. Society (pp. 42\u201347). Hillsdale, NJ: Erlbaum.\nCave, K. R. (1999). The FeatureGate model of visual selection. Psychol. Res., 62, 182\u2013194.\nItti, L., & Koch, C. (2001). Computational modeling of visual attention. Nature Rev. Neurosci., 2, 194\u2013203.\nMozer, M. C. (1991). The perception of multiple objects: A connectionist approach. Cambridge, MA: MIT.\nNavalpakkam, V., & Itti, L. (2006). Optimal cue selection strategy. In Advances in Neural Information Processing Systems Vol. 19 (pp. 1\u20138). Cambridge, MA: MIT Press.\nRao, R., Zelinsky, G., Hayhoe, M., & Ballard, D. (2002). Eye movements in iconic visual search. Vis. Res., 42, 1447\u20131463.\nSandon, P. A. (1990). Simulating visual attention. Journal of Cog. Neuro., 2, 213\u2013231.\nTorralba, A., Oliva, A., Castelhano, M. S., & Henderson, J. M. (2006). Contextual guidance of eye movements and attention in real-world scenes: The role of global features on objects search. Psych. Rev., 113, 766\u2013786.\nWolfe, J. M., Cave, K. R., & Franzel, S. L. (1989). Guided search: An alternative to the feature integration model for visual search. Jnl. Exp. Psych.: Hum. Percep. & Perform., 15, 419\u2013433.\nWolfe, J. M. (1994). 
Guided Search 2.0: A revised model of visual search. Psych. Bull. & Rev., 1, 202\u2013238.\nWolfe, J. M. (2007). Guided Search 4.0: Current progress with a model of visual search. In W. Gray (Ed.), Integrated Models of Cognitive Systems. NY: Oxford.\nZhang, L., & Cottrell, G. W. (submitted). Probabilistic search: A new theory of visual search. Submitted for publication.\n\n[Figure 3 plots: (a) feature activation for the vertical and red features, with mostly vertical vs. mostly red distractors; (b) reaction time vs. display size for T3-D2, T2-D1, and T3-D1.]", "award": [], "sourceid": 351, "authors": [{"given_name": "David", "family_name": "Baldwin", "institution": null}, {"given_name": "Michael", "family_name": "Mozer", "institution": null}]}