{"title": "The Role of Top-down and Bottom-up Processes in Guiding Eye Movements during Visual Search", "book": "Advances in Neural Information Processing Systems", "page_first": 1569, "page_last": 1576, "abstract": null, "full_text": "The Role of Top-down and Bottom-up Processes\nin Guiding Eye Movements during Visual Search\n\nGregory J. Zelinsky\u2020\u2021, Wei Zhang\u2021, Bing Yu\u2021, Xin Chen\u2020\u2217, Dimitris Samaras\u2021\n\nDept. of Psychology\u2020, Dept. of Computer Science\u2021\n\nState University of New York at Stony Brook\n\nStony Brook, NY 11794\n\nGregory.Zelinsky@stonybrook.edu\u2020, xichen@ic.sunysb.edu\u2217\n\n{wzhang,ybing,samaras}@cs.sunysb.edu\u2021\n\nAbstract\n\nTo investigate how top-down (TD) and bottom-up (BU) information is\nweighted in the guidance of human search behavior, we manipulated the\nproportions of BU and TD components in a saliency-based model. The\nmodel is biologically plausible and implements an arti\ufb01cial retina and\na neuronal population code. The BU component is based on feature-\ncontrast. The TD component is de\ufb01ned by a feature-template match to a\nstored target representation. We compared the model\u2019s behavior at differ-\nent mixtures of TD and BU components to the eye movement behavior\nof human observers performing the identical search task. We found that a\npurely TD model provides a much closer match to human behavior than\nany mixture model using BU information. Only when biological con-\nstraints are removed (e.g., eliminating the retina) did a BU/TD mixture\nmodel begin to approximate human behavior.\n\n1. Introduction\nThe human object detection literature, also known as visual search, has long struggled with\nhow best to conceptualize the role of bottom-up (BU) and top-down (TD) processes in guid-\ning search behavior.1 Early theories of search assumed a pure BU feature decomposition of\nthe objects in an image, followed by the later reconstitution of these features into objects if\nthe object\u2019s location was visited by spatially directed visual attention [1]. Importantly, the\ndirection of attention to feature locations was believed to be random in these early models,\nthereby making them devoid of any BU or TD component contributing to the guidance of\nattention to objects in scenes.\nThe belief in a random direction of attention during search was quashed by Wolfe and\ncolleague\u2019s [2] demonstration of TD information affecting search guidance. According to\ntheir guided-search model [3], preattentively available features from objects not yet bound\nby attention can be compared to a high-level target description to generate signals indicat-\ning evidence for the target in a display. The search process can then use these signals to\n\n1In this paper we will refer to BU guidance as guidance based on task-independent signals arising\nfrom basic neuronal feature analysis. TD guidance will refer to guidance based on information not\nexisting in the input image or proximal search stimulus, such as knowledge of target features or\nprocessing constraints imposed by task instruction.\n\n\fguide attention to display locations indicating the greatest evidence for the target. More\nrecent models of TD target guidance can accept images of real-world scenes as stimuli\nand generate sequences of eye movements that can be directly compared to human search\nbehavior [4].\nPurely BU models of attention guidance have also enjoyed a great deal of recent research in-\nterest. 
2. Eye movement model

Figure 1: Flow of processing through the model. Abbreviations: TD SM (top-down saliency map); BU SM (bottom-up saliency map); SF (suggested fixation point); TSM (thresholded saliency map); CF2HS (Euclidean distance between current fixation and hotspot); SF2CF (Euclidean distance between suggested fixation and current fixation); EMT (eye movement threshold); FT (foveal threshold).

In this section we introduce a computational model of eye movements during visual search. The basic flow of processing in this model is shown in Figure 1. Generally, we represent search scenes in terms of simple and biologically plausible visual feature-detector responses (colors, orientations, scales). Visual routines then act on these representations to produce a sequence of simulated eye movements. Our framework builds on the work described in [8, 4], but differs from that earlier model in several important respects. First, our model includes a perceptually accurate simulated retina, which was not included in [8, 4]. Second, the visual routine responsible for moving gaze in our model is fundamentally different from the earlier version. In [8, 4], the number of eye movements was largely determined by the number of spatial scale filters used in the representation; the method used in the current model to generate eye movements (Section 2.3) removes this upper limit. Third, and most important to the topic of this paper, the current model is capable of integrating both BU and TD information in guiding search behavior. The [8, 4] model was purely TD.

2.1. Overview

The model can be conceptually divided into three broad stages: (1) the creation of a saliency map (SM) based on TD and BU analyses of a retinally transformed image, (2) the recognition of the target, and (3) the operations required to generate eye movements. Within each of these stages are several more specific operations, which we now describe briefly in an order determined by the processing flow.

Input image: The model accepts as input a high-resolution (1280 × 960 pixel) image of the search scene, as well as a smaller image of the search target. A point is specified on the target image and filter responses are collected from a region surrounding this point. In the current study this point corresponded to the center of the target image.

Retina transform: The search image is immediately transformed to reflect the acuity limitations imposed by the human retina. To implement this neuroanatomical constraint, we adopt a method described in [9], which was shown to provide a good fit to acuity limitations in the human visual system. The approach takes an image and a fixation point as input, and outputs a retina-transformed version of the image based on that fixation point (making it a good front end to our model). The initial retina transformation assumes fixation at the center of the image, consistent with the behavioral experiment. A new retina transformation of the search image is computed after each change in gaze.
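For illustration, the effect of the retina transform can be approximated in a few lines of Python. This is only a minimal sketch, not the method of [9]: it blends progressively blurred copies of the image as a function of eccentricity, and the foveal radius (32 pixels, roughly 0.5° at the ~64 pixels/degree of our displays) and pyramid depth are assumed values.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def retina_transform(img, fixation, fovea_radius=32, n_levels=4):
    """Crude gaze-contingent foveation (illustration only; the model itself
    uses the method of Perry & Geisler [9]).

    img: 2D grayscale array; fixation: (row, col) of current gaze;
    fovea_radius: radius in pixels of the unblurred foveal window (assumed).
    """
    rows, cols = np.indices(img.shape)
    ecc = np.hypot(rows - fixation[0], cols - fixation[1])  # eccentricity map
    # Pyramid of progressively blurred copies; level 0 is the original image.
    levels = [img.astype(float)]
    for k in range(1, n_levels):
        levels.append(gaussian_filter(img.astype(float), sigma=2.0 ** k))
    # Blur level grows with log eccentricity outside the foveal window.
    level_map = np.clip(np.log2(np.maximum(ecc / fovea_radius, 1.0)),
                        0.0, n_levels - 1.0)
    lo = np.floor(level_map).astype(int)          # lower pyramid level
    hi = np.minimum(lo + 1, n_levels - 1)         # upper pyramid level
    frac = level_map - lo                         # interpolation weight
    stack = np.stack(levels)                      # (n_levels, H, W)
    r, c = np.indices(img.shape)
    # Per-pixel linear interpolation between the two adjacent blur levels.
    return (1.0 - frac) * stack[lo, r, c] + frac * stack[hi, r, c]
```

In the full model this transform is reapplied after every gaze shift, so peripheral objects are always represented at reduced resolution.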
Saliency maps: Both the TD and the BU saliency maps are based on feature responses from Gaussian filters of different orientations, scales, colors, and orders. These two maps are then combined to create the final SM used to guide search (see Section 2.2 for details).

Negativity map: The negativity map keeps a spatial record of every nontarget location that was fixated and rejected through the application of Gaussian inhibition, a process similar to inhibition of return [10] that we refer to as "zapping". The existence of such a map is supported by behavioral evidence indicating a high-capacity spatial memory for rejected nontargets in a search task [11].

Find hotspot: The hotspot (HS) is defined as the point on the saliency map having the largest saliency value. Although no biologically plausible mechanism for isolating the hotspot is currently used, we assume that a standard winner-take-all (WTA) algorithm can be used to find the SM hotspot.

Recognition thresholds: Recognition is accomplished by comparing the hotspot value with two thresholds. The model terminates with a target-present judgment if the hotspot value exceeds a high target-present threshold, set at .995 in the current study. A target-absent response is made if the hotspot value falls below a low target-absent threshold (not used in the current study). If neither of these termination criteria is satisfied, processing passes to the eye movement stage.

Foveal threshold: Processing in the eye movement stage depends on whether the model's simulated fovea is fixated on the SM hotspot. This is determined by computing the Euclidean distance between the current center of the fovea and the hotspot (CF2HS), then comparing this distance to a foveal threshold (FT). The FT, set at 0.5° of visual angle, is determined by the retina transform and viewing angle and corresponds to the radius of the foveal window. The foveal window is the region of the image not blurred by the retina transform, much like the high-resolution foveola in the human visual system.

Hotspot out of fovea: If the hotspot is not within the FT, meaning that the object giving rise to the hotspot is not currently fixated, the model makes an eye movement to bring the simulated fovea closer to the hotspot's location. In making this movement, the model effectively cancels the effect of the retina transform, thereby enabling a judgment regarding the hotspot pattern. The destination of the eye movement is computed by taking the weighted centroid of activity on the thresholded saliency map (TSM). See Section 2.3 for additional details regarding the centroid calculation of the suggested fixation point (SF), its relationship to the distance threshold for generating an eye movement (EMT), and the dynamically changing threshold used to remove those SM points offering the least evidence for the target (+SM thresh in Figure 1).

Hotspot at fovea: If the simulated fovea reaches the hotspot (CF2HS < FT) and the target is still not detected (HS < target-present threshold), the model is likely to have fixated a nontarget. When this happens (a common occurrence in the course of a search), it is desirable to inhibit the location of this false target so that it does not re-attract attention or gaze. To accomplish this, we inhibit or "zap" the hotspot by applying a negative Gaussian filter centered at the hotspot location (set at 63 pixels). Following this injection of negativity into the SM, a new eye movement is made based on the dynamics outlined in Section 2.3.
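The zapping operation is simple enough to sketch directly. This is a minimal illustration: we read the 63-pixel figure above as the width of the inhibitory Gaussian, and its depth is an assumed value.

```python
import numpy as np

def zap(sm, hotspot, width=63, depth=1.0):
    """Inhibit a rejected nontarget ("zapping") by subtracting a Gaussian
    centered on the hotspot. Sketch only: the 63-pixel figure is read as the
    Gaussian's width, and the inhibition depth is an arbitrary choice.

    sm: 2D saliency map; hotspot: (row, col) of the rejected location.
    """
    rows, cols = np.indices(sm.shape)
    d2 = (rows - hotspot[0]) ** 2 + (cols - hotspot[1]) ** 2
    sigma = width / 2.0                         # assumed width-to-sigma mapping
    inhibition = depth * np.exp(-d2 / (2.0 * sigma ** 2))
    return np.clip(sm - inhibition, 0.0, None)  # keep saliency non-negative
```

Accumulating these subtracted Gaussians across fixations yields the negativity map described above, so that rejected locations stay suppressed for the remainder of the trial.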
2.2. Saliency map creation

The first step in creating the TD and BU saliency maps is to separate the retina-transformed image into an intensity channel and two opponent-process color channels (R-G and B-Y). For each channel, we then extract visual features by applying a set of steerable 2D Gaussian-derivative filters, G(t, θ, s), where t is the order of the Gaussian kernel, θ is the orientation, and s is the spatial scale. The current model uses first- and second-order Gaussian derivatives, 4 orientations (0, 45, 90, and 135 degrees), and 3 scales (7, 15, and 31 pixels), for a total of 24 filters. We therefore obtain 24 feature maps of filter responses per channel, M(t, θ, s), or alternatively a 72-dimensional feature vector, F, for each pixel in the retina-transformed image.

The TD saliency map is created by correlating the retina-transformed search image with the target feature vector F_t.²

²Note that because our TD saliency maps are derived from correlations between target and scene images, the visual statistics of these images are in some sense preserved and might be described as a BU component in our model. Nevertheless, the correlation-based guidance signal requires knowledge of a target (unlike a true BU model), and for this reason we will continue to refer to this as a TD process.

To maintain consistency between the two saliency map representations, the same channels and features used in the TD saliency map were also used to create the BU saliency map. Feature-contrast signals on this map were obtained directly from the responses of the Gaussian-derivative filters. For each channel, the 24 feature maps were combined into a single map according to:

\sum_{t,\theta,s} N\big(|M(t, \theta, s)|\big),   (1)

where N(·) is the normalization function described in [12]. The final BU saliency map is then created by averaging the three combined feature maps. Note that this method of creating a BU saliency map differs from the approach used in [12, 7] in that our filters consisted of first- and second-order derivatives of Gaussians rather than center-surround DoG filters. While the two methods of computing feature contrast are not equivalent, in practice they yield very similar patterns of BU salience.

Finally, the combined SM was simply a linear combination of the TD and BU saliency maps, where the weighting coefficient was a parameter manipulated in our experiments.
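The construction of the two maps and their mixture can be sketched as follows, assuming one 2D array per channel. Several choices here are assumptions of the sketch rather than specifications of our implementation: the sigma values standing in for the 7/15/31-pixel kernels, the reading of "correlating" as a per-pixel cosine similarity, and the simplified stand-in for the N(·) operator of [12].

```python
import numpy as np
from scipy.ndimage import gaussian_filter

ORIENTATIONS = np.deg2rad([0, 45, 90, 135])   # filter orientations
SCALES = [7 / 4.0, 15 / 4.0, 31 / 4.0]        # sigmas standing in for the
                                              # 7/15/31-pixel kernels (assumed)

def feature_maps(channel):
    """24 steerable Gaussian-derivative responses (2 orders x 4 orientations
    x 3 scales) for one channel, steered from axis-aligned derivatives."""
    maps = []
    for s in SCALES:
        gx = gaussian_filter(channel, s, order=(0, 1))   # d/dx
        gy = gaussian_filter(channel, s, order=(1, 0))   # d/dy
        gxx = gaussian_filter(channel, s, order=(0, 2))
        gyy = gaussian_filter(channel, s, order=(2, 0))
        gxy = gaussian_filter(channel, s, order=(1, 1))
        for th in ORIENTATIONS:
            c, si = np.cos(th), np.sin(th)
            maps.append(c * gx + si * gy)                          # 1st order
            maps.append(c**2 * gxx + 2*c*si * gxy + si**2 * gyy)   # 2nd order
    return maps  # list of 24 2D arrays

def normalize(m):
    """Stand-in for the N(.) operator of [12]: rescale to [0, 1], then weight
    by (global max - mean)^2 to favor maps with one dominant peak."""
    m = (m - m.min()) / (np.ptp(m) + 1e-12)
    return m * (m.max() - m.mean()) ** 2

def saliency_map(channels, target_vec, w_bu):
    """channels: list of 3 2D arrays (intensity, R-G, B-Y);
    target_vec: 72-d feature vector sampled at the target point;
    w_bu: weight of the BU component (0 = pure TD, 1 = pure BU)."""
    per_channel = [feature_maps(ch) for ch in channels]
    feats = np.stack([m for ch in per_channel for m in ch])  # (72, H, W)
    # TD map: correlation of each pixel's feature vector with the target's,
    # read here as a cosine similarity (the exact normalization is assumed).
    norms = np.linalg.norm(feats, axis=0) * np.linalg.norm(target_vec) + 1e-12
    td = np.clip(np.tensordot(target_vec, feats, axes=1) / norms, 0.0, None)
    # BU map (Eq. 1): per channel, sum the normalized absolute responses,
    # then average the three channel maps.
    bu = np.mean([sum(normalize(np.abs(m)) for m in ch)
                  for ch in per_channel], axis=0)
    bu = (bu - bu.min()) / (np.ptp(bu) + 1e-12)   # match TD's [0, 1] range
    return (1.0 - w_bu) * td + w_bu * bu          # combined SM (Sec. 2.2)
```

Setting w_bu to 0, .25, .5, .75, and 1.0 reproduces the five mixtures tested in Section 4.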
2.3. Eye movement generation

Our model defines gaze position at each moment in time by the weighted spatial average (centroid) of signals on the SM, a form of neuronal population code for the generation of eye movements [13, 14]. Although a centroid computation will tend to bias gaze in the direction of the target (assuming that the target is the maximally salient pattern in the image), gaze will also be pulled away from the target by salient nontarget points. When the number of nontarget points is large, the eye will tend to move toward the geometric center of the scene (a tendency referred to in the behavioral literature as the global effect [15, 16]); when the number of points is small, the eye will move more directly to the target.

To capture this activity-dependent eye movement behavior, we introduce a moving threshold, ρ, that excludes points from the SM over time based on their signal strength. Initially ρ is set to zero, allowing every signal on the SM to contribute to the centroid gaze computation. However, with each timestep ρ is increased by .001, resulting in the exclusion of minimally salient points from the SM (+SM thresh in Figure 1). The centroid of the SM, which we refer to as the suggested fixation point (SF), is therefore dependent on the current value of ρ and can be expressed as:

SF = \frac{\sum_{S_p > \rho} p\, S_p}{\sum_{S_p > \rho} S_p},   (2)

where p ranges over pixel locations on the SM and S_p is the saliency value at p. Eventually, only the most salient points will remain on the thresholded saliency map (TSM), resulting in the direction of gaze to the hotspot. If this hotspot is not the target, ρ can be decreased (−SM thresh in Figure 1) after zapping in order to reintroduce points to the SM. Such a moving threshold is a plausible mechanism of neural computation, easily instantiated by a simple recurrent network [17].

In order to prevent gaze from moving with each change in ρ, which would result in an unrealistically large number of very small eye movements, we impose an eye movement threshold (EMT) that prevents gaze from shifting until a minimum distance between SF and CF is achieved (SF2CF > EMT in Figure 1). The EMT is based on the signal and noise characteristics of each retina-transformed image, and is defined as:

EMT = \max\left( FT,\; d \left( 1 + C \log \frac{Signal}{Noise} \right) \right),   (3)

where FT is the foveal threshold, C is a constant, and d is the distance between the current fixation and the hotspot. The Signal term is defined as the sum of all foveal saliency values on the TSM; the Noise term is defined as the sum of all other TSM values. The Signal/Noise log ratio is clamped to the range [−1/C, 0], so the EMT is bounded below by FT and above by d. The eye movement dynamics can therefore be summarized as follows: incrementing ρ tends to increase the SF2CF distance, and an eye movement to SF results once this distance exceeds the EMT.
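Equations 2 and 3 translate almost directly into code. In the sketch below, FT is set to 32 pixels (roughly 0.5° for our displays) and C = 2; both are assumed values, as is the use of a circular foveal window for splitting Signal from Noise.

```python
import numpy as np

def suggested_fixation(sm, rho):
    """Eq. 2: saliency-weighted centroid of all SM points with S_p > rho."""
    rows, cols = np.indices(sm.shape)
    mask = sm > rho
    total = sm[mask].sum()
    if total == 0.0:                       # nothing left above threshold
        return None
    r = (rows[mask] * sm[mask]).sum() / total
    c = (cols[mask] * sm[mask]).sum() / total
    return np.array([r, c])

def eye_movement_threshold(sm, fixation, hotspot, rho, ft=32.0, C=2.0):
    """Eq. 3: EMT = max(FT, d(1 + C log(Signal/Noise))), with the log ratio
    clamped to [-1/C, 0]. ft (pixels) and C are assumed values here."""
    rows, cols = np.indices(sm.shape)
    ecc = np.hypot(rows - fixation[0], cols - fixation[1])
    tsm = np.where(sm > rho, sm, 0.0)      # thresholded saliency map
    signal = tsm[ecc <= ft].sum()          # TSM activity inside the fovea
    noise = tsm[ecc > ft].sum()            # all remaining TSM activity
    d = np.hypot(hotspot[0] - fixation[0], hotspot[1] - fixation[1])
    if signal > 0.0 and noise > 0.0:
        log_ratio = np.clip(np.log(signal / noise), -1.0 / C, 0.0)
    else:
        log_ratio = -1.0 / C               # degenerate case: EMT = FT
    return max(ft, d * (1.0 + C * log_ratio))
```

Because the log ratio is clamped to [−1/C, 0], the EMT runs from FT (when the foveal signal is weak relative to the periphery, letting gaze shift after small SF displacements) up to d (when the foveal signal dominates), matching the bounds noted above.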
3. Experimental methods

For each trial, the two human observers and the model were first shown an image of a target (a tank). In the case of the human observers, the target was presented for one second and presumably encoded into working memory. In the case of the model, the target was represented by a single 72-dimensional feature vector, as described in Section 2. A search image was then presented, which remained visible to the human observers until they made a button-press response. Eye movements were recorded during this interval using an ELII eyetracker. Section 2 details the processing stages used by the model. There were 44 images and targets, all modified versions of images in the TNO dataset [18]. The images subtended approximately 20° on both the human and simulated retinas.
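To make a simulated trial concrete, the pieces sketched in Section 2 can be assembled into a trial loop along the following lines. This glue code assumes the earlier sketch functions (retina_transform, saliency_map, zap, suggested_fixation, eye_movement_threshold) are in scope; the size of the ρ decrement after zapping and the channel-extraction callback are additional assumptions of the sketch.

```python
import numpy as np

def run_trial(scene_channels_fn, target_vec, w_bu, image_shape,
              max_fixations=40, present_thresh=0.995, ft_px=32.0):
    """One simulated search trial; returns the number of fixations needed to
    find the target, or None for a miss (no detection within 40 fixations).
    scene_channels_fn(fixation) should return the retina-transformed
    intensity/R-G/B-Y channels for the current gaze position."""
    fixation = np.array(image_shape, dtype=float) / 2.0  # start at center
    rho = 0.0                               # moving SM threshold (Sec. 2.3)
    negativity = 0.0                        # accumulated inhibition (Sec. 2.1)
    for n_fix in range(1, max_fixations + 1):
        channels = scene_channels_fn(fixation)           # retina transform
        sm = saliency_map(channels, target_vec, w_bu)    # Sec. 2.2 sketch
        sm = np.clip(sm - negativity, 0.0, None)         # negativity map
        hotspot = np.unravel_index(np.argmax(sm), sm.shape)
        if sm[hotspot] > present_thresh:                 # recognition test
            return n_fix                                 # target found
        cf2hs = np.hypot(hotspot[0] - fixation[0], hotspot[1] - fixation[1])
        if cf2hs < ft_px:                   # fixating a nontarget: zap it
            zapped = zap(sm, hotspot)
            negativity = negativity + (sm - zapped)      # remember the zap
            sm = zapped
            rho = max(0.0, rho - 0.01)      # -SM thresh (decrement assumed)
        # +SM thresh: raise rho until the suggested fixation point is far
        # enough from the current fixation to trigger an eye movement.
        while True:
            sf = suggested_fixation(sm, rho)
            if sf is None:
                break                       # SM exhausted on this pass
            emt = eye_movement_threshold(sm, fixation, hotspot, rho, ft=ft_px)
            if np.hypot(*(sf - fixation)) > emt:
                fixation = sf               # shift gaze to SF
                break
            rho += 0.001
    return None                             # miss
```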
4. Experimental results

Model and human data are reported from 2 experiments. For each experiment we tested 5 weightings of the TD and BU components in the combined SM. Expressed as a proportion of the BU component, these weightings were: BU 0 (TD only), BU .25, BU .5, BU .75, and BU 1.0 (BU only).

4.1. Experiment 1

Table 1: Human and model search behavior at 5 TD/BU mixtures in Experiment 1 (model run with retina and population code).

             Human subjects     Model (Retina, Population)
             H1      H2         TD only   BU: 0.25   BU: 0.5   BU: 0.75   BU only
Misses (%)   0.00    0.00       0.00      36.36      72.73     77.27      88.64
Fixations    4.55    4.43       4.55      18.89      20.08     21.00      22.40
Std Dev      0.88    2.15       0.82      10.44      12.50     10.29      12.58

Figure 2: Comparison of human and model scanpaths at different TD/BU weightings.

As can be seen from Table 1, the human observers were remarkably consistent in their behavior. Each required an average of 4.5 fixations to find the target (defined as gaze falling within .5° of the target's center), and neither generated an error (defined as a failure to find the target within 40 fixations). Human target detection performance was matched almost exactly by a pure TD model, both in terms of errors (0%) and fixations (4.55). This exceptional match between human and model disappeared with the addition of a BU component. Relative to the human and TD-model data, a BU 0.25 mixture model produced a dramatic increase in the miss rate (36%) and in the average number of fixations needed to acquire the target (18.9) on those trials in which the target was ultimately fixated. These high miss and fixation rates continued to climb with larger weightings of the BU contribution, reaching an unrealistic 89% misses and 22 fixations with a pure BU model.

Figure 2 shows representative eye movement scanpaths from our two human observers (a) and from the model at three different TD/BU mixtures (b, BU 0; c, BU 0.5; d, BU 1.0) for one image. Note the close agreement between the human scanpaths and the behavior of the TD model. Note also that, with the addition of a BU component, the model's eye either wanders to high-contrast patterns (bushes, trees) before landing on the target (c) or misses the target entirely (d).

4.2. Experiment 2

Recently, Navalpakkam and Itti [19] reported data from a saliency-based model that also integrates BU and TD information to guide search. Among their many results, they compared their model to the purely TD model described in [4] and found that their mixture model offered a more realistic account of human behavior. Specifically, they observed that the [4] model was too accurate, often predicting that the target would be fixated after only a single eye movement. Although our current findings would seem to contradict [19]'s result, this is not the case. Recall from Section 2 that our model differs from [4] in two respects: (1) it retinally transforms the input image with each fixation, and (2) it uses a thresholded population-averaging code to generate eye movements. Both of these additions would be expected to increase the number of fixations made by the current model relative to the TD model described in [4]. Adding a simulated retina should increase the number of fixations by reducing the target-scene TD correlations and increasing the probability of false targets emerging in the blurred periphery. Adding population averaging should increase fixations by causing eye movements to locations other than hotspots. It may therefore be the case that [19]'s critique points to two specific weaknesses of [4]'s model rather than a general weakness of the TD approach.

To test this hypothesis, we disabled the artificial retina and the population-averaging code in our current model. The model now moves directly from hotspot to hotspot, zapping each before moving to the next. Without retinal blurring and population averaging, the behavior of this simpler model is driven entirely by a WTA computation on the combined SM. Moreover, with a BU weighting of 1.0, this version of our model closely approximates other purely BU models in the literature, which also lack retinal acuity limitations and population dynamics.
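Stripped of the retina and the population code, the model reduces to a WTA-with-inhibition scan, sketched below. The sketch reuses the hypothetical zap function from Section 2.1 and assumes the saliency map is computed once from the unblurred image.

```python
import numpy as np

def run_trial_ablated(sm, present_thresh=0.995, max_fixations=40):
    """Experiment 2 ablation: no retina, no population averaging. The model
    jumps directly from hotspot to hotspot on a single saliency map computed
    from the unblurred image, zapping each rejected location in turn."""
    sm = sm.copy()
    for n_fix in range(1, max_fixations + 1):
        hotspot = np.unravel_index(np.argmax(sm), sm.shape)  # WTA selection
        if sm[hotspot] > present_thresh:
            return n_fix                    # target found
        sm = zap(sm, hotspot)               # inhibit and jump to the next
    return None                             # miss
```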
Table 2: Human and model search behavior at 5 TD/BU mixtures in Experiment 2 (model run with NO retina and NO population code).

             Human subjects     Model (NO Retina, NO Population)
             H1      H2         TD only   BU: 0.25   BU: 0.5   BU: 0.75   BU only
Misses (%)   0.00    0.00       0.00      9.09       27.27     56.82      68.18
Fixations    4.55    4.43       1.00      8.73       16.60     13.37      14.71
Std Dev      0.88    2.15       0.00      9.15       12.29     9.20       12.84

Table 2 shows the data from this experiment. The first two columns replot the human data from Table 1. Consistent with [19], we now find that the performance of a purely TD model is too good: the target is consistently fixated after only a single eye movement, unlike the 4.5 fixations averaged by the human observers. Also consistent with [19] is the observation that a BU contribution may assist this model in better characterizing human behavior. Although a 0.25 BU weighting resulted in a doubling of the human fixation rate and 9% misses, it is conceivable that a smaller BU weighting could nicely describe human performance. As in Experiment 1, at larger BU weightings the model again generated unrealistically high error and fixation rates. These results suggest that, in the absence of retinal and neuronal population-averaging constraints, BU information may play a small role in guiding search.

5. Conclusions

To what extent is TD and BU information used to guide search behavior? The findings from Experiment 1 offer a clear answer to this question: when biologically plausible constraints are considered, any addition of BU information to a purely TD model will worsen, not improve, the match to human search performance (see [20] for a similar conclusion applied to a walking task). The findings from Experiment 2 are more open to interpretation. It may be possible to devise a TD model in which adding a BU component proves useful, but doing so would require building biologically implausible assumptions into the model. A corollary to this conclusion is that, when these same biological constraints are added to existing BU saliency-based models, those models may no longer be able to describe human behavior.

A final fortuitous finding from this study is the surprising degree of agreement between our purely TD model and human performance. The fact that this agreement was obtained by direct comparison to human behavior (rather than to patterns reported in the behavioral literature), and observed in eye movement variables, lends validity to our method. Future work will explore the generality of our TD model, extending it to other forms of TD guidance (e.g., scene context) and to tasks in which the target may be poorly defined (e.g., categorical search).

Acknowledgments
This work was supported by a grant from the ARO (DAAD19-03-1-0039) to G.J.Z.

References
[1] A. Treisman and G. Gelade. A feature-integration theory of attention. Cognitive Psychology, 12:97-136, 1980.
[2] J. Wolfe, K. Cave, and S. Franzel. Guided search: An alternative to the feature integration model for visual search. Journal of Experimental Psychology: Human Perception and Performance, 15:419-433, 1989.
[3] J. Wolfe. Guided search 2.0: A revised model of visual search. Psychonomic Bulletin and Review, 1:202-238, 1994.
[4] R. Rao, G. Zelinsky, M. Hayhoe, and D. Ballard. Eye movements in iconic visual search. Vision Research, 42:1447-1463, 2002.
[5] C. Koch and S. Ullman. Shifts of selective visual attention: Toward the underlying neural circuitry. Human Neurobiology, 4:219-227, 1985.
[6] L. Itti and C. Koch. Computational modeling of visual attention. Nature Reviews Neuroscience, 2(3):194-203, 2001.
[7] L. Itti and C. Koch. A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research, 40(10-12):1489-1506, 2000.
[8] R. Rao, G. Zelinsky, M. Hayhoe, and D. Ballard. Modeling saccadic targeting in visual search. In NIPS, 1995.
[9] J. S. Perry and W. S. Geisler. Gaze-contingent real-time simulation of arbitrary visual fields. In SPIE, 2002.
[10] R. M. Klein and W. J. MacInnes. Inhibition of return is a foraging facilitator in visual search. Psychological Science, 10(4):346-352, 1999.
[11] C. A. Dickinson and G. Zelinsky. Marking rejected distractors: A gaze-contingent technique for measuring memory during search. Psychonomic Bulletin and Review, in press.
[12] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. PAMI, 20(11):1254-1259, 1998.
[13] T. Sejnowski. Neural populations revealed. Nature, 332:308, 1988.
[14] C. Lee, W. Rohrer, and D. Sparks. Population coding of saccadic eye movements by neurons in the superior colliculus. Nature, 332:357-360, 1988.
[15] J. Findlay. Global visual processing for saccadic eye movements. Vision Research, 22:1033-1045, 1982.
[16] G. Zelinsky, R. Rao, M. Hayhoe, and D. Ballard. Eye movements reveal the spatio-temporal dynamics of visual search. Psychological Science, 8:448-453, 1997.
[17] J. L. Elman. Finding structure in time. Cognitive Science, 14:179-211, 1990.
[18] A. Toet, P. Bijl, F. L. Kooi, and J. M. Valeton. A high-resolution image dataset for testing search and detection models. Technical Report TNO-NM-98-A020, TNO Human Factors Research Institute, Soesterberg, The Netherlands, 1998.
[19] V. Navalpakkam and L. Itti. Modeling the influence of task on attention. Vision Research, 45:205-231, 2005.
[20] K. A. Turano, D. R. Geruschat, and F. H. Baker. Oculomotor strategies for direction of gaze tested with a real-world activity. Vision Research, 43(3):333-346, 2003.
", "award": [], "sourceid": 2805, "authors": [{"given_name": "Gregory", "family_name": "Zelinsky", "institution": null}, {"given_name": "Wei", "family_name": "Zhang", "institution": null}, {"given_name": "Bing", "family_name": "Yu", "institution": null}, {"given_name": "Xin", "family_name": "Chen", "institution": null}, {"given_name": "Dimitris", "family_name": "Samaras", "institution": null}]}