{"title": "Load and Attentional Bayes", "book": "Advances in Neural Information Processing Systems", "page_first": 369, "page_last": 376, "abstract": "Selective attention is a most intensively studied psychological phenomenon, rife with theoretical suggestions and schisms. A critical idea is that of limited capacity, the allocation of which has produced half a century's worth of conflict about such phenomena as early and late selection. An influential resolution of this debate is based on the notion of perceptual load (Lavie, 2005, TICS, 9: 75), which suggests that low-load, easy tasks, because they underuse the total capacity of attention, mandatorily lead to the processing of stimuli that are irrelevant to the current attentional set; whereas high-load, difficult tasks grab all resources for themselves, leaving distractors high and dry. We argue that this theory presents a challenge to Bayesian theories of attention, and suggest an alternative, statistical, account of key supporting data.", "full_text": "Load and Attentional Bayes\n\nPeter Dayan\n\nGatsby Computational Neuroscience Unit, UCL\n\nLondon, England, WC1N 3AR\ndayan@gatsby.ucl.ac.uk\n\nAbstract\n\nSelective attention is a most intensively studied psychological phenomenon, rife\nwith theoretical suggestions and schisms. A critical idea is that of limited capacity,\nthe allocation of which has produced continual con\ufb02ict about such phenomena\nas early and late selection. An in\ufb02uential resolution of this debate is based on\nthe notion of perceptual load (Lavie, 2005), which suggests that low-load, easy\ntasks, because they underuse the total capacity of attention, mandatorily lead to\nthe processing of stimuli that are irrelevant to the current attentional set; whereas\nhigh-load, dif\ufb01cult tasks grab all resources for themselves, leaving distractors high\nand dry. 
We argue that this theory presents a challenge to Bayesian theories of\nattention, and suggest an alternative, statistical, account of key supporting data.\n\n1 Introduction\n\nIt was some \ufb01fty years after James (1950)\u2019s famously poetic description of our capacities for atten-\ntion that more analytically-directed experiments began, based originally on dichotic listening Cherry\n(1953). There are three obvious dichotic tasks: (i) being able to interpret fully two separate streams\nof information coming into the two ears; (ii) the less ambitious version of this of being able to in-\nterpret fully one of the streams, speci\ufb01ed top-down, without interference from the other one; and\n(iii) being able to combine information from the two ears appropriately, perhaps into a single per-\ncept. Various forms, interpretations and con\ufb02icts about these three tasks have permeated the \ufb01eld of\nattention ever since (Driver, 2001; Paschler, 1998), driven by different notions of the computational\ntasks and constraints at hand.\n\nThe experiments in dichotic listening coincided with the quickly burgeoning realization that math-\nematical concepts from Shannonian information theory would be very helpful for understanding\nbiological information processing. One central concept in information theory is that of a limited ca-\npacity channel, and Broadbent (1958) adopted this as a formal basis for understanding the necessity\nfor, and hence the nature of, selection. Broadbent (1958)\u2019s theory critically involves early selection,\nin that following a \ufb01rst, automatic, parallel stage of low-level perceptual processing (itself the sub-\nject of important studies of bottom-up in\ufb02uences on selection, Zhaoping, 2006), a relevant stream\nshould be selected for subsequent higher-level, semantic, processing, leaving any irrelevant streams\nin the cold. 
However, evidence that information in unattended streams is actually processed seman-\ntically (eg being able to bias the perception of ambiguous words in the attended stream; Mackay,\n1973), led to alternative theories, either late selection (in\ufb02uentially, Deutsch and Deutsch, 1963;\nDuncan, 1980), in which both streams are fully processed, but with the irrelevant stream being pre-\nvented by a selective process at the last step from entering memory or awareness, or weaker forms of\nthis, such as the notion that elements from the irrelevant stream might be attenuated, only sometimes\nprogressing through to higher levels of processing (Treisman, 1960, 1969). Many hypotheses in the\n\ufb01eld depend on this collection of metaphors, nicely exempli\ufb01ed by the zoom-lens theory of Erik-\nsen and St. James (1986) (based on in\ufb02uential experiments on distractor processing such as Eriksen\nand Eriksen, 1974), which suggests that the smaller the attentional focus, the more intense it can\nsomehow be, given that the limited capacity is \u2018spread\u2019 over a smaller area.\n\n\fHowever, of course, late selection makes little sense from a limited capacity viewpoint; and short\nof a theory of what controls the degree of attenuation of irrelevant stimuli, Treisman (1960)\u2019s idea\nis hard to falsify. Here, we consider the seminal sharp operationalization of Lavie and Tsal (1994);\nLavie (2005), who suggested that attenuation is a function of load, such that in easy tasks, irrelevant\ndata is always processed, even at the cost of worse performance on the relevant information, whereas\nin dif\ufb01cult tasks, no capacity remains, and so distractors are more effectively removed. To reiterate,\nthe attentional load hypothesis, although an attractive formalization of attenuation, suggests that the\nbrain is unable on easy tasks to exclude information that is known to be irrelevant. 
It therefore\ninvolves an arguably infelicitous combination of sophisticated attentional shaping (as to what can be\nattended in high-load situations) with inept control.\n\nAlthough the Bayesian revolution in cognitive science has had a huge impact over modern views of\nsensory processing (see, for instance, Rao et al., 2002, and references therein), having the ability to\nresolve many issues in the \ufb01eld as a whole, there are few recent attempts to build probabilistic models\nfor selective attention (see Shaw, 1982; Palmer, 1994; Dayan and Zemel, 1999; Navalpakkam and\nItti, 2006; Mozer and Baldwin, 2008; Yu and Dayan, 2005; Yu et al., 2008). This is despite the\nmany other computational models of attention (see Itti and Koch, 2001; Zhaoping, 2006). Indeed,\nWhiteley and Sahani (2008) have suggested that this lacuna arises from a focus on optimal Bayesian\ninference in the face of small numbers of objects in the focus of attention, rather than the necessity\nof using approximate methods in the light of realistic, cluttered, complex scenes.\n\nSome of the existing probabilistic models are aimed at variants of search (Navalpakkam and Itti,\n2006; Mozer and Baldwin, 2008); however others, including Palmer (1994); Dayan and Zemel\n(1999), and one of the two models in Yu et al. (2008), are more similar to the account here. They\nacknowledge that there is a critical limited resource coming from the existence of neurons with large\nreceptive \ufb01elds into which experimenters slot multiple sensory objects, some relevant, some irrele-\nvant. Probabilistically-correct inference should then implement selection, when data that is known\nto be irrelevant is excluded to the advantage of the relevant information (eg Dayan and Zemel, 1999;\nPalmer, 1994). 
However, in other circumstances, it will be appropriate to take advantage of the\ninformation about the target that is available in the neurons with large \ufb01elds, even if this means\nallowing some in\ufb02uence on the \ufb01nal decisions from distractors.\n\nHere, we build a Bayesian-inspired account of key data used to argue for the attentional load hypoth-\nesis (based on an extension of Yu et al. (2008)\u2019s model of Eriksen and Eriksen (1974)). Section 2\ndescribes the key data; section 3 the model and results; and section 4 discusses the implications.\n\n2 Attentional Load\n\nFigure 1 shows the central experiment and results from Lavie and de Fockert (2003) that we set out\nto capture. Subjects had to report the identity of a target letter that was either an \u2018X\u2019 or an \u2018N\u2019 (here,\nthe former) presented in one of eight locations arranged in a circle around the \ufb01xation point. The\nreaction times and accuracies of their selections were measured. There was also a distractor letter\nin the further periphery (the larger \u2018N\u2019) which was either compatible (ie the same as the target),\nincompatible (as here, the opposite of the target), or, in so-called neutral trials, a different letter\naltogether.\n\nFigure 1A-C show the three key conditions. Figure 1A is a high-load condition, in that there are\nirrelevant non-targets in the remaining 7 positions around the circle. Figure 1B is a low-load con-\ndition, since there is no non-target. Figure 1C is a critical control, called the degraded low-load\ncondition, and was actually the main topic of Lavie and de Fockert (2003). 
In this, the difficulty of\nthe sensory processing was increased (by making the target smaller and dimmer) without changing\nthe attentional (ie selectional) load.\n\nFigure 1: The attentional load task, from Lavie and de Fockert (2003). Subjects had to judge whether\na target letter in the central circle around fixation was \u2018N\u2019 or \u2018X\u2019 in the face of a compatible, incom-\npatible (shown) or neutral distractor. A) high-load condition with non-target letters occupying the\nother positions in the circle. B) low-load condition with no non-target letters. C) degraded low-load\ncondition with no non-targets but a smaller (not shown) and darker target. D) reaction times (RTs)\nfor the conditions, averaging only over correct choices.\n\nFigure 1D shows the mean reaction times (RTs) for these conditions for the three sorts of distractor\n(RTs suffice here, since there was no speed accuracy tradeoff at work in the different conditions;\ndata not shown). There are three key results:\n\n1. The central finding about attentional load is that the distractor exerted a significant effect\nover target processing only in the low load case \u2013 that is, an incompatible distractor slowed\ndown the RTs compared with a neutral distractor for the low load case but not the high load\ncase.\n\n2. Since, in the degraded low-load case the RTs were slower but the influence of the distractor\nwas if anything greater, this could not just be a function of the processing time or difficulty.\nIndeed, Lavie and de Fockert (2003) noted the distinction made by Norman and Bobrow\n(1975) between data- and resource-limited processing, with excess resources (putatively\nample, given the low load) unable to make up for the poor quality sensory data, and so\npredicted this greater distractor impact.\n\n3. 
It is apparent that compatible distractors were of almost no help in any case, whereas in-\ncompatible distractors were harmful.\n\n3 The Bayesian model\n\nThe data in figure 1 pose the question for normative modeling as to why the distractor would corrupt\nprocessing of the target in the easy, low-load, case, but not the difficult, high-load case. No normative\naccount could simply assume that extra data \u2018leak\u2019 through in the low-load condition (which is the\nattentional load hypothesis) if the subjects have the ability to fashion attention far more finely in\nother cases, such as that of high load.\n\nWe argue that these results stem from the simple observation that the visual system has available\nreceptive fields with a range of sizes, including smaller, spatially precise ones, which can be nicely\nconfined to the target; and larger, spatially extended ones, which may include both target and dis-\ntractor. In this case, normative processing will combine information from all the receptive fields,\nwith Bayesian inference and marginalization exactly eliminating any substantial impact from those\nthat are useless or confusing. In the high load case, the proximal non-target stimuli have the effect of\nadding so much extra noise to the units with large receptive fields compared with their signal about\nthe target, that only the smallest receptive fields will be substantially useful. This implies that the\ndistractor will exert little influence. 
In the low load case, large receptive fields that also include the\ndistractor will be usefully informative about the target, and so the distractor will exert an influence.\nNote that this happens automatically through inference \u2013 indeed to make this point starkly, there is\nno explicit attentional control signal in our model whatsoever, only inference and marginalization.1\n\n1Note that Lavie and de Fockert (2003) chose the conditions in the experiment at random, so many forms\nof top-down selection would not be possible.\n\nload     neutral         incompatible    compatible\n         n   t   n   d   n   t   n   d   n   t   n   d\nlow      0  +c   0   0   0  +c   0  -1   0  +c   0  +1\nhigh    +1  +c  -1   0  +1  +c  -1  -1  +1  +c  -1  +1\n\nTable 1: Our version of the task. This table shows 6 out of the 18 conditions. Each display consists\nof four stimulus positions labelled n for the non-targets; t for the target (shown in the table, though\nnot the display, as being boxed); and d for the distractor, which is relatively far from the target.\nThe target takes the values \u00b1c, where c acts like a contrast; subjects have to report its sign. The\ndistractor can be 0 (neutral) or \u00b11; and is compatible if it has the same sign as the target (and\nconversely, incompatible). Load is increased by having non-zero non-targets which are spatially\nbalanced, with mean 0, so providing no net information about the sign of the target, but only noise.\nThe 18 conditions come from using c = \u00b11 and c = \u00b10.3, with the degraded condition (|c| = 0.3)\nonly being run for the case of low load, as in figure 1D.\n\nLavie and de Fockert (2003)\u2019s experiment is rather complicated. 
Table 1 shows our simplification\nof it, to a form which is slightly closer to a version of an Eriksen task (Eriksen and Eriksen, 1974)\nwith two optional flankers in known positions on either side of the target (the non-targets) and a\nfarther-flung distractor (the input layer of figure 2A cartoons the spatial arrangement). The target\ntakes the value \u00b1c; subjects have to report its sign. The distractor can be neutral (0) or have the same\nsign as (compatible) or a different sign from (incompatible) the target. In the low load condition,\nthe non-target units are 0; in the high load condition, one is +1 and the other is \u22121, making them balanced, but\nconfusing, because they lead to excess noise.\n\nThe generative model\n\nTable 1 indicates the values determining the various conditions from the perspective of the experi-\nmenter. We assume that the subject performs inference about the sign of the target based on noisy\nobservations created by a generative model. In the generative model, the values in table 1 amount to\nhidden structure, which, as in Yu et al. (2008), is mapped and mixed through various receptive fields\nto provide the noisy input to a Bayesian recognition model. The job of the recognition model is to\ncalculate the posterior probability of the various hidden settings given data, and, by marginalizing\n(summing) out all the hidden settings apart from the state of the target, report on its sign.\n\nFigure 2A shows the generative model, indicating the receptive fields (RFs) associated with this\nmixing. We consider 8 topographically-mapped units, 4 with small RFs covering only a single input\n(the generative weights are just the identity map); and 4 with large RFs (in which the inputs are\nmixed together more holistically). 
Since the distractor is relatively far from the target and non-target\nstimuli, the weights associated with its hidden values are lower for the three large RFs mapped to the\ntarget and non-target hidden units; the target and non-target hidden units have smaller weights to the\ngenerated input associated with the distractor. For simplicity, we treat the distractor as equidistant\nfrom the target and non-target input, partially modeling the fact that it can be in different locations.\nWe assume a crude form of signal-dependent noise; it is this that makes the non-target stimuli so\ndevastating.\n\nFigure 2B shows the means and standard deviations arising from the generative model for the 8\nunits (one per column) for the six conditions in table 1 (rows from top to bottom \u2013 low load: neutral,\nincompatible, compatible; then high load: neutral, incompatible, compatible). For this figure, c =\n+1. The means associated with the small and large RF target units show the lack of bias from\nthe non-targets in the high-load condition; and for the large RF case, the bias associated with the\ndistractor.\n\nThe standard deviations play the most critical role in the model, defining what it means for the non-\ntarget stimuli, when present, to make inference difficult. They therefore constitute a key modeling\nassumption. In the high load case, the units with the large RFs are assumed to have very high stan-\ndard deviations, coming from a crude form of signal-dependent noise. This captures the relative\nuselessness of these large RFs in the high load condition. 
However, and importantly, their mean\nvalues are unaffected by the non-target stimuli, since the non-targets are balanced between positive\nand negative values, preferring neither sign of target.\n\n[Figure 2 graphic not reproduced. Panel A: input units n, t, n, d and their weights to units 1 to 8 (small RFs, large RFs). Panel B: mean and std for units 1 to 8, by RF size, under low and high load.]\n\nFigure 2: The generative model. A) In the model, the four input units, representing non-targets,\nthe target and the distractor, are assumed to generate 8 input units which fall into two groups, with\nsmall and large receptive fields (RFs). The Hinton diagrams of the weights indicate how the RFs are\nrepresented (all weights are positive; the maximum value is 0.3). B) These plots show the means\nand standard deviations in the generative model associated with the 8 input units for the low and\nhigh load cases shown in table 1 (in raster scan order). The means for the large RFs (based on the\nweights in A) are unaffected by the load; the standard deviations for the units with large receptive\nfields are much higher in the high load condition. Standard deviations are affected by a coarse form\nof signal-dependent noise.\n\nIn all cases, a new sample from the generative model is provided at each time step; the noise cor-\nrupting each of the observed units is assumed to be Gaussian, and independent across units and over\ntime.\n\nThe recognition model\n\nWe build a recognition model based on this generative model. The recognition model is quite similar\nto a sequential probability ratio test (SPRT; Wald, 1947), except that, as in Yu and Dayan (2005); Yu\net al. 
(2008), it is necessary to perform inference over all the possible values of the hidden variables\n(all the possible values of the hidden structure2), and then to marginalize out all the variables apart from\nthe target itself. We accumulate evidence until a threshold of 0.9 is reached on the probability that\nthe target is either positive or negative (reporting whichever one is more likely). However, to take\naccount of the possibility of erroneous, early, responses, there is also a probability of 0.01 per step of\nstopping the accumulation and reporting whichever sign of target has a higher probability (guessing\nrandomly if this probability is 0.5). This factor played a critical role in Yu et al. (2008) in generating\nearly responses.\n\n2In fact, also including the possibility of a degraded high-load case.\n\nResults\n\nFigure 3 shows the results of inference based on the model. For each of the conditions, figure 3A\nshows the reaction times in the form of the mean number of steps to a choice. Here, as in the data\nin Lavie and de Fockert (2003), the RTs are averaged only over cases in which the model got the\nanswer correct. However, figure 3B shows the error rates in each condition; the\nerrors are relatively rare, and so the RT plots look identical. The datapoints are averages over more\nthan 35,000 samples (depending on the actual error rates) and so the errorbars are too small to see.\n\n[Figure 3 graphic not reproduced. Panel A: RT (steps). Panel B: error rate. Both panels show incompatible, neutral and compatible distractors under low load, high load and degraded low load.]\n\nFigure 3: Results. A) Mean RTs (steps of inference) for correct choices in each of the 9 cases (since\nthe target is equally often positive and negative, we averaged over these cases). Here, the threshold\non the (marginalized) probability was 0.9, and there was a probability of 0.01 per step that inference\nwould terminate early with whichever response was more probable. B) Error probabilities for the\nsame conditions showing the lack of a speed-accuracy trade-off. All points are averages over more\nthan 35,000 samples, and so errorbars would be too small to see.\n\nComparing figure 3A with the data in figure 1D, it is apparent that the main trends in the data\nare closely captured. This general pattern of results is robust to many different parameter values;\nthough it is possible (by reducing c) to make inference take very much longer still in the degraded\nlow load condition whilst maintaining and boosting the effect of high load. The error probabilities\nin figure 3B indicate that the pattern of RTs is not accounted for by a tradeoff between speed and\naccuracy.\n\nThe three characteristics of these data described above are explained in the model as:\n\n1. In the low load case, the lack of non-targets means that the inputs based on the large RFs\nare usefully informative about the target, and therefore automatically play a key role in\nposterior inference. Since these inputs are also influenced by the distractor, there is an RT\ncost in the face of incompatibility. However, in the high load case, the non-target stimuli\nare closer to the target and exert substantial influence over the noise corrupting the large RF\nunits associated with it (and no net signal). This makes these large RF units relatively poor\nsources of information about the target. Thus the smaller RF units are relied upon instead,\nwhich are not affected by the distractor.\n\n2. 
Rather as suggested in Norman and Bobrow (1975); Lavie and de Fockert (2003): in the\ndata-poor case of the degraded input, it is particularly important to take advantage of in-\nformation from the large RFs, to make inferences about the target; therefore the distractor\nexerts a large in\ufb02uence over target processing.\n\n3. The compatible distractor is helpful to a lesser extent than the incompatible one is harmful,\nfor a couple of reasons. First, there is a ceiling effect for the former coming from the\nnon-linearity of an effective sigmoid function that arises in turning log likelihood ratios\ninto probabilities. Second, compared with a neutral distractor, the compatible distractor\nincreases the (signal-dependent) noise associated with the units with large RFs, reducing\ntheir informativeness about the target.\n\n4 Discussion\n\nIn this paper, we have shown how to account for key results used to argue for an attentional load hy-\npothesis. Our model involves simple Bayesian inference based on a generative process recognizing\nthe existence of small and large receptive \ufb01elds. The attentional load hypothesis suggests that when\nlittle attention is required to solve the set task, inputs associated with distractor stimuli leak through\nwith little attenuation, and so cause disruption; when the task is dif\ufb01cult, attention is totally occu-\npied with the set task, leaving nothing left over. By contrast, we have suggested that an inferential\nmodel taking advantage of all the information in the input will show exactly the same characteristic,\nwith the key issue being whether the units with large RFs, which include the distractor, are rendered\nuseless by the non-target stimuli that make for the high load in the \ufb01rst place. 
The advantage of this\nversion of an attenuation theory (Treisman, 1960, 1969) is that it obviates the requirement to appeal\nto an inexplicable inefficiency, over and above the existence of units with large RFs, and indeed\nrelates this set of selective attentional tasks to the wide range of other accounts of probabilistically-\ncorrect sensory inference.\n\nOne key characteristic of this model (shared with, among others, Yu et al., 2008) is that the form of\nselection it considers is an output of inference rather than an input into it. That is, the model does\nnot employ an explicit attentional mechanism in inference which has the capacity to downplay some\ninput units over others. The model does know the location of the target, and focuses all its resources\non it; but there is no further way of boosting or suppressing some RFs compared with others. Most\nof the substantial results on the neuroscience of selective attention (eg Moran and Desimone, 1985;\nDesimone and Duncan, 1995; Reynolds and Chelazzi, 2004) study the focusing process, rather than\nthe post-focus information integration that we have looked at; the forms of attention at play in the\nload-related tasks we have discussed are somewhat orthogonal. It would be interesting to design\nneurophysiological experiments to probe the form of online selection at work in the attentional load\ntasks.\n\nThe difference between the present model and the spatial version of Yu et al. (2008) is that the model\nhere includes RFs of different sizes, whereas in that model, the distractors were always close to the\ntarget. Further, the two neutral conditions here (no distractor, and low load) were not modeled in the\nearlier study. Yu et al. 
(2008) suggested that the anterior cingulate might monitor con\ufb02ict between\nthe cases of compatible and incompatible distractors as part of an approximate inference strategy.\nThat seems most unlikely here, since the con\ufb02ict would have to be between the multidimensional\ncollection of hidden nuisance variables (notably the cross product between the states of the non-\ntargets and the state of the distractor), which seems implausibly complicated.\n\nThe assumptions of large RFs and their high standard deviations in the high load condition are cer-\ntainly rather simplistic. However, (a) RFs in inferotemporal cortex are indeed very large, allowing\nfor the possibility of distractor interference in the low load condition; and (b) even under the atten-\ntional load hypothesis, the only reason that an unattenuated distractor stimulus would interfere with\ntarget processing is that there is something in common about them, since it is known that there is\nmore to the effects of distractors than just competition at the stage of the actual responses (Driver,\n2001). Further, the assumption that the inputs with large RFs have high standard deviations in the\nhigh load condition is a most straightforward way to capture the essential effect of the non-target\nstimuli in disrupting target processing in a way that forces a more stringent attentional effect associ-\nated with the use of the small RFs.\n\nThe attentional load theory has been applied to many tasks (including the regular Eriksen task,\nEriksen and Eriksen, 1974) as well as the one here. 
However, it would be good to extend the current\nmodel to match the experimental circumstances in Lavie and de Fockert (2003) more faithfully.\nPerhaps the most significant lacuna is that, as in the Eriksen task, we assumed that the subjects knew\nthe location of the target in the stimulus array, whereas in the real experiment, this had to be inferred\nfrom the letters in the circle of targets close to fixation (figure 1A). Modeling this would effectively\nrequire a more complex collection of letter-based RFs, together with a confusion matrix associated\nwith the perceptual similarities of letters. This induces a search problem, more like the one studied\nby Mozer and Baldwin (2008), except, again, multiple sizes of RFs would play a critical role. It\nwould also be worth extending the current model to the much wider range of other tasks used to\nexplore the effects of attentional load (such as Forster and Lavie, 2008).\n\nIn conclusion, we have suggested a particular rationale for an attenuation theory of attention, which\nputs together the three tasks suggested at the outset for dichotic listening. Inputs should automati-\ncally be attenuated to the extent that they do not bear on (or, worse, are confusing with respect to)\na task. The key resource limitation is the restricted number, and therefore the necessarily broad\ntuning, of RFs; the normative response to this makes attenuation and combination kissing cousins.\n\nAcknowledgements\n\nI am most grateful to Louise Whiteley for helpful comments and to her and Nillie Lavie for discus-\nsions. Funding came from the Gatsby Charitable Foundation.\n\nReferences\n\nBroadbent, D. (1958). Perception and communication. OUP, Oxford, England.\n\nCherry, E. (1953). Some experiments on the recognition of speech with one and with two ears.\nJournal of the Acoustical Society of America, 25:975\u2013979.\n\nDayan, P. and Zemel, R. (1999). Statistical models and sensory attention. 
In ICANN 1999, volume 2. IEE.\n\nDesimone, R. and Duncan, J. (1995). Neural mechanisms of selective visual attention. Annu Rev\nNeurosci, 18:193\u2013222.\n\nDeutsch, J. A. and Deutsch, D. (1963). Attention: Some theoretical considerations. Psychol Rev,\n70:80\u201390.\n\nDriver, J. (2001). A selective review of selective attention research from the past century. Br J\nPsychol, 92 Part 1:53\u201378.\n\nDuncan, J. (1980). The locus of interference in the perception of simultaneous stimuli. Psychol Rev,\n87(3):272\u2013300.\n\nEriksen, B. and Eriksen, C. (1974). Effects of noise-letters on identification of a target letter in a\nnonsearch task. Perception & Psychophysics, 16:143\u2013149.\n\nEriksen, C. W. and St. James, J. D. (1986). Visual attention within and around the field of focal\nattention: a zoom lens model. Percept Psychophys, 40(4):225\u2013240.\n\nForster, S. and Lavie, N. (2008). Failures to ignore entirely irrelevant distractors: the role of load. J\nExp Psychol Appl, 14(1):73\u201383.\n\nItti, L. and Koch, C. (2001). Computational modelling of visual attention. Nat Rev Neurosci,\n2(3):194\u2013203.\n\nJames, W. (1890/1950). The Principles of Psychology. Dover, New York, NY.\n\nLavie, N. (2005). Distracted and confused?: selective attention under load. Trends Cogn Sci,\n9(2):75\u201382.\n\nLavie, N. and de Fockert, J. W. (2003). Contrasting effects of sensory limits and capacity limits in\nvisual selective attention. Percept Psychophys, 65(2):202\u2013212.\n\nLavie, N. and Tsal, Y. (1994). Perceptual load as a major determinant of the locus of selection in\nvisual attention. Percept Psychophys, 56(2):183\u2013197.\n\nMackay, D. (1973). Aspects of the theory of comprehension, memory and attention. Quarterly\nJournal of Experimental Psychology, 25:22\u201340.\n\nMoran, J. and Desimone, R. (1985). Selective attention gates visual processing in the extrastriate\ncortex. Science, 229(4715):782\u2013784.\n\nMozer, M. 
and Baldwin, D. (2008). Experience-guided search: A theory of attentional control. In\nPlatt, J., Koller, D., Singer, Y., and Roweis, S., editors, Advances in Neural Information Process-\ning Systems 20, pages 1033\u20131040. MIT Press, Cambridge, MA.\n\nNavalpakkam, V. and Itti, L. (2006). Optimal cue selection strategy. In Weiss, Y., Sch\u00f6lkopf, B., and\nPlatt, J., editors, Advances in Neural Information Processing Systems 18, pages 987\u2013994. MIT\nPress, Cambridge, MA.\n\nNorman, D. and Bobrow, D. (1975). On Data-limited and Resource-limited Processes. Cognitive\nPsychology, 7(1):44\u201364.\n\nPalmer, J. (1994). Set-size effects in visual search: the effect of attention is independent of the\nstimulus for simple tasks. Vision Res, 34(13):1703\u20131721.\n\nPaschler, H. (1998). The Psychology of Attention. MIT Press, Cambridge, MA.\n\nRao, R. P. N., Olshausen, B. A., and Lewicki, M. S., editors (2002). Probabilistic Models of the\nBrain. MIT Press, Cambridge, MA.\n\nReynolds, J. H. and Chelazzi, L. (2004). Attentional modulation of visual processing. Annu Rev\nNeurosci, 27:611\u2013647.\n\nShaw, M. (1982). Attending to multiple sources of information. Cognitive Psychology, 14:353\u2013409.\n\nTreisman, A. M. (1960). Contextual cues in selective listening. Quarterly Journal of Experimental\nPsychology, 12:242\u2013248.\n\nTreisman, A. M. (1969). Strategies and models of selective attention. Psychol Rev, 76(3):282\u2013299.\n\nWald, A. (1947). Sequential Analysis. Wiley, New York.\n\nWhiteley, L. and Sahani, M. (2008). Attention resolves the effects of a computational bottleneck:\nmodelling binding, precueing, and task-driven bias. In COSYNE 2008, pages I\u201398.\n\nYu, A., Dayan, P., and Cohen, J. (2008). Bayesian account of attentional control. Journal of Exper-\nimental Psychology: Human Percept Psychophys, in press.\n\nYu, A. J. and Dayan, P. (2005). Inference, attention, and decision in a bayesian neural architecture.\nIn Saul, L. 
K., Weiss, Y., and Bottou, L., editors, Advances in Neural Information Processing\nSystems 17, pages 1577\u20131584. MIT Press, Cambridge, MA.\n\nZhaoping, L. (2006). Theoretical understanding of the early visual processes by data compression\nand data selection. Network, 17(4):301\u2013334.", "award": [], "sourceid": 38, "authors": [{"given_name": "Peter", "family_name": "Dayan", "institution": null}]}