Congruence between model and human attention reveals unique signatures of critical visual events

Part of Advances in Neural Information Processing Systems 20 (NIPS 2007)


Authors

Robert Peters, Laurent Itti

Abstract

Current computational models of bottom-up and top-down components of attention are predictive of eye movements across a range of stimuli and of simple, fixed visual tasks (such as visual search for a target among distractors). However, to date there exists no computational framework which can reliably mimic human gaze behavior in more complex environments and tasks, such as driving a vehicle through traffic. Here, we develop a hybrid computational/behavioral framework, combining simple models for bottom-up salience and top-down relevance, and looking for changes in the predictive power of these components at different critical event times during 4.7 hours (500,000 video frames) of observers playing car racing and flight combat video games. This approach is motivated by our observation that the predictive strengths of the salience and relevance models exhibit reliable temporal signatures during critical event windows in the task sequence: for example, when the game player directly engages an enemy plane in a flight combat game, the predictive strength of the salience model increases significantly, while that of the relevance model decreases significantly. Our new framework combines these temporal signatures to implement several event detectors. Critically, we find that an event detector based on fused behavioral and stimulus information (in the form of the model's predictive strength) is much stronger than detectors based on behavioral information alone (eye position) or image information alone (model prediction maps). This approach to event detection, based on eye tracking combined with computational models applied to the visual input, may have useful applications as a less-invasive alternative to other event detection approaches based on neural signatures derived from EEG or fMRI recordings.
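The paper does not include code, but the pipeline the abstract describes can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes we have a per-frame salience map and relevance map plus the observer's gaze coordinates, scores each model's "predictive strength" with an NSS-style measure (the z-scored map value at the gaze point, one common way to score gaze prediction), and implements one hypothetical detector that thresholds the sliding-window difference between the two strength signals, mirroring the reported signature in which salience strength rises while relevance strength falls around engagement events. All function names here are illustrative.

```python
import numpy as np

def predictive_strength(model_map, gaze_xy):
    """NSS-style score: the z-scored value of the model's prediction map
    at the observer's gaze position. Higher means the map better predicts
    where the observer actually looked on this frame. (Assumed scoring
    choice; the paper may use a different measure.)"""
    z = (model_map - model_map.mean()) / (model_map.std() + 1e-8)
    x, y = gaze_xy
    return z[y, x]

def event_signature(salience_scores, relevance_scores, window=30):
    """Sliding-window temporal signature: smoothed salience-model strength
    minus smoothed relevance-model strength, per frame. Around the events
    described in the abstract this difference should spike, since salience
    strength increases while relevance strength decreases."""
    kernel = np.ones(window) / window
    s = np.convolve(salience_scores, kernel, mode="same")
    r = np.convolve(relevance_scores, kernel, mode="same")
    return s - r

def detect_events(signature, threshold=1.0):
    """Hypothetical detector: flag frame indices where the fused
    behavioral/stimulus signature exceeds a fixed threshold."""
    return np.flatnonzero(signature > threshold)
```

In this sketch, the fusion of behavioral and stimulus information happens in `predictive_strength`, which combines eye position (behavior) with the model's prediction map (stimulus); detectors using eye position alone or the maps alone would discard one of those two inputs, which is the comparison the abstract reports the fused detector winning.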