Given a set of objects in the visual field, how does the visual system learn to attend to a particular object of interest while ignoring the rest? How are occlusions and background clutter so effortlessly discounted when recognizing a familiar object? In this paper, we attempt to answer these questions in the context of a Kalman filter-based model of visual recognition that has previously proved useful in explaining certain neurophysiological phenomena such as endstopping and related extra-classical receptive field effects in the visual cortex. By using results from the field of robust statistics, we describe an extension of the Kalman filter model that can handle multiple objects in the visual field. The resulting robust Kalman filter model demonstrates how certain forms of attention can be viewed as an emergent property of the interaction between top-down expectations and bottom-up signals. The model also suggests functional interpretations of certain attention-related effects that have been observed in visual cortical neurons. Experimental results are provided to help demonstrate the ability of the model to perform robust segmentation and recognition of objects and image sequences in the presence of varying degrees of occlusion and clutter.