Which processes underlie our ability to quickly recognize familiar objects within a complex visual input scene? In this paper an implemented neural network model is described that attempts to specify how selective visual attention, perceptual organization, and invariance transformations might work together in order to segment, select, and recognize objects out of complex input scenes containing multiple, possibly overlapping objects. Retinotopically organized feature maps serve as input for two main processing routes: the 'where-pathway' dealing with location information and the 'what-pathway' computing the shape and attributes of objects. A location-based attention mechanism operates at an early stage of visual processing, selecting a contiguous region of the visual field for preferential processing. Additionally, location-based attention plays an important role in invariant object recognition by controlling appropriate normalization processes within the what-pathway. Object recognition is supported through the segmentation of the visual field into distinct entities. In order to represent different segmented entities at the same time, the model uses an oscillatory binding mechanism. Connections between the where-pathway and the what-pathway lead to a flexible cooperation between different functional subsystems, producing an overall behavior which is consistent with a variety of psychophysical data.