Estimating motion in scenes containing multiple moving objects remains a difficult problem in computer vision. A promising approach to this problem involves using mixture models, where the motion of each object is a component in the mixture. However, existing methods typically require specifying in advance the number of components in the mixture, i.e. the number of objects in the scene.
Here we show that the number of objects can be estimated automatically in a maximum likelihood framework, given an assumption about the level of noise in the video sequence. We derive analytical results showing the number of models that maximizes the likelihood for a given noise level in a given sequence. We illustrate these results on a real video sequence, showing how the phase transitions correspond to different perceptual organizations of the scene.
Figure 1a depicts a scene where motion estimation is difficult for many computer vision systems. A semi-transparent surface partially occludes a second surface, and the camera is translating horizontally. Figure 1b shows a slice through the horizontal component of the motion generated by the camera: points that are closer to the camera move faster than those farther away. In practice, the local motion information would be noisy, as shown in Figure 1c, and this imposes conflicting demands on a motion analysis system: reliable estimates require pooling together many measurements, while avoiding mixing together measurements derived from the two different surfaces.
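The conflict between pooling and mixing can be made concrete with a small synthetic example. The sketch below is an illustration under stated assumptions, not the method developed in this paper: the surface velocities (1.0 and 2.0), the true noise level (0.2), the equal mixing weights, and the helper name `fit_velocity_mixture` are all hypothetical choices. It fits a mixture of constant horizontal velocities by EM, with the noise level fixed in advance rather than estimated.

```python
import numpy as np


def fit_velocity_mixture(v, k, sigma, n_iter=200):
    """EM for k constant-velocity components with equal mixing weights
    and a fixed, assumed noise level sigma (hypothetical helper)."""
    # Deterministic initialization: spread the velocities over the data range.
    mu = np.linspace(v.min(), v.max(), k)
    for _ in range(n_iter):
        # E-step: soft assignment of each measurement to each component.
        d2 = (v[:, None] - mu[None, :]) ** 2
        log_r = -d2 / (2.0 * sigma ** 2)
        log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: each velocity is the responsibility-weighted mean.
        mu = (r * v[:, None]).sum(axis=0) / r.sum(axis=0)
    return mu


# Synthetic slice: measurements come from two interleaved surfaces
# (assumed velocities: far surface 1.0, near surface 2.0) plus noise.
rng = np.random.default_rng(0)
n = 400
true_v = np.where(rng.random(n) < 0.5, 1.0, 2.0)
v = true_v + rng.normal(0.0, 0.2, n)

# Pooling everything mixes the two surfaces into one estimate.
pooled = v.mean()

# Two-component fits under a small and a large assumed noise level.
mu_small = fit_velocity_mixture(v, k=2, sigma=0.2)
mu_large = fit_velocity_mixture(v, k=2, sigma=2.0)

print("pooled estimate:          ", pooled)
print("assumed sigma = 0.2 gives:", mu_small)
print("assumed sigma = 2.0 gives:", mu_large)
```

In this toy setting the pooled estimate falls between the two surface velocities and matches neither. With a small assumed noise level the two components settle near the two surfaces, while with a large assumed noise level both components converge to the same value, so the fit is effectively a single motion. This is only meant to suggest why an assumption about the noise level can determine how many distinct motions the data support.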
Phase Transitions and the Perceptual Organization of Video Sequences