Milind Naphade, Igor Kozintsev, Thomas S. Huang
We propose a novel probabilistic framework for semantic video indexing. We define probabilistic multimedia objects (multijects) to map low-level media features to high-level semantic labels. A graphical network of such multijects (multinet) captures scene context by discovering intra-frame as well as inter-frame dependency relations between the concepts. The main contribution is a novel application of a factor graph framework to model this network. We model relations between semantic concepts in terms of their co-occurrence as well as the temporal dependencies between these concepts within video shots. Using the sum-product algorithm for approximate or exact inference in these factor graph multinets, we attempt to correct errors made during isolated concept detection by enforcing high-level constraints. This results in a significant improvement in the overall detection performance.
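As a rough illustration of how sum-product inference can revise isolated detections, the sketch below builds the smallest possible factor graph: two binary concept variables joined by one co-occurrence factor. It is a minimal sketch under stated assumptions, not the authors' implementation. The concept names (`outdoor`, `sky`), the function names, and every number are hypothetical and chosen only for illustration. Because this graph is a tree, the sum-product marginals are exact, which the brute-force enumeration can confirm.

```python
from itertools import product


def normalize(values):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def sum_product_pair(unary_a, unary_b, pairwise):
    """Sum-product marginals for two binary variables A and B.

    unary_a[a], unary_b[b]: local evidence from isolated detectors.
    pairwise[a][b]: co-occurrence factor linking A and B.
    """
    # Message from the pairwise factor to A: sum out B.
    msg_to_a = [sum(pairwise[a][b] * unary_b[b] for b in (0, 1)) for a in (0, 1)]
    # Message from the pairwise factor to B: sum out A.
    msg_to_b = [sum(pairwise[a][b] * unary_a[a] for a in (0, 1)) for b in (0, 1)]
    # Belief at each variable: local evidence times incoming message.
    marg_a = normalize([unary_a[a] * msg_to_a[a] for a in (0, 1)])
    marg_b = normalize([unary_b[b] * msg_to_b[b] for b in (0, 1)])
    return marg_a, marg_b


def brute_force_pair(unary_a, unary_b, pairwise):
    """Reference marginals obtained by enumerating the full joint."""
    joint = {
        (a, b): unary_a[a] * unary_b[b] * pairwise[a][b]
        for a, b in product((0, 1), repeat=2)
    }
    z = sum(joint.values())
    marg_a = [sum(p for (a, _), p in joint.items() if a == v) / z for v in (0, 1)]
    marg_b = [sum(p for (_, b), p in joint.items() if b == v) / z for v in (0, 1)]
    return marg_a, marg_b


# Hypothetical isolated detector outputs, indexed [absent, present].
unary_outdoor = [0.2, 0.8]  # detector is fairly sure "outdoor" is present
unary_sky = [0.6, 0.4]      # detector weakly believes "sky" is absent

# Hypothetical co-occurrence factor, indexed [outdoor][sky].
cooccurrence = [
    [0.9, 0.1],  # outdoor absent: sky is unlikely
    [0.3, 0.7],  # outdoor present: sky is likely
]

marg_outdoor, marg_sky = sum_product_pair(unary_outdoor, unary_sky, cooccurrence)
print("P(sky present), isolated detector:", unary_sky[1])
print("P(sky present), after sum-product:", round(marg_sky[1], 3))
```

In this toy setting the strong `outdoor` evidence raises the belief in `sky` above what the isolated detector reported, which is the kind of correction through high-level constraints that the abstract describes. Modelling temporal dependencies within a shot would add further factors between the same concept in neighbouring frames; on graphs with loops the same message updates are iterated and the result is approximate.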