Part of Advances in Neural Information Processing Systems 10 (NIPS 1997)
Thomas Hofmann, Joachim Buhmann
Active data clustering is a novel technique for clustering of proximity data which utilizes principles from sequential experiment design in order to interleave data generation and data analysis. The proposed active data sampling strategy is based on the expected value of information, a concept rooted in statistical decision theory. This is considered to be an important step towards the analysis of large-scale data sets, because it offers a way to overcome the inherent data sparseness of proximity data. We present applications to unsupervised texture segmentation in computer vision and information retrieval in document databases.