Fusion of Similarity Data in Clustering

Part of Advances in Neural Information Processing Systems 18 (NIPS 2005)



Tilman Lange, Joachim Buhmann


Fusing multiple information sources can yield significant benefits to successfully accomplish learning tasks. Many studies have focussed on fusing information in supervised learning contexts. We present an approach to utilize multiple information sources in the form of similarity data for unsupervised learning. Based on similarity information, the clustering task is phrased as a non-negative matrix factorization problem of a mixture of similarity measurements. The tradeoff between the informativeness of data sources and the sparseness of their mixture is controlled by an entropy-based weighting mechanism. For the purpose of model selection, a stability-based approach is employed to ensure the selection of the most self-consistent hypothesis. The experiments demonstrate the performance of the method on toy as well as real world data sets.
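
The core idea of factorizing a weighted mixture of similarity matrices can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not give the objective or the update rules, so the code below assumes hand-fixed mixture weights (the paper learns them), standard Lee-Seung multiplicative updates for a squared-error NMF, and a plain Shannon entropy of the weights as a stand-in for the entropy-based control. All function names (`mix_similarities`, `weight_entropy`, `nmf`) and the toy data are hypothetical.

```python
import numpy as np

def mix_similarities(sims, alpha):
    """Convex combination of similarity matrices (weights renormalized to sum to 1)."""
    alpha = np.asarray(alpha, dtype=float)
    alpha = alpha / alpha.sum()
    return sum(a * s for a, s in zip(alpha, sims))

def weight_entropy(alpha, eps=1e-12):
    """Shannon entropy of the mixture weights: 0 for a one-hot (sparse) mixture,
    log(L) for a uniform mixture over L sources."""
    alpha = np.asarray(alpha, dtype=float)
    alpha = alpha / alpha.sum()
    return float(-np.sum(alpha * np.log(alpha + eps)))

def nmf(S, k, n_iter=1000, seed=0, eps=1e-9):
    """Squared-error NMF, S ~ W @ H, via Lee-Seung multiplicative updates.
    Assumption: the paper may use a different cost function."""
    rng = np.random.default_rng(seed)
    n, m = S.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ S) / (W.T @ W @ H + eps)
        W *= (S @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data (hypothetical): one informative source with two blocks, one noisy source.
truth = np.array([0, 0, 0, 1, 1, 1])
block = (truth[:, None] == truth[None, :]).astype(float) + 0.05
rng = np.random.default_rng(1)
noise = rng.random((6, 6))
noise = (noise + noise.T) / 2  # keep the similarity matrix symmetric

alpha = [0.9, 0.1]  # fixed by hand here; learned in the paper
S = mix_similarities([block, noise], alpha)
W, H = nmf(S, k=2)

# Assign each object to the component that contributes most to its row of S.
# Weighting by the row sums of H makes the assignment invariant to the
# scaling ambiguity between W and H.
contrib = W * H.sum(axis=1)
labels = contrib.argmax(axis=1)
rel_err = np.linalg.norm(S - W @ H) / np.linalg.norm(S)
```

In this sketch a low `weight_entropy(alpha)` corresponds to a sparse mixture dominated by few sources, while a high value spreads weight evenly. A learned weighting would have to balance that against how well the factorization explains each source.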