Data Clustering by Markovian Relaxation and the Information Bottleneck Method

Part of Advances in Neural Information Processing Systems 13 (NIPS 2000)

Bibtex Metadata Paper


Naftali Tishby, Noam Slonim


We introduce a new, non-parametric and principled, distance based clustering method. This method combines a pairwise based ap(cid:173) proach with a vector-quantization method which provide a mean(cid:173) ingful interpretation to the resulting clusters. The idea is based on turning the distance matrix into a Markov process and then examine the decay of mutual-information during the relaxation of this process. The clusters emerge as quasi-stable structures dur(cid:173) ing this relaxation, and then are extracted using the information bottleneck method. These clusters capture the information about the initial point of the relaxation in the most effective way. The method can cluster data with no geometric or other bias and makes no assumption about the underlying distribution.