Model Selection in Clustering by Uniform Convergence Bounds

Part of Advances in Neural Information Processing Systems 12 (NIPS 1999)

Bibtex Metadata Paper


Joachim Buhmann, Marcus Held


Unsupervised learning algorithms are designed to extract struc(cid:173) ture from data samples. Reliable and robust inference requires a guarantee that extracted structures are typical for the data source, Le., similar structures have to be inferred from a second sample set of the same data source. The overfitting phenomenon in max(cid:173) imum entropy based annealing algorithms is exemplarily studied for a class of histogram clustering models. Bernstein's inequality for large deviations is used to determine the maximally achievable approximation quality parameterized by a minimal temperature. Monte Carlo simulations support the proposed model selection cri(cid:173) terion by finite temperature annealing.