Model Selection in Clustering by Uniform Convergence Bounds

Buhmann, Joachim; Held, Marcus

Model Selection in Clustering by Uniform Convergence Bounds

Joachim M. Buhmann, Marcus Held

Advances in Neural Information Processing Systems 12 (NIPS 1999)

Abstract

Unsupervised learning algorithms are designed to extract struc(cid:173) ture from data samples. Reliable and robust inference requires a guarantee that extracted structures are typical for the data source, Le., similar structures have to be inferred from a second sample set of the same data source. The overfitting phenomenon in max(cid:173) imum entropy based annealing algorithms is exemplarily studied for a class of histogram clustering models. Bernstein's inequality for large deviations is used to determine the maximally achievable approximation quality parameterized by a minimal temperature. Monte Carlo simulations support the proposed model selection cri(cid:173) terion by finite temperature annealing.

Abstract

Name Change Policy