Automatic Choice of Dimensionality for PCA

Part of Advances in Neural Information Processing Systems 13 (NIPS 2000)

Bibtex Metadata Paper


Thomas Minka


A central issue in principal component analysis (PCA) is choosing the number of principal components to be retained. By interpreting PCA as density estimation, we show how to use Bayesian model selection to es(cid:173) timate the true dimensionality of the data. The resulting estimate is sim(cid:173) ple to compute yet guaranteed to pick the correct dimensionality, given enough data. The estimate involves an integral over the Steifel manifold of k-frames, which is difficult to compute exactly. But after choosing an appropriate parameterization and applying Laplace's method, an accu(cid:173) rate and practical estimator is obtained. In simulations, it is convincingly better than cross-validation and other proposed algorithms, plus it runs much faster.