Unsupervised Feature Selection for the $k$-means Clustering Problem

Boutsidis, Christos; Drineas, Petros; Mahoney, Michael W.

Part of Advances in Neural Information Processing Systems 22 (NIPS 2009)

Authors

Christos Boutsidis, Petros Drineas, Michael W. Mahoney

Abstract

We present a novel feature selection algorithm for the $k$-means clustering problem. Our algorithm is randomized and, assuming an accuracy parameter $\epsilon \in (0,1)$, selects and appropriately rescales in an unsupervised manner $\Theta(k \log(k / \epsilon) / \epsilon^2)$ features from a dataset of arbitrary dimensions. We prove that, if we run any $\gamma$-approximate $k$-means algorithm ($\gamma \geq 1$) on the features selected using our method, we can find a $(1+(1+\epsilon)\gamma)$-approximate partition with high probability.
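
The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of one way a randomized "select and rescale" step of this kind can look. It assumes leverage-score style sampling from the top-$k$ right singular vectors of the data matrix, which may differ from the paper's procedure in its details. The function name `select_features`, its parameters, and the choice to pass the number of sampled features `r` directly (rather than deriving it from $\epsilon$, since the constant hidden in the $\Theta(\cdot)$ bound is not given here) are all illustrative assumptions.

```python
import numpy as np

def select_features(A, k, r, seed=0):
    """Sample r columns (features) of the n x d matrix A and rescale them.

    Hypothetical helper: assumes leverage-score style sampling, which the
    abstract itself does not confirm. Requires k <= min(n, d).
    """
    rng = np.random.default_rng(seed)
    # Rows of Vt are right singular vectors, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k].T  # shape (d, k): one row per feature
    # Sampling probability of a feature: squared norm of its row in Vk,
    # normalised so that the probabilities sum to 1.
    p = np.sum(Vk ** 2, axis=1)
    p = p / p.sum()
    # Draw r feature indices i.i.d. with replacement.
    idx = rng.choice(A.shape[1], size=r, replace=True, p=p)
    # Rescale each sampled column by 1 / sqrt(r * p_i).
    scale = 1.0 / np.sqrt(r * p[idx])
    return A[:, idx] * scale, idx
```

Under these assumptions, the rows of the returned matrix would then be passed to any $\gamma$-approximate $k$-means routine in place of the rows of the original data.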
