FastEx: Hash Clustering with Exponential Families

Part of Advances in Neural Information Processing Systems 25 (NIPS 2012)

Bibtex Metadata Paper

Authors

Amr Ahmed, Sujith Ravi, Alex Smola, Shravan Narayanamurthy

Abstract

Clustering is a key component in data analysis toolbox. Despite its importance, scalable algorithms often eschew rich statistical models in favor of simpler descriptions such as $k$-means clustering. In this paper we present a sampler, capable of estimating mixtures of exponential families. At its heart lies a novel proposal distribution using random projections to achieve high throughput in generating proposals, which is crucial for clustering models with large numbers of clusters.