Part of Advances in Neural Information Processing Systems 20 (NIPS 2007)
Ping Li, Trevor Hastie
Many tasks (e.g., clustering) in machine learning only require the lα distances in- stead of the original data. For dimension reductions in the lα norm (0 < α ≤ 2), the method of stable random projections can efficiently compute the lα distances in massive datasets (e.g., the Web or massive data streams) in one pass of the data. The estimation task for stable random projections has been an interesting topic. We propose a simple estimator based on the fractional power of the samples (pro- jected data), which is surprisingly near-optimal in terms of the asymptotic vari- ance. In fact, it achieves the Cram´er-Rao bound when α = 2 and α = 0+. This new result will be useful when applying stable random projections to distance- based clustering, classifications, kernels, massive data streams etc.