A More Powerful Two-Sample Test in High Dimensions using Random Projection

Part of Advances in Neural Information Processing Systems 24 (NIPS 2011)

Bibtex Metadata Paper

Authors

Miles Lopes, Laurent Jacob, Martin J. Wainwright

Abstract

We consider the hypothesis testing problem of detecting a shift between the means of two multivariate normal distributions in the high-dimensional setting, allowing for the data dimension p to exceed the sample size n. Our contribution is a new test statistic for the two-sample test of means that integrates a random projection with the classical Hotelling T squared statistic. Working within a high- dimensional framework that allows (p,n) to tend to infinity, we first derive an asymptotic power function for our test, and then provide sufficient conditions for it to achieve greater power than other state-of-the-art tests. Using ROC curves generated from simulated data, we demonstrate superior performance against competing tests in the parameter regimes anticipated by our theoretical results. Lastly, we illustrate an advantage of our procedure with comparisons on a high-dimensional gene expression dataset involving the discrimination of different types of cancer.