Part of Advances in Neural Information Processing Systems 26 (NIPS 2013)
Mohsen Bayati, Murat A. Erdogdu, Andrea Montanari
We study the fundamental problems of variance and risk estimation in high-dimensional statistical modeling. In particular, we consider the problem of learning a coefficient vector $\theta_0\in\mathbb{R}^p$ from noisy linear observations $y=X\theta_0+w\in\mathbb{R}^n$, and the popular estimation procedure of solving the $\ell_1$-penalized least squares objective known as the LASSO or Basis Pursuit DeNoising (BPDN). In this context, we develop new estimators for the $\ell_2$ estimation risk $\|\hat{\theta}-\theta_0\|_2$ and for the variance of the noise. Both can be used to select the regularization parameter optimally. Our approach combines Stein's unbiased risk estimate (Stein, 1981) with recent results of Bayati and Montanari (2011, 2012) on the analysis of approximate message passing and the risk of the LASSO. We establish the high-dimensional consistency of our estimators for sequences of matrices $X$ of increasing dimensions with independent Gaussian entries, and we extend this result to a broader class of Gaussian designs conditional on a certain conjecture from statistical physics. To the best of our knowledge, our approach provides the first asymptotically consistent risk estimator. In addition, we demonstrate through simulations that our variance estimator outperforms several existing methods in the literature.
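For concreteness, the LASSO/BPDN estimator referred to above is the minimizer of an $\ell_1$-penalized least squares objective; in one standard normalization (the paper's exact scaling may differ) it reads

$$\hat{\theta}(\lambda) \;=\; \arg\min_{\theta\in\mathbb{R}^p}\; \frac{1}{2}\,\|y - X\theta\|_2^2 \;+\; \lambda\,\|\theta\|_1.$$

The following is a minimal sketch of the simulation setting described in the abstract, assuming numpy and scikit-learn. The dimensions, sparsity level, and regularization value are illustrative choices, and the oracle risk computed here uses knowledge of $\theta_0$ (available only in simulation); it is not the paper's estimator.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    n, p, k, sigma = 500, 1000, 50, 0.5           # illustrative dimensions (assumed)

    X = rng.standard_normal((n, p)) / np.sqrt(n)  # design with i.i.d. Gaussian entries
    theta0 = np.zeros(p)
    theta0[:k] = rng.standard_normal(k)           # k-sparse coefficient vector
    y = X @ theta0 + sigma * rng.standard_normal(n)

    # sklearn's Lasso minimizes (1/(2n))||y - X theta||_2^2 + alpha*||theta||_1,
    # so alpha = lam/n matches the unnormalized objective displayed above.
    lam = 0.1
    theta_hat = Lasso(alpha=lam / n, fit_intercept=False, max_iter=10000).fit(X, y).coef_

    # A naive residual-based variance estimate; in the high-dimensional regime
    # this is biased, which motivates the corrected estimators in the paper.
    sigma2_naive = np.sum((y - X @ theta_hat) ** 2) / n
    print("l2 risk (oracle):", np.linalg.norm(theta_hat - theta0))
    print("naive variance estimate:", sigma2_naive, " true sigma^2:", sigma**2)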