Part of Advances in Neural Information Processing Systems 26 (NIPS 2013)
Lee H. Dicker, Dean P. Foster
We model a one-shot learning" situation, where very few (scalar) observations $y_1,...,y_n$ are available. Associated with each observation $y_i$ is a very high-dimensional vector $x_i$, which provides context for $y_i$ and enables us to predict subsequent observations, given their own context. One of the salient features of our analysis is that the problems studied here are easier when the dimension of $x_i$ is large; in other words, prediction becomes easier when more context is provided. The proposed methodology is a variant of principal component regression (PCR). Our rigorous analysis sheds new light on PCR. For instance, we show that classical PCR estimators may be inconsistent in the specified setting, unless they are multiplied by a scalar $c > 1$; that is, unless the classical estimator is expanded. This expansion phenomenon appears to be somewhat novel and contrasts with shrinkage methods ($c < 1$), which are far more common in big data analyses. "