Part of Advances in Neural Information Processing Systems 21 (NIPS 2008)
Sahand Negahban, Martin J. Wainwright
We consider the following instance of transfer learning: given a pair of regression problems, suppose that the regression coefficients share a partially common support, parameterized by the overlap fraction $\alpha$ between the two supports. This set-up suggests the use of $\ell_{1,\infty}$-regularized linear regression for recovering the support sets of both regression vectors. Our main contribution is to provide a sharp characterization of the sample complexity of this $\ell_{1,\infty}$ relaxation, exactly pinning down the minimal sample size $n$ required for joint support recovery as a function of the model dimension $p$, support size $s$, and overlap $\alpha \in [0,1]$. For measurement matrices drawn from standard Gaussian ensembles, we prove that the joint $\ell_{1,\infty}$-regularized method undergoes a phase transition characterized by the order parameter $\theta_{1,\infty}(n, p, s, \alpha) = \frac{n}{(4 - 3\alpha)\, s \log\big(p - (2-\alpha)s\big)}$. More precisely, the probability of successfully recovering both supports converges to 1 for scalings such that $\theta_{1,\infty} > 1$, and converges to 0 for scalings such that $\theta_{1,\infty} < 1$. An implication of this threshold is that the use of $\ell_{1,\infty}$-regularization yields gains in sample complexity if the overlap parameter is large enough ($\alpha > 2/3$), but performs worse than a naive approach if $\alpha < 2/3$. We illustrate the close agreement between these theoretical predictions and the actual behavior observed in simulations. Thus, our results highlight both the benefits and the dangers associated with block $\ell_{1,\infty}$-regularization in high-dimensional inference.
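To make the threshold concrete, the short Python sketch below (our own illustration, not code from the paper) evaluates the order parameter $\theta_{1,\infty}(n, p, s, \alpha)$ and the minimal sample size it implies. For comparison, it also computes a naive baseline under the assumption that "naive approach" means running the Lasso separately on each problem, whose analogous threshold scales as $2 s \log(p - s)$; that interpretation and the helper names are ours. The comparison shows the factor $4 - 3\alpha$ dropping below $2$ exactly when $\alpha > 2/3$.

```python
import math

def order_parameter(n, p, s, alpha):
    """Order parameter theta_{1,inf}(n, p, s, alpha) from the abstract:
    joint support recovery succeeds w.h.p. when theta > 1 and fails when theta < 1."""
    return n / ((4.0 - 3.0 * alpha) * s * math.log(p - (2.0 - alpha) * s))

def joint_sample_threshold(p, s, alpha):
    """Minimal sample size predicted for the joint ell_{1,inf} method (theta = 1)."""
    return (4.0 - 3.0 * alpha) * s * math.log(p - (2.0 - alpha) * s)

def naive_sample_threshold(p, s):
    """Assumed threshold for the naive approach (separate Lasso per problem),
    scaling as 2 * s * log(p - s); this baseline is our illustration,
    not a quantity defined in the abstract."""
    return 2.0 * s * math.log(p - s)

if __name__ == "__main__":
    p, s = 1024, 32  # hypothetical problem dimensions
    for alpha in (0.0, 0.5, 2.0 / 3.0, 0.8, 1.0):
        n_joint = joint_sample_threshold(p, s, alpha)
        n_naive = naive_sample_threshold(p, s)
        print(f"alpha={alpha:.2f}: joint n* ~ {n_joint:7.1f}, "
              f"naive n* ~ {n_naive:7.1f}, joint helps: {n_joint < n_naive}")
```

Running the sketch with these hypothetical values of $p$ and $s$, the joint threshold falls below the naive one only once $\alpha$ exceeds $2/3$, matching the phase-transition claim in the abstract.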