Part of Advances in Neural Information Processing Systems 33 (NeurIPS 2020)
Nilesh Tripuraneni, Michael Jordan, Chi Jin
We provide new statistical guarantees for transfer learning via representation learning, in which transfer is achieved by learning a feature representation shared across different tasks. This enables learning on new tasks using far less data than is required to learn them in isolation. Formally, we consider $t+1$ tasks parameterized by functions of the form $f_j \circ h$ in a general function class $\mathcal{F} \circ \mathcal{H}$, where each $f_j$ is a task-specific function in $\mathcal{F}$ and $h$ is the shared representation in $\mathcal{H}$. Letting $\mathcal{C}(\cdot)$ denote the complexity measure of a function class, we show that for diverse training tasks (1) the sample complexity needed to learn the shared representation across the first $t$ training tasks scales as $\mathcal{C}(\mathcal{H}) + t\,\mathcal{C}(\mathcal{F})$, despite no explicit access to a signal from the feature representation, and (2) with an accurate estimate of the representation, the sample complexity needed to learn a new task scales only with $\mathcal{C}(\mathcal{F})$. Our results depend upon a new general notion of task diversity, applicable to models with general tasks, features, and losses, as well as a novel chain rule for Gaussian complexities. Finally, we exhibit the utility of our general framework in several models of importance in the literature.
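The following is a minimal sketch, not the authors' code, of the two-phase procedure the abstract describes, specialized to a linear representation: each task is $f_j(h(x)) = \langle w_j, B^\top x\rangle$ with a shared matrix $B \in \mathbb{R}^{d \times r}$ playing the role of $h$ and task-specific heads $w_j \in \mathbb{R}^r$ playing the role of $f_j$. The alternating-least-squares heuristic, synthetic data, and all names and dimensions below are illustrative assumptions rather than details from the paper; the sketch only illustrates why the new task needs to fit just $r$ parameters once the representation is estimated.

```python
# Illustrative sketch (assumed setup, not the paper's implementation):
# Phase 1 estimates a shared linear representation from t training tasks;
# Phase 2 reuses it to learn a new task from few samples.
import numpy as np

rng = np.random.default_rng(0)
d, r, t = 50, 5, 20          # ambient dim, representation dim, # training tasks
n_per_task, n_new = 40, 15   # samples per training task, samples for the new task

# Ground-truth shared representation and task heads (synthetic data).
B_true, _ = np.linalg.qr(rng.standard_normal((d, r)))
W_true = rng.standard_normal((t, r))

def sample_task(w, n):
    X = rng.standard_normal((n, d))
    y = X @ B_true @ w + 0.1 * rng.standard_normal(n)
    return X, y

tasks = [sample_task(W_true[j], n_per_task) for j in range(t)]

# Phase 1: estimate the shared representation B from the t training tasks
# by alternating least squares on the pooled squared loss.
B_hat = np.linalg.qr(rng.standard_normal((d, r)))[0]
for _ in range(50):
    # Given B, each head w_j is an ordinary least-squares fit in r dimensions.
    W_hat = [np.linalg.lstsq(X @ B_hat, y, rcond=None)[0] for X, y in tasks]
    # Given the heads, vec(B) solves a pooled least-squares problem whose
    # feature for a sample x of task j is the flattened outer product x w_j^T.
    Phi = np.vstack([np.einsum('ik,l->ikl', X, w).reshape(len(y), d * r)
                     for (X, y), w in zip(tasks, W_hat)])
    ys = np.concatenate([y for _, y in tasks])
    B_hat = np.linalg.lstsq(Phi, ys, rcond=None)[0].reshape(d, r)
    B_hat, _ = np.linalg.qr(B_hat)   # re-orthonormalize for numerical stability

# Phase 2: learn a *new* task reusing B_hat; only r parameters are fit,
# so far fewer samples are needed than learning the task in d dimensions.
w_new_true = rng.standard_normal(r)
X_new, y_new = sample_task(w_new_true, n_new)
w_new_hat = np.linalg.lstsq(X_new @ B_hat, y_new, rcond=None)[0]

X_test, y_test = sample_task(w_new_true, 1000)
print("new-task test MSE with transferred representation:",
      np.mean((X_test @ B_hat @ w_new_hat - y_test) ** 2))
```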