Can Implicit Bias Explain Generalization? Stochastic Convex Optimization as a Case Study

Part of Advances in Neural Information Processing Systems 33 pre-proceedings (NeurIPS 2020)

Bibtex »Paper »Supplemental »

Bibtek download is not availble in the pre-proceeding


Authors

Assaf Dauber, Meir Feder, Tomer Koren, Roi Livni

Abstract

<p>The notion of implicit bias, or implicit regularization, has been suggested as a means to explain the surprising generalization ability of modern-days overparameterized learning algorithms. This notion refers to the tendency of the optimization algorithm towards a certain structured solution that often generalizes well. Recently, several papers have studied implicit regularization and were able to identify this phenomenon in various scenarios. </p> <p>We revisit this paradigm in arguably the simplest non-trivial setup, and study the implicit bias of Stochastic Gradient Descent (SGD) in the context of Stochastic Convex Optimization. As a first step, we provide a simple construction that rules out the existence of a \emph{distribution-independent} implicit regularizer that governs the generalization ability of SGD. We then demonstrate a learning problem that rules out a very general class of \emph{distribution-dependent} implicit regularizers from explaining generalization, which includes strongly convex regularizers as well as non-degenerate norm-based regularizations. Certain aspects of our constructions point out to significant difficulties in providing a comprehensive explanation of an algorithm's generalization performance by solely arguing about its implicit regularization properties.</p>