Can Implicit Bias Explain Generalization? Stochastic Convex Optimization as a Case Study
Abstract
The notion of implicit bias, or implicit regularization, has been suggested as a means to explain the surprising generalization ability of modern-day overparameterized learning algorithms. This notion refers to the tendency of the optimization algorithm towards a certain structured solution that often generalizes well. Recently, several papers have studied implicit regularization and were able to identify this phenomenon in various scenarios. We revisit this paradigm in arguably the simplest non-trivial setup, and study the implicit bias of Stochastic Gradient Descent (SGD) in the context of Stochastic Convex Optimization. As a first step, we provide a simple construction that rules out the existence of a distribution-independent implicit regularizer that governs the generalization ability of SGD. We then demonstrate a learning problem that rules out a very general class of distribution-dependent implicit regularizers from explaining generalization, a class which includes strongly convex regularizers as well as non-degenerate norm-based regularizations. Certain aspects of our constructions point to significant difficulties in providing a comprehensive explanation of an algorithm's generalization performance by arguing solely about its implicit regularization properties.
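For readers unfamiliar with the algorithm under study, the following is a minimal, illustrative sketch (not taken from the paper) of one-pass SGD with iterate averaging on a toy stochastic convex objective F(w) = E_z[(w - z)^2 / 2]; all function names, the step size, and the data distribution here are assumptions chosen for the example.

```python
import random

def sgd_one_pass(samples, eta=0.1, w0=0.0):
    """One-pass SGD on the stochastic convex objective
    F(w) = E_z[(w - z)^2 / 2]: each sample is used exactly once,
    and the averaged iterate is returned, as is standard in
    stochastic convex optimization analyses."""
    w = w0
    iterates = []
    for z in samples:
        grad = w - z          # gradient of (w - z)^2 / 2 at w
        w -= eta * grad       # SGD update with fixed step size
        iterates.append(w)
    return sum(iterates) / len(iterates)

random.seed(0)
data = [random.gauss(1.0, 0.5) for _ in range(2000)]
w_bar = sgd_one_pass(data)
# The population minimizer of F is E[z], so the averaged
# iterate should land close to the sample mean.
```

The point of the paper is precisely that the good generalization of such averaged SGD iterates need not be attributable to any implicit regularizer, distribution-independent or (in a broad class) distribution-dependent.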
Publication: arXiv e-prints
Pub Date: March 2020
arXiv: arXiv:2003.06152
Bibcode: 2020arXiv200306152D
Keywords: Computer Science - Machine Learning; Statistics - Machine Learning