Optimization Variance: Exploring Generalization Properties of DNNs
Abstract
Unlike the conventional wisdom in statistical learning theory, the test error of a deep neural network (DNN) often demonstrates double descent: as the model complexity increases, it first follows a classical U-shaped curve and then shows a second descent. Through bias-variance decomposition, recent studies revealed that the bell-shaped variance is the major cause of model-wise double descent (when the DNN is widened gradually). This paper investigates epoch-wise double descent, i.e., the test error of a DNN also shows double descent as the number of training epochs increases. By extending the bias-variance analysis to epoch-wise double descent of the zero-one loss, we surprisingly find that the variance itself, without the bias, varies consistently with the test error. Inspired by this result, we propose a novel metric, optimization variance (OV), to measure the diversity of model updates caused by the stochastic gradients of random training batches drawn in the same iteration. OV can be estimated using samples from the training set only but correlates well with the (unknown) \emph{test} error, and hence early stopping may be achieved without using a validation set.
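The abstract describes OV as the diversity of model updates induced by stochastic gradients of different random batches at the same iteration. As a rough illustration of that idea (not the paper's exact estimator), the sketch below uses a toy linear model in place of a DNN and measures the variance of per-batch gradients, normalized by the squared norm of the mean gradient; the model, data, and normalization are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "training set": linear regression with squared loss,
# standing in for a DNN purely for illustration.
X = rng.normal(size=(1000, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=1000)

def batch_gradient(w, batch_idx):
    """Gradient of the mean squared error on one mini-batch."""
    Xb, yb = X[batch_idx], y[batch_idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(batch_idx)

def optimization_variance(w, n_batches=32, batch_size=64):
    """Simplified OV proxy: spread of the updates that different
    random training batches would induce at the same iteration,
    relative to the size of the average update."""
    grads = np.stack([
        batch_gradient(w, rng.choice(len(X), batch_size, replace=False))
        for _ in range(n_batches)
    ])
    mean_g = grads.mean(axis=0)
    var = ((grads - mean_g) ** 2).sum(axis=1).mean()    # E ||g - E g||^2
    return var / (np.linalg.norm(mean_g) ** 2 + 1e-12)  # normalized

ov = optimization_variance(np.zeros(10))
print(ov)
```

Note that this estimate uses only training samples, mirroring the abstract's point that OV can track the unknown test error without a held-out validation set.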
 Publication:

arXiv e-prints
 Pub Date:
 June 2021
 arXiv:
 arXiv:2106.01714
 Bibcode:
 2021arXiv210601714Z
 Keywords:

 Computer Science - Machine Learning;
 Computer Science - Artificial Intelligence
 E-Print:
 Work in progress