Never Go Full Batch (in Stochastic Convex Optimization)

doi:10.48550/arXiv.2107.00469

We study the generalization performance of full-batch optimization algorithms for stochastic convex optimization: these are first-order methods that access only the exact gradient of the empirical risk (rather than gradients with respect to individual data points), a class that includes a wide range of algorithms such as gradient descent, mirror descent, and their regularized and/or accelerated variants. We provide a new separation result showing that, while algorithms such as stochastic gradient descent can generalize and optimize the population risk to within $\epsilon$ after $O(1/\epsilon^2)$ iterations, full-batch methods either need at least $\Omega(1/\epsilon^4)$ iterations or exhibit a dimension-dependent sample complexity.
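To make the two kinds of gradient access concrete, here is a minimal sketch on a toy problem. It assumes a one-dimensional convex loss f(w; z) = 0.5 (w - z)^2, and the function names (`grad_sample`, `full_batch_gd`, `one_pass_sgd`) are hypothetical, chosen only for illustration. It is not the construction used in the paper and does not reproduce the separation result; on this easy instance both methods do well. It only shows that the full-batch method sees the averaged empirical gradient, while SGD sees one per-sample gradient per iteration.

```python
import random


def grad_sample(w, z):
    """Gradient in w of the toy convex loss f(w; z) = 0.5 * (w - z)**2."""
    return w - z


def full_batch_gd(data, steps, lr):
    """Gradient descent that only ever sees the gradient of the empirical risk."""
    w = 0.0
    n = len(data)
    for _ in range(steps):
        # Exact empirical-risk gradient: the average over all n samples.
        g = sum(grad_sample(w, z) for z in data) / n
        w -= lr * g
    return w


def one_pass_sgd(data, lr):
    """One-pass SGD: one per-sample gradient per step, returns the averaged iterate."""
    w = 0.0
    total = 0.0
    for z in data:
        w -= lr * grad_sample(w, z)
        total += w
    return total / len(data)


# Samples from a distribution whose population-risk minimizer is w* = 1.0.
rng = random.Random(0)
data = [rng.gauss(1.0, 1.0) for _ in range(1000)]

w_gd = full_batch_gd(data, steps=100, lr=0.5)
w_sgd = one_pass_sgd(data, lr=0.05)
print(f"full-batch GD: {w_gd:.3f}, one-pass SGD: {w_sgd:.3f}")
```

In this sketch the full-batch iterate converges to the empirical mean of the samples, and the averaged SGD iterate lands near the population minimizer. The step sizes and iteration counts are arbitrary choices for the toy problem, not values taken from the paper.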

Publication: arXiv e-prints

Pub Date: June 2021

DOI: 10.48550/arXiv.2107.00469

arXiv: arXiv:2107.00469

Bibcode: 2021arXiv210700469A

Keywords: Mathematics - Optimization and Control; Computer Science - Machine Learning
