Learning Curve Theory
Abstract
Recently, a number of empirical "universal" scaling-law papers have been published, most notably by OpenAI. "Scaling laws" refers to the power-law decrease of training or test error w.r.t. more data, larger neural networks, and/or more compute. In this work we focus on scaling w.r.t. data size $n$. Theoretical understanding of this phenomenon is largely lacking, except in finite-dimensional models, for which error typically decreases as $n^{-1/2}$ or $n^{-1}$. We develop and theoretically analyse the simplest possible (toy) model that can exhibit $n^{-\beta}$ learning curves for arbitrary power $\beta>0$, and determine whether power laws are universal or depend on the data distribution.
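As a concrete illustration of what an $n^{-\beta}$ learning curve means in practice, here is a minimal Python sketch (not from the paper; the curve, the noise model, and the value $\beta=0.35$ are made-up assumptions) that generates a synthetic power-law error curve and recovers the exponent by linear regression in log-log space:

```python
import numpy as np

# Hypothetical illustration, not the paper's toy model: if test error
# follows err(n) ~ c * n**(-beta), then log err is linear in log n,
# so beta can be estimated as minus the slope of a log-log fit.

rng = np.random.default_rng(0)

# Synthetic learning curve with assumed true beta = 0.35 and c = 2.0.
n = np.logspace(2, 6, num=20)                 # sample sizes 1e2 .. 1e6
true_beta, c = 0.35, 2.0
err = c * n ** (-true_beta)
err *= np.exp(rng.normal(scale=0.05, size=err.shape))  # multiplicative noise

# Fit log err = log c - beta * log n; polyfit returns [slope, intercept].
slope, intercept = np.polyfit(np.log(n), np.log(err), deg=1)
print(f"estimated beta = {-slope:.3f} (true {true_beta})")
```

On clean power-law data the estimate matches the true exponent closely; curvature in the log-log plot would instead signal a deviation from a pure power law.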
- Publication: arXiv e-prints
- Pub Date: February 2021
- DOI: 10.48550/arXiv.2102.04074
- arXiv: arXiv:2102.04074
- Bibcode: 2021arXiv210204074H
- Keywords: Computer Science - Machine Learning; Statistics - Machine Learning
- E-Print: 26 pages, 6 figures