Turing-Universal Learners with Optimal Scaling Laws

doi:10.48550/arXiv.2111.05321

Nakkiran, Preetum

For a given distribution, learning algorithm, and performance metric, the rate of convergence (or data-scaling law) is the asymptotic behavior of the algorithm's test performance as a function of the number of training samples. Many learning methods in both theory and practice have power-law rates, i.e., performance scales as $n^{-\alpha}$ for some $\alpha > 0$. Moreover, both theoreticians and practitioners are concerned with improving the rates of their learning algorithms under settings of interest. We observe the existence of a "universal learner", which achieves the best possible distribution-dependent asymptotic rate among all learning algorithms within a specified runtime (e.g., $O(n^2)$), while incurring only polylogarithmic slowdown over this runtime. This algorithm is uniform: it does not depend on the distribution, yet it achieves the best possible rates for all distributions. The construction itself is a simple extension of Levin's universal search (Levin, 1973). Much like universal search, the universal learner is not at all practical and is primarily of theoretical and philosophical interest.
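The flavor of the construction can be illustrated with a toy sketch. It is not the paper's algorithm, and it rests on several simplifying assumptions: the candidate learners form a short finite list rather than an enumeration of all programs, each candidate reports a declared step count (`cost`) instead of being simulated step by step, candidate `i` receives a `2^-(i+1)` share of the total budget in the spirit of Levin's universal search, and the winner is chosen by squared error on a held-out half of the data. All names here (`Candidate`, `universal_learner`, `fit_mean`, `fit_line`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Sample = Tuple[float, float]
Predictor = Callable[[float], float]


@dataclass
class Candidate:
    name: str
    cost: Callable[[int], int]                     # declared steps on n samples (assumption)
    fit: Callable[[Sequence[Sample]], Predictor]   # training procedure


def fit_mean(train: Sequence[Sample]) -> Predictor:
    # Constant predictor: always outputs the mean training label.
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train: Sequence[Sample]) -> Predictor:
    # Ordinary least-squares line through the training points.
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda x: my + slope * (x - mx)


def holdout_mse(predict: Predictor, holdout: Sequence[Sample]) -> float:
    return sum((predict(x) - y) ** 2 for x, y in holdout) / len(holdout)


def universal_learner(
    candidates: List[Candidate], data: Sequence[Sample], total_budget: float
) -> Optional[Tuple[float, str, Predictor]]:
    # Split the sample: first half for training, second half for selection.
    split = len(data) // 2
    train, holdout = data[:split], data[split:]
    best = None
    for i, cand in enumerate(candidates):
        budget = total_budget / 2 ** (i + 1)   # Levin-style share for candidate i
        if cand.cost(len(train)) > budget:
            continue                           # candidate does not fit its time share
        predict = cand.fit(train)
        err = holdout_mse(predict, holdout)
        if best is None or err < best[0]:
            best = (err, cand.name, predict)
    return best


# Usage: noiseless linear data, two candidates with linear declared cost.
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
candidates = [
    Candidate("mean", cost=lambda n: n, fit=fit_mean),
    Candidate("line", cost=lambda n: n, fit=fit_line),
]
result = universal_learner(candidates, data, total_budget=100)
tight = universal_learner(candidates, data, total_budget=30)
```

With the larger budget both candidates run and the held-out error selects the line fit; with the tighter budget the second candidate's share is too small, so only the mean predictor is considered. The sketch shows only the selection mechanism and makes no claim about the rates or the polylogarithmic overhead stated in the abstract.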

Publication: arXiv e-prints

Pub Date: November 2021

DOI: 10.48550/arXiv.2111.05321

arXiv: arXiv:2111.05321

Bibcode: 2021arXiv211105321N

Keywords: Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Computer Science - Computational Complexity; Mathematics - Statistics Theory; Statistics - Machine Learning

NASA/ADS
