An Information-Theoretic Perspective on Overfitting and Underfitting

We present an information-theoretic framework for understanding overfitting and underfitting in machine learning and prove the formal undecidability of determining whether an arbitrary classification algorithm will overfit a dataset. Measuring algorithm capacity via the information transferred from datasets to models, we consider mismatches between algorithm capacities and datasets to provide a signature for when a model can overfit or underfit a dataset. We present results upper-bounding algorithm capacity, establish its relationship to quantities in the algorithmic search framework for machine learning, and relate our work to recent information-theoretic approaches to generalization.

Publication: arXiv e-prints

Pub Date: October 2020

DOI: 10.48550/arXiv.2010.06076

arXiv: arXiv:2010.06076

Bibcode: 2020arXiv201006076B

Keywords: Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Computer Science - Information Theory; Statistics - Machine Learning

E-Print: Accepted for presentation at The 33rd Australasian Joint Conference on Artificial Intelligence (AJCAI 2020), November 29-30, 2020
