Fundamental limits to learning closed-form mathematical models from data
Abstract
Given a finite and noisy dataset generated with a closed-form mathematical model, when is it possible to learn the true generating model from the data alone? This is the question we investigate here. We show that this model-learning problem displays a transition from a low-noise phase, in which the true model can be learned, to a phase in which the observation noise is too high for the true model to be learned by any method. Both in the low-noise phase and in the high-noise phase, probabilistic model selection leads to optimal generalization to unseen data. This is in contrast to standard machine learning approaches, including artificial neural networks, which in this particular problem are limited, in the low-noise phase, by their ability to interpolate. In the transition region between the learnable and unlearnable phases, generalization is hard for all approaches, including probabilistic model selection.
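The learnability transition described above can be illustrated with a toy sketch. This is my own construction, not the paper's method: probabilistic model selection via the Bayesian information criterion (BIC) between two candidate closed-form models, where low observation noise lets the criterion recover the true model and high noise buries the distinguishing term. All names and the specific models are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: the true generating model is quadratic, y = 1.5x + 2x^2.
x = np.linspace(-1.0, 1.0, 50)
true_y = 1.5 * x + 2.0 * x**2

def bic(x, y, degree):
    """BIC of a least-squares polynomial fit, assuming Gaussian noise."""
    n, k = len(y), degree + 1
    coeffs = np.polyfit(x, y, degree)
    sigma2 = np.mean((y - np.polyval(coeffs, x)) ** 2)
    return n * np.log(sigma2) + k * np.log(n)

def selected_degree(noise_sd):
    """Pick the candidate model (linear vs. quadratic) with the lowest BIC."""
    y = true_y + rng.normal(0.0, noise_sd, size=len(x))
    return min((1, 2), key=lambda d: bic(x, y, d))

# Low noise: the quadratic (true) model is recovered.
print(selected_degree(0.05))
# High noise: the quadratic term is drowned out, and the complexity
# penalty typically favors the simpler linear model instead.
print(selected_degree(5.0))
```

The sketch only contrasts two nested models; the paper's setting searches a much larger space of closed-form expressions, but the same trade-off between fit quality and model complexity drives the transition.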
Publication: Nature Communications
Pub Date: February 2023
DOI: 10.1038/s41467-023-36657-z
arXiv: arXiv:2204.02704
Bibcode: 2023NatCo..14.1043F
Keywords: Computer Science - Machine Learning; Condensed Matter - Disordered Systems and Neural Networks; Condensed Matter - Statistical Mechanics; Physics - Computational Physics; Physics - Data Analysis, Statistics and Probability
E-Print: doi:10.1038/s41467-023-36657-z