Phase diagram of stochastic gradient descent in high-dimensional two-layer neural networks
Abstract
Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly-generalizing local minima. Here we investigate the crossover between these two regimes in the high-dimensional setting, and in particular the connection between the so-called mean-field/hydrodynamic regime and the seminal approach of Saad & Solla. Focusing on the case of Gaussian data, we study the interplay between the learning rate, the time scale, and the number of hidden units in the high-dimensional dynamics of stochastic gradient descent (SGD). Our work builds on a deterministic description of SGD in high dimensions from statistical physics, which we extend and for which we provide rigorous convergence rates.
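As a point of reference for the setting the abstract describes, below is a minimal sketch of the classic Saad & Solla online-SGD protocol: a two-layer (soft-committee) student with K hidden units learning a teacher with M hidden units from fresh Gaussian samples, with the student-student and student-teacher overlap matrices that the deterministic high-dimensional description is written in terms of. The tanh activation, fixed second-layer weights, and the particular learning-rate scaling are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def online_sgd_two_layer(d=500, K=4, M=2, lr=0.5, steps=50_000, seed=0):
    """Sketch of online SGD for a soft-committee student (K hidden units)
    learning a teacher (M hidden units) on Gaussian data. Activation,
    second-layer weights, and lr scaling are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    W_star = rng.standard_normal((M, d))           # teacher first-layer weights
    W = 0.1 * rng.standard_normal((K, d))          # student first-layer weights
    g = np.tanh
    dg = lambda z: 1.0 - np.tanh(z) ** 2

    errs = []
    for t in range(steps):
        x = rng.standard_normal(d)                 # one fresh Gaussian sample per step
        lam = W @ x / np.sqrt(d)                   # student pre-activations
        nu = W_star @ x / np.sqrt(d)               # teacher pre-activations
        err = g(lam).sum() / K - g(nu).sum() / M   # prediction error on this sample
        # One-step SGD on the squared loss. How lr scales with d and K is exactly
        # the knob whose interplay with the time scale separates the regimes.
        W -= lr * err * np.outer(dg(lam) / K, x / np.sqrt(d))
        if t % 1_000 == 0:
            errs.append(0.5 * err ** 2)

    # Order parameters of the deterministic (ODE) description:
    Q = W @ W.T / d          # student-student overlaps
    R = W @ W_star.T / d     # student-teacher overlaps
    return Q, R, errs

if __name__ == "__main__":
    Q, R, errs = online_sgd_two_layer()
    print("student-teacher overlaps R:\n", np.round(R, 3))
```

Running the sketch and plotting the overlaps against rescaled time gives the kind of trajectories that the deterministic description is meant to capture; varying lr and K probes the regimes the phase diagram distinguishes.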
- Publication:
- Journal of Statistical Mechanics: Theory and Experiment
- Pub Date:
- November 2023
- DOI:
- 10.1088/1742-5468/ad01b1
- arXiv:
- arXiv:2202.00293
- Bibcode:
- 2023JSMTE2023k4008V
- Keywords:
- learning theory;
- machine learning;
- phase diagrams;
- Statistics - Machine Learning;
- Condensed Matter - Disordered Systems and Neural Networks;
- Computer Science - Machine Learning
- E-Print:
- 20 pages