High-dimensional manifold of solutions in neural networks: insights from statistical physics
Abstract
In these pedagogic notes I review the statistical mechanics approach to neural networks, focusing on the paradigmatic example of the perceptron architecture with binary and continuous weights, in the classification setting. I review Gardner's approach, based on the replica method, and the derivation of the SAT/UNSAT transition in the storage setting. I then discuss recent works that unveiled how the zero-training-error configurations are geometrically arranged, and how this arrangement changes as the size of the training set increases. I also illustrate how different regions of the solution space can be explored analytically and how the landscape in the vicinity of a solution can be characterized. I give evidence that, in binary weight models, algorithmic hardness is a consequence of the disappearance of a clustered region of solutions that extends to very large distances. Finally, I demonstrate how the study of linear mode connectivity between solutions can give insights into the average shape of the solution manifold.
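To make the objects named in the abstract concrete, here is a minimal sketch of the storage setting and of linear mode connectivity in standard Gardner notation; the symbols below (patterns ξ^μ, labels σ^μ, load α, step function Θ) are the conventional ones and are not taken verbatim from the notes:

```latex
% A weight vector w stores P = \alpha N random patterns (\xi^\mu, \sigma^\mu)
% if it classifies every pattern correctly:
\[
  \sigma^\mu \,\frac{1}{\sqrt{N}} \sum_{i=1}^{N} w_i \xi_i^\mu \;\ge\; 0,
  \qquad \mu = 1, \dots, P .
\]
% Gardner volume (here for binary weights, w_i \in \{-1,+1\}): the number of
% zero-training-error configurations,
\[
  Z \;=\; \sum_{w \in \{-1,+1\}^N} \; \prod_{\mu=1}^{P}
  \Theta\!\left( \sigma^\mu \,\frac{1}{\sqrt{N}} \sum_{i=1}^{N} w_i \xi_i^\mu \right).
\]
% The replica method computes the quenched average of \ln Z over the random
% patterns; the SAT/UNSAT transition is the load \alpha_c beyond which Z is
% typically zero (\alpha_c \simeq 0.833 for binary weights, \alpha_c = 2 for
% the spherical perceptron).
%
% Linear mode connectivity probes the solution manifold by interpolating
% between two solutions w^{(1)}, w^{(2)} and tracking the training error along
\[
  w(\gamma) \;=\; (1-\gamma)\, w^{(1)} + \gamma\, w^{(2)},
  \qquad \gamma \in [0,1];
\]
% two solutions are linearly mode connected if the error stays (near) zero
% along the whole path.
```

The capacity values quoted are the classical ones: α_c = 2 for the spherical perceptron (Cover, Gardner) and α_c ≈ 0.833 for binary weights (Krauth and Mézard).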
- Publication:
- arXiv e-prints
- Pub Date:
- September 2023
- DOI:
- 10.48550/arXiv.2309.09240
- arXiv:
- arXiv:2309.09240
- Bibcode:
- 2023arXiv230909240M
- Keywords:
- Condensed Matter - Disordered Systems and Neural Networks;
- Computer Science - Machine Learning;
- Mathematics - Probability;
- Mathematics - Statistics Theory
- E-Print:
- 22 pages, 9 figures, based on a set of lectures given at the "School of the Italian Society of Statistical Physics", IMT, Lucca