A spring-block theory of feature learning in deep neural networks
Abstract
Feature-learning deep nets progressively collapse data to a regular low-dimensional geometry. How this phenomenon emerges from the collective action of nonlinearity, noise, learning rate, and other choices that shape the dynamics has eluded first-principles theories built from microscopic neuronal dynamics. We exhibit a noise-nonlinearity phase diagram that identifies regimes where shallow or deep layers learn more effectively. We then propose a macroscopic mechanical theory that reproduces the diagram, explaining why some DNNs are lazy and some active, and linking feature learning across layers to generalization.
- Publication: arXiv e-prints
- Pub Date: July 2024
- DOI: 10.48550/arXiv.2407.19353
- arXiv: arXiv:2407.19353
- Bibcode: 2024arXiv240719353S
- Keywords: Condensed Matter - Disordered Systems and Neural Networks; Condensed Matter - Statistical Mechanics; Computer Science - Machine Learning; Statistics - Machine Learning