Exact Solutions of a Deep Linear Network
Abstract
This work derives the analytical expression for the global minima of a deep linear network with weight decay and stochastic neurons, a fundamental model for understanding the landscape of neural networks. Our result implies that zero is a special point in the loss landscape of a deep neural network. We show that weight decay interacts strongly with the model architecture and can create bad minima at zero in any network with more than $1$ hidden layer, a behavior qualitatively different from that of a network with only $1$ hidden layer. Practically, our result implies that common deep learning initialization methods are, in general, insufficient to ease the optimization of neural networks.
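To make the mechanism concrete, consider the weight-decay-regularized objective $L(W_1, \dots, W_D) = \mathbb{E}\,\|W_D \cdots W_1 x - y\|^2 + \lambda \sum_{i=1}^{D} \|W_i\|_F^2$ (stochastic neurons omitted for brevity). Every term of the fit gradient contains a product of the remaining weight matrices, so the origin is a stationary point whenever $D \ge 2$; with more than $1$ hidden layer ($D \ge 3$), the fit term also contributes nothing to the Hessian at zero, which then equals $2\lambda I \succ 0$, making the origin a strict local minimum for any $\lambda > 0$. The sketch below is a numerical check of this picture under assumed widths, data, and $\lambda$, not the paper's construction: it probes a small sphere around the origin and reports whether any sampled direction lowers the loss.

```python
# A minimal numerical sketch (not from the paper): probe the loss of a deep
# linear network with weight decay on a small sphere around the origin.
# Widths, data, and the decay strength `lam` are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, n, lam = 4, 256, 0.01                     # dimension, samples, weight decay
X = rng.standard_normal((d, n))
Y = X + 0.1 * rng.standard_normal((d, n))    # a simple linear-regression target

def loss(weights):
    """Mean squared error of the matrix-product network plus the L2 penalty."""
    P = np.eye(d)
    for W in weights:                        # end-to-end product W_D ... W_1
        P = W @ P
    mse = np.mean(np.sum((P @ X - Y) ** 2, axis=0))
    return mse + lam * sum(np.sum(W ** 2) for W in weights)

def probe_origin(depth, eps=1e-2, trials=2000):
    """Loss at zero and the best loss found on a radius-`eps` sphere around it."""
    base = loss([np.zeros((d, d)) for _ in range(depth)])
    best = base
    for _ in range(trials):
        delta = [rng.standard_normal((d, d)) for _ in range(depth)]
        scale = eps / np.sqrt(sum(np.sum(D ** 2) for D in delta))
        best = min(best, loss([scale * D for D in delta]))
    return base, best

for depth in (2, 3):                         # depth = number of weight matrices,
    base, best = probe_origin(depth)         # i.e. 1 and 2 hidden layers
    verdict = "escapable saddle" if best < base else "local minimum"
    print(f"{depth - 1} hidden layer(s): loss(0)={base:.6f}, "
          f"best nearby={best:.6f} -> {verdict}")
```

On a typical run, the $1$-hidden-layer network finds directions that escape the origin, while the deeper one does not, consistent with the abstract's claim that the bad minimum at zero appears only with more than $1$ hidden layer.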
- Publication: arXiv e-prints
- Pub Date: February 2022
- DOI: 10.48550/arXiv.2202.04777
- arXiv: arXiv:2202.04777
- Bibcode: 2022arXiv220204777Z
- Keywords: Statistics - Machine Learning; Computer Science - Machine Learning
- E-Print: NeurIPS 2022