Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks

doi:10.48550/arXiv.1809.08587


Shamir, Ohad

We study the dynamics of gradient descent on objective functions of the form $f(\prod_{i=1}^{k} w_i)$ (with respect to scalar parameters $w_1,\ldots,w_k$), which arise in the context of training depth-$k$ linear neural networks. We prove that for standard random initializations, and under mild assumptions on $f$, the number of iterations required for convergence scales exponentially with the depth $k$. We also show empirically that this phenomenon can occur in higher dimensions, where each $w_i$ is a matrix. This highlights a potential obstacle in understanding the convergence of gradient-based methods for deep linear neural networks, where $k$ is large.
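The setting in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's experimental code: the choice $f(x) = \frac{1}{2}(x - 1)^2$, the step size, the tolerance, and the function names `leave_one_out_products`, `gradient`, and `iterations_to_converge` are all hypothetical. The initialization is left to the caller, since the abstract does not spell out which random initializations are analyzed.

```python
import numpy as np


def leave_one_out_products(w):
    """Return the vector whose i-th entry is the product of all w_j with j != i.

    Prefix/suffix products are used instead of dividing prod(w) by w_i,
    so the result stays correct when some w_i is exactly zero.
    """
    k = len(w)
    prefix = np.ones(k)
    suffix = np.ones(k)
    for i in range(1, k):
        prefix[i] = prefix[i - 1] * w[i - 1]
    for i in range(k - 2, -1, -1):
        suffix[i] = suffix[i + 1] * w[i + 1]
    return prefix * suffix


def gradient(w, target=1.0):
    """Gradient of F(w) = f(prod_i w_i) with respect to each scalar w_i.

    Assumption (not from the source): f(x) = 0.5 * (x - target)**2,
    so f'(x) = x - target and dF/dw_i = f'(prod w) * prod_{j != i} w_j.
    """
    w = np.asarray(w, dtype=float)
    residual = np.prod(w) - target
    return residual * leave_one_out_products(w)


def iterations_to_converge(w0, lr=0.01, tol=1e-3, max_iters=100_000, target=1.0):
    """Run plain gradient descent from w0 and count the steps taken.

    Returns the number of updates performed before |prod(w) - target| <= tol,
    or None if that does not happen within max_iters updates.
    """
    w = np.array(w0, dtype=float)
    for t in range(max_iters):
        if abs(np.prod(w) - target) <= tol:
            return t
        w -= lr * gradient(w, target)
    return None
```

To probe how the iteration count depends on the depth $k$, one could call `iterations_to_converge` on initial vectors of increasing length drawn from a chosen distribution and compare the returned counts. The behaviour observed will depend on the assumed $f$, the initialization, and the step size.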

Publication:

arXiv e-prints

Pub Date:

September 2018

DOI:

10.48550/arXiv.1809.08587

arXiv:

arXiv:1809.08587

Bibcode:

2018arXiv180908587S

Keywords:

Computer Science - Machine Learning;
Computer Science - Neural and Evolutionary Computing;
Mathematics - Optimization and Control;
Statistics - Machine Learning

E-Print:

Comparison to previous version: Fixed a bug in lemma 1 part 3 (does not affect any other part of the paper)
