A Geometric Interpretation of Stochastic Gradient Descent Using Diffusion Metrics
Abstract
Stochastic gradient descent (SGD) is a key ingredient in the training of deep neural networks, and yet its geometrical significance appears elusive. We study a deterministic model in which the trajectories of the dynamical system are described via geodesics of a family of metrics arising from the diffusion matrix. These metrics encode information about the highly nonisotropic gradient noise in SGD. We establish a parallel with General Relativity models, where the role of the electromagnetic field is played by the gradient of the loss function. We compute an example of a two-layer network.
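The "diffusion matrix" mentioned in the abstract is the covariance of the minibatch gradient noise, which in practice is highly anisotropic. A minimal sketch of the idea, on an assumed toy least-squares problem (the loss, data scales, and hyperparameters below are illustrative choices, not the paper's construction):

```python
import numpy as np

# Toy linear regression with deliberately anisotropic features, so the
# minibatch gradient noise has very different variance per direction.
rng = np.random.default_rng(0)
n, d = 1000, 2
X = rng.normal(size=(n, d)) * np.array([3.0, 0.3])  # feature scales 3 vs 0.3
y = X @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=n)

def minibatch_grad(w, batch):
    """Least-squares gradient on a minibatch."""
    Xb, yb = X[batch], y[batch]
    return Xb.T @ (Xb @ w - yb) / len(batch)

w = np.zeros(d)
lr, B = 0.01, 32
grads = []
for step in range(500):
    batch = rng.choice(n, size=B, replace=False)
    g = minibatch_grad(w, batch)
    grads.append(g)
    w -= lr * g  # plain SGD update

# Empirical diffusion (gradient-covariance) matrix over the trajectory.
G = np.array(grads)
D = np.cov(G.T)
print("estimated weights:", w)
print("diffusion matrix:\n", D)
```

Running this, the diagonal entries of `D` differ by orders of magnitude, i.e., the gradient noise is far from isotropic; the paper's family of metrics is built from precisely this kind of direction-dependent diffusion information.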
 Publication:
 Entropy
 Pub Date:
 January 2020
 DOI:
 10.3390/e22010101
 arXiv:
 arXiv:1910.12194
 Bibcode:
 2020Entrp..22..101F
 Keywords:
 Computer Science - Machine Learning;
 General Relativity and Quantum Cosmology;
 Mathematics - Differential Geometry;
 Statistics - Machine Learning