Exact Stochastic Second Order Deep Learning

doi:10.48550/arXiv.2104.03804

Optimization in Deep Learning is dominated by first-order methods, which are built around the central concept of backpropagation. Second-order optimization methods, which take second-order derivatives into account, are far less used despite their superior theoretical properties. This inadequacy of second-order methods stems from their exorbitant computational cost, poor performance, and the ineluctable non-convex nature of Deep Learning. Several attempts have been made to resolve the inadequacy of second-order optimization without reaching a cost-effective solution, much less an exact one. In this work, we show that this long-standing problem in Deep Learning can be solved in the stochastic case, given a suitable regularization of the neural network. We provide an expression for the stochastic Hessian and its exact eigenvalues, together with a closed-form formula for the exact stochastic second-order Newton direction. We solve the non-convexity issue and adjust our exact solution to favor flat minima through regularization and spectral adjustment. We test our exact stochastic second-order method on popular datasets and reveal its adequacy for Deep Learning.
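To make the idea of a spectrally adjusted Newton direction concrete, the following is a minimal sketch on a two-parameter toy problem. It is not the closed-form formula of the paper, which the abstract does not state. It assumes a generic and commonly used adjustment: each Hessian eigenvalue is replaced by its absolute value plus a small damping term, so that the resulting step is a descent direction even at a saddle point. The function names `sym2x2_eig` and `adjusted_newton_direction` and the `damping` parameter are hypothetical and chosen only for illustration.

```python
import math


def sym2x2_eig(a, b, c):
    """Eigenpairs of the symmetric matrix [[a, b], [b, c]].

    Returns two (eigenvalue, unit eigenvector) pairs, largest eigenvalue first.
    """
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    lam1, lam2 = mean + radius, mean - radius
    if b == 0.0:
        # Already diagonal: the eigenvectors are the coordinate axes.
        v1, v2 = ((1.0, 0.0), (0.0, 1.0)) if a >= c else ((0.0, 1.0), (1.0, 0.0))
    else:
        # (lam1 - c, b) solves (H - lam1 * I) v = 0; normalize it.
        norm = math.hypot(lam1 - c, b)
        v1 = ((lam1 - c) / norm, b / norm)
        # The second eigenvector of a symmetric matrix is orthogonal to the first.
        v2 = (-v1[1], v1[0])
    return (lam1, v1), (lam2, v2)


def adjusted_newton_direction(grad, hess, damping=1e-3):
    """Newton-like direction with |eigenvalue| + damping in place of each eigenvalue.

    grad: (g0, g1); hess: (a, b, c) for the symmetric matrix [[a, b], [b, c]].
    This adjustment is an assumption of this sketch, not the paper's method.
    """
    a, b, c = hess
    direction = [0.0, 0.0]
    for lam, v in sym2x2_eig(a, b, c):
        # Project the gradient on the eigenvector and rescale by the adjusted curvature.
        coeff = (grad[0] * v[0] + grad[1] * v[1]) / (abs(lam) + damping)
        direction[0] -= coeff * v[0]
        direction[1] -= coeff * v[1]
    return tuple(direction)


# Toy saddle f(x, y) = x^2 - y^2 evaluated at (1, 1):
# gradient (2, -2), Hessian [[2, 0], [0, -2]].
step = adjusted_newton_direction((2.0, -2.0), (2.0, 0.0, -2.0), damping=0.0)
```

On this toy saddle, a plain Newton step would move toward the saddle point at the origin, whereas the adjusted direction decreases `x` and increases `y`, which lowers the function value along both eigen-directions.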

Publication: arXiv e-prints

Pub Date: April 2021

DOI: 10.48550/arXiv.2104.03804

arXiv: arXiv:2104.03804

Bibcode: 2021arXiv210403804M

Keywords: Computer Science - Machine Learning; Statistics - Machine Learning
