Mean Field Theory of Activation Functions in Deep Neural Networks
Abstract
We present a Statistical Mechanics (SM) model of deep neural networks, connecting the energy-based and the feed-forward network (FFN) approaches. We infer that FFNs can be understood as performing three basic steps: encoding, representation validation, and propagation. From the mean-field solution of the model, we obtain a set of natural activation functions -- such as Sigmoid, $\tanh$ and ReLU -- together with the state-of-the-art Swish; the latter represents the expected information propagating through the network and tends to ReLU in the limit of zero noise. We study the spectrum of the Hessian on an associated classification task, showing that Swish allows for more consistent performance over a wider range of network architectures.
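As a concrete illustration of the zero-noise limit claimed above, the minimal sketch below compares Swish, $f(x) = x \cdot \mathrm{sigmoid}(\beta x)$, against ReLU as the slope parameter $\beta$ grows. Reading $\beta$ as an inverse noise level is our interpretation of the abstract, not notation from the paper, and the function names are illustrative.

```python
import numpy as np

def sigmoid(x):
    # Numerically stable logistic function: exp(-|x|) never overflows.
    z = np.exp(-np.abs(x))
    return np.where(x >= 0, 1.0 / (1.0 + z), z / (1.0 + z))

def swish(x, beta=1.0):
    # Swish with an explicit slope parameter: x * sigmoid(beta * x).
    # Here beta stands in for the inverse noise level, so beta -> infinity
    # mimics the zero-noise limit in which Swish reduces to ReLU.
    return x * sigmoid(beta * x)

def relu(x):
    return np.maximum(x, 0.0)

x = np.linspace(-5.0, 5.0, 1001)
for beta in (0.5, 1.0, 10.0, 100.0):
    gap = np.max(np.abs(swish(x, beta) - relu(x)))
    print(f"beta = {beta:6.1f}   max |Swish - ReLU| on [-5, 5] = {gap:.4f}")
```

For $\beta = 1$ this is the standard Swish; the printed gap shrinks roughly like $1/\beta$, making the convergence to ReLU explicit.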
- Publication:
- arXiv e-prints
- Pub Date:
- May 2018
- DOI:
- 10.48550/arXiv.1805.08786
- arXiv:
- arXiv:1805.08786
- Bibcode:
- 2018arXiv180508786M
- Keywords:
- Computer Science - Machine Learning;
- Computer Science - Neural and Evolutionary Computing;
- Statistics - Machine Learning
- E-Print:
- Presented at the ICML 2019 Workshop on Theoretical Physics for Deep Learning