Backward Gradient Normalization in Deep Neural Networks

doi:10.48550/arXiv.2106.09475

We introduce a new technique for gradient normalization during neural network training. The gradients are rescaled during the backward pass using normalization layers introduced at certain points within the network architecture. These normalization nodes do not affect forward activity propagation, but modify the backpropagation equations to permit a well-scaled gradient flow that reaches the deepest network layers without vanishing or exploding. Tests with very deep neural networks show that the new technique effectively controls the gradient norm, allowing the weights in the deepest layers to be updated and improving network accuracy under several experimental conditions.
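The mechanism described above (identity in the forward pass, rescaling in the backward pass) can be illustrated with a minimal sketch. This is not the authors' implementation: the choice of unit L2 norm, the tanh layers, the toy loss, and the names `normalize_backward`, `input_gradient_norm`, and `normalize_every` are all assumptions made for illustration, since the abstract does not specify the normalization rule or where the nodes are placed.

```python
import math
import random


def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def normalize_backward(grad, eps=1e-12):
    """Backward rule of a hypothetical normalization node.

    Assumption: the node rescales the incoming gradient to unit L2 norm.
    Its forward rule is the identity, so it needs no forward code at all.
    """
    n = l2_norm(grad)
    return [g / (n + eps) for g in grad]


def input_gradient_norm(depth=50, width=8, scale=0.1, normalize_every=None, seed=0):
    """Backpropagate through a deep tanh chain; return the gradient norm at the input.

    normalize_every=None means no normalization nodes; an integer k places
    one node after every k-th layer (illustrative placement, not from the paper).
    """
    rng = random.Random(seed)
    weights = [
        [[rng.gauss(0.0, scale) for _ in range(width)] for _ in range(width)]
        for _ in range(depth)
    ]

    # Forward pass: identical with or without normalization nodes.
    h = [1.0] * width
    activations = []
    for W in weights:
        h = [math.tanh(sum(w * x for w, x in zip(row, h))) for row in W]
        activations.append(h)

    # Backward pass for the toy loss L = sum of the final activations.
    grad = [1.0] * width
    for k in reversed(range(depth)):
        W, a = weights[k], activations[k]
        delta = [g * (1.0 - y * y) for g, y in zip(grad, a)]  # through tanh
        grad = [sum(W[i][j] * delta[i] for i in range(width)) for j in range(width)]
        if normalize_every and k % normalize_every == 0:
            grad = normalize_backward(grad)  # only the backward signal changes
    return l2_norm(grad)


plain = input_gradient_norm()                    # gradient shrinks layer by layer
normed = input_gradient_norm(normalize_every=1)  # gradient norm is reset at each node
print(plain, normed)
```

With small random weights the unnormalized gradient shrinks roughly geometrically with depth, while the normalized run keeps it at a fixed scale by construction. In a real framework the same idea would typically be expressed as a custom autograd function whose forward is the identity.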

Publication: arXiv e-prints

Pub Date: June 2021

DOI: 10.48550/arXiv.2106.09475

arXiv: arXiv:2106.09475

Bibcode: 2021arXiv210609475C

Keywords: Computer Science - Machine Learning
