Deep frequency principle towards understanding why deeper learning is faster
Abstract
Understanding the effect of depth in deep learning is a critical problem. In this work, we use Fourier analysis to empirically provide a promising mechanism for understanding why deeper feedforward learning is faster. To this end, we separate a deep neural network, trained by normal stochastic gradient descent, into two parts during analysis, i.e., a precondition component and a learning component, in which the output of the precondition component is the input of the learning component. We use a filtering method to characterize the frequency distribution of a high-dimensional function. Based on experiments with deep networks and real datasets, we propose a deep frequency principle: the effective target function for a deeper hidden layer biases towards lower frequency during the training. Therefore, the learning component effectively learns a lower-frequency function if the precondition component has more layers. Due to the well-studied frequency principle, i.e., that deep neural networks learn lower-frequency functions faster, the deep frequency principle provides a reasonable explanation of why deeper learning is faster. We believe these empirical studies would be valuable for future theoretical studies of the effect of depth in deep learning.
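The abstract's "filtering method" for characterizing the frequency distribution of a high-dimensional function can be illustrated with a simple sketch. Below, the low-frequency component of a sampled function is estimated by Gaussian-kernel smoothing over the sample points, and a "low-frequency ratio" measures how much of the signal's energy survives the filter. This is an illustrative stand-in, not the paper's exact procedure; the function names and the kernel width `sigma` are hypothetical choices for the example.

```python
import numpy as np

def low_freq_part(xs, ys, sigma=1.0):
    """Estimate the low-frequency component of a sampled function by
    Gaussian-kernel smoothing (a real-space low-pass filter that works
    in any input dimension). sigma is the filter width."""
    # Pairwise squared distances between sample points.
    d2 = ((xs[:, None, :] - xs[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * sigma ** 2))        # Gaussian weights
    return (w @ ys) / w.sum(axis=1)           # normalized smoothing

def low_freq_ratio(xs, ys, sigma=1.0):
    """Fraction of the signal's energy kept by the low-pass filter."""
    low = low_freq_part(xs, ys, sigma)
    return np.sum(low ** 2) / np.sum(ys ** 2)

# A smooth target should score a higher low-frequency ratio than a
# rapidly oscillating one sampled at the same points.
rng = np.random.default_rng(0)
xs = rng.uniform(-1, 1, size=(200, 1))
smooth = np.sin(np.pi * xs[:, 0])
wiggly = np.sin(20 * np.pi * xs[:, 0])
print(low_freq_ratio(xs, smooth, 0.3), low_freq_ratio(xs, wiggly, 0.3))
```

In the paper's setting, one would apply such a ratio to the effective target function seen by each hidden layer; the deep frequency principle predicts the ratio grows as the precondition component gets deeper.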
Publication: arXiv e-prints
Pub Date: July 2020
arXiv: arXiv:2007.14313
Bibcode: 2020arXiv200714313X
Keywords: Computer Science - Machine Learning; Statistics - Machine Learning
E-Print: AAAI 2021