Can Shallow Neural Networks Beat the Curse of Dimensionality? A mean field training perspective
Abstract
We prove that the gradient descent training of a two-layer neural network on empirical or population risk may not decrease population risk at an order faster than $t^{-4/(d-2)}$ under mean field scaling. Thus gradient descent training for fitting reasonably smooth, but truly high-dimensional data may be subject to the curse of dimensionality. We present numerical evidence that gradient descent training with general Lipschitz target functions becomes slower and slower as the dimension increases, but converges at approximately the same rate in all dimensions when the target function lies in the natural function space for two-layer ReLU networks.
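The training setup the abstract refers to can be sketched in a few lines: a two-layer ReLU network under mean field scaling, $f(x) = \frac{1}{m}\sum_i a_i\,\mathrm{relu}(w_i \cdot x)$, trained by gradient descent on mean squared error. This is a minimal illustrative sketch, not the authors' code; the target function, width, dimension, and step size below are all hypothetical choices (the step size is scaled with the width $m$, matching the mean-field time parametrization).

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 10, 200, 500                  # input dimension, width, sample count (illustrative)
X = rng.standard_normal((n, d))
y = np.linalg.norm(X, axis=1)           # a simple Lipschitz target (hypothetical choice)

a = rng.standard_normal(m)              # outer weights
W = rng.standard_normal((m, d))         # inner weights

def risk_of(a, W):
    # empirical-risk proxy: mean squared error of f(x) = (1/m) sum_i a_i relu(w_i . x)
    return 0.5 * np.mean((np.maximum(X @ W.T, 0.0) @ a / m - y) ** 2)

risk0 = risk_of(a, W)
lr = 0.1 * m                            # step size scaled with m (mean-field time scale)
for _ in range(200):
    pre = X @ W.T                       # (n, m) pre-activations
    act = np.maximum(pre, 0.0)          # ReLU
    err = act @ a / m - y               # residual under the 1/m mean field scaling
    ga = (act.T @ err) / (n * m)                                          # grad wrt a
    gW = (((err[:, None] * (pre > 0)) * a[None, :]).T @ X) / (n * m)      # grad wrt W
    a -= lr * ga
    W -= lr * gW

risk = risk_of(a, W)
```

The risk decreases over training, but the paper's point is about the *rate*: for generic Lipschitz targets the decay slows as `d` grows, while for targets in the network's natural function space it does not.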
 Publication:
 arXiv e-prints
 Pub Date:
 May 2020
 arXiv:
 arXiv:2005.10815
 Bibcode:
 2020arXiv200510815W
 Keywords:
 Computer Science - Machine Learning;
 Mathematics - Analysis of PDEs;
 Statistics - Machine Learning;
 68T07;
 49Q22;
 68W25
 E-Print:
 5 figures