On the Expressive Power of Neural Networks
Abstract
In 1989, George Cybenko proved in a landmark paper that wide shallow neural networks can approximate arbitrary continuous functions on a compact set. This universal approximation theorem sparked extensive follow-up research. Shen, Yang, and Zhang determined optimal approximation rates for ReLU networks in $L^p$ norms with $p \in [1,\infty)$. Kidger and Lyons proved a universal approximation theorem for deep narrow ReLU networks. Telgarsky gave an example of a deep narrow ReLU network that cannot be approximated by a wide shallow ReLU network unless the latter has exponentially many neurons. However, even more questions remain unresolved. Are there wide shallow ReLU networks that cannot be approximated well by deep narrow ReLU networks? Is the universal approximation theorem still true for other norms, such as the Sobolev norm $W^{1,1}$? Do these results hold for activation functions other than ReLU? We will answer all of these questions and more with a framework of two expressive powers. The first one is well known and counts the maximal number of linear regions of a function computed by a ReLU network. We will improve the best known bounds for this expressive power. The second one is entirely new.
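To make the notion of linear regions concrete, the following sketch (an illustration of the standard Telgarsky-style construction, not code from the paper) builds a "tent" map from two ReLU neurons, composes it with itself k times, and counts the linear pieces of the result on [0, 1]: each extra layer doubles the count, giving 2^k regions with only 2k neurons. The function names and the grid-based counting method are choices made here for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def hat(x):
    # One hidden layer with two ReLU neurons realizes the tent map on [0, 1]:
    # hat(x) = 2x on [0, 1/2] and 2 - 2x on [1/2, 1].
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def count_linear_pieces(k, n=1 << 12):
    # Compose the tent map k times, then count the linear pieces of the
    # result on [0, 1] by detecting slope changes on a uniform grid.
    # n is a power of two so that every breakpoint (a multiple of 2^-k)
    # lands exactly on a grid point and causes exactly one slope change.
    x = np.linspace(0.0, 1.0, n + 1)
    y = x
    for _ in range(k):
        y = hat(y)
    slopes = np.diff(y) / np.diff(x)
    changes = np.count_nonzero(np.abs(np.diff(slopes)) > 1e-6)
    return changes + 1

for k in (1, 2, 3, 4):
    print(k, count_linear_pieces(k))  # number of pieces doubles with each layer
```

A shallow network with one hidden layer of m ReLU neurons on the real line can have at most m + 1 linear regions, so matching the 2^k regions of this depth-k network requires exponentially many neurons — the kind of depth separation the abstract refers to.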
Publication: arXiv e-prints
Pub Date: May 2023
DOI: 10.48550/arXiv.2306.00145
arXiv: arXiv:2306.00145
Bibcode: 2023arXiv230600145H
Keywords: Mathematics - Classical Analysis and ODEs; Computer Science - Artificial Intelligence; Computer Science - Machine Learning; Statistics - Machine Learning; 68T01
E-Print: 54 pages