On the Expressive Power of Neural Networks
Abstract
In 1989, George Cybenko proved in a landmark paper that wide shallow neural networks can approximate arbitrary continuous functions on a compact set. This universal approximation theorem sparked a large body of follow-up research. Shen, Yang and Zhang determined optimal approximation rates for ReLU-networks in $L^p$-norms with $p \in [1,\infty)$. Kidger and Lyons proved a universal approximation theorem for deep narrow ReLU-networks. Telgarsky gave an example of a deep narrow ReLU-network that cannot be approximated by a wide shallow ReLU-network unless the latter has exponentially many neurons. However, several questions remain unresolved. Are there wide shallow ReLU-networks that cannot be approximated well by deep narrow ReLU-networks? Does the universal approximation theorem still hold for other norms, such as the Sobolev norm $W^{1,1}$? Do these results carry over to activation functions other than ReLU? We answer all of these questions, and more, with a framework built on two notions of expressive power. The first is well known: it counts the maximal number of linear regions of a function computed by a ReLU-network. We improve the best known bounds for this expressive power. The second notion is entirely new.
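For context, the linear-region count mentioned above is often illustrated with a Telgarsky-style sawtooth construction; the sketch below is a standard paraphrase from the depth-separation literature, and the symbol $\tau$ and the exact constants are illustrative rather than taken from this paper.

```latex
% A standard sawtooth example (Telgarsky-style), paraphrased for context;
% the symbol \tau and the constants below are illustrative, not the paper's.
\[
  \tau(x) \;=\; \max(0,\,2x) \;-\; \max(0,\,4x-2), \qquad x \in [0,1],
\]
% \tau is the triangle map on [0,1] and is computed exactly by a single
% width-2 ReLU layer. Its k-fold composition \tau^{\circ k} is computed by
% a ReLU network of depth O(k) and constant width, yet its graph is a
% sawtooth with 2^k linear regions. In contrast, a one-hidden-layer ReLU
% network with N neurons computes a piecewise linear function with at most
% N + 1 linear regions on the line, so matching \tau^{\circ k} in this
% sense forces N to grow exponentially in k.
```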
- Publication:
- arXiv e-prints
- Pub Date:
- May 2023
- DOI:
- 10.48550/arXiv.2306.00145
- arXiv:
- arXiv:2306.00145
- Bibcode:
- 2023arXiv230600145H
- Keywords:
- Mathematics - Classical Analysis and ODEs;
- Computer Science - Artificial Intelligence;
- Computer Science - Machine Learning;
- Statistics - Machine Learning;
- 68T01
- E-Print:
- 54 pages