Sparsity-depth Tradeoff in Infinitely Wide Deep Neural Networks
Abstract
We investigate how sparse neural activity affects the generalization performance of a deep Bayesian neural network in the large-width limit. To this end, we derive a neural network Gaussian process (NNGP) kernel with rectified linear unit (ReLU) activation and a predetermined fraction of active neurons. Using the NNGP kernel, we observe that sparser networks outperform non-sparse networks at shallow depths on a variety of datasets. We validate this observation by extending the existing theory on the generalization error of kernel ridge regression.
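The abstract does not spell out the kernel construction, but the idea can be sketched numerically. Below is a minimal Monte Carlo sketch, not the paper's derivation: it assumes sparsity is imposed by shifting the ReLU threshold so that a fixed fraction `f_active` of preactivations fires at each layer, and it estimates the layer-to-layer NNGP kernel recursion by sampling. The function name `sparse_relu_nngp_kernel` and the parameters `sigma_w2`, `sigma_b2`, and `n_mc` are illustrative choices, not names from the paper.

```python
import numpy as np
from scipy.stats import norm

def sparse_relu_nngp_kernel(X, depth, f_active=0.5, sigma_w2=2.0, sigma_b2=0.0,
                            n_mc=100_000, seed=0):
    """Monte Carlo estimate of a deep NNGP kernel for a sparsity-constrained ReLU.

    Sparsity is modeled here (an assumption, not the paper's exact construction)
    by shifting the ReLU threshold so that a fraction `f_active` of the Gaussian
    preactivations is positive at every layer.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    K = X @ X.T / d                                       # input-layer kernel
    for _ in range(depth):
        var = np.clip(np.diag(K), 1e-12, None)
        theta = norm.ppf(1.0 - f_active) * np.sqrt(var)   # per-input activation threshold
        # Draw correlated Gaussian preactivations with covariance K and push
        # them through the thresholded ReLU to estimate the next-layer kernel.
        L = np.linalg.cholesky(K + 1e-10 * np.eye(n))
        Z = L @ rng.standard_normal((n, n_mc))
        H = np.maximum(Z - theta[:, None], 0.0)
        K = sigma_w2 * (H @ H.T) / n_mc + sigma_b2
    return K
```

The returned Gram matrix could then be used in kernel ridge regression, e.g. `alpha = np.linalg.solve(K + lam * np.eye(n), y)` for some ridge parameter `lam`, to compare generalization across `depth` and `f_active` in the spirit of the sparsity-depth tradeoff studied in the paper.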
- Publication: arXiv e-prints
- Pub Date: May 2023
- DOI: 10.48550/arXiv.2305.10550
- arXiv: arXiv:2305.10550
- Bibcode: 2023arXiv230510550C
- Keywords: Computer Science - Machine Learning; Condensed Matter - Disordered Systems and Neural Networks; Quantitative Biology - Neurons and Cognition