Test Sample Accuracy Scales with Training Sample Density in Neural Networks
Abstract
Intuitively, one would expect the accuracy of a trained neural network's predictions on test samples to correlate with how densely those samples are surrounded by seen training samples in representation space. We find that a bound on empirical training error smoothed across linear activation regions scales inversely with training sample density in representation space. Empirically, we verify that this bound is a strong predictor of the inaccuracy of the network's predictions on test samples. For unseen test sets, including those with out-of-distribution samples, ranking test samples by their local region's error bound and discarding the samples with the highest bounds raises prediction accuracy by up to 20% in absolute terms on image classification datasets, averaged over thresholds.
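The rank-and-discard evaluation described in the abstract can be illustrated with a minimal sketch. It does not reproduce the paper's bound: it assumes a per-sample score (`bounds`) has already been computed by some means, and the function names `accuracy_after_filtering` and `mean_accuracy_gain` are hypothetical, introduced only for illustration.

```python
import numpy as np


def accuracy_after_filtering(bounds, correct, discard_fraction):
    """Accuracy on the samples kept after dropping the highest-bound fraction.

    bounds: per-sample scores (assumed precomputed; higher = less trusted).
    correct: per-sample flags, True where the prediction matched the label.
    discard_fraction: share of samples to discard, in [0, 1).
    """
    bounds = np.asarray(bounds, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    n = bounds.shape[0]
    # Always keep at least one sample so the mean is defined.
    n_keep = max(1, n - int(round(discard_fraction * n)))
    # Ascending sort: the lowest bounds (most trusted samples) come first.
    order = np.argsort(bounds, kind="stable")
    kept = order[:n_keep]
    return float(correct[kept].mean())


def mean_accuracy_gain(bounds, correct, discard_fractions):
    """Accuracy averaged over thresholds, minus the unfiltered accuracy."""
    baseline = float(np.asarray(correct, dtype=bool).mean())
    accuracies = [
        accuracy_after_filtering(bounds, correct, f) for f in discard_fractions
    ]
    return float(np.mean(accuracies)) - baseline


if __name__ == "__main__":
    # Toy data, made up for illustration only (not results from the paper).
    toy_bounds = [0.1, 0.9, 0.2, 0.8]
    toy_correct = [True, False, True, False]
    print(accuracy_after_filtering(toy_bounds, toy_correct, 0.5))
    print(mean_accuracy_gain(toy_bounds, toy_correct, [0.0, 0.5]))
```

In this sketch the discard fraction plays the role of the threshold, and the gain is reported as a difference in accuracy, which corresponds to the "absolute terms" phrasing above.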
- Publication: arXiv e-prints
- Pub Date: June 2021
- DOI: 10.48550/arXiv.2106.08365
- arXiv: arXiv:2106.08365
- Bibcode: 2021arXiv210608365J
- Keywords: Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Statistics - Machine Learning
- E-Print: CoLLAs 2022 oral