Revisiting Ensembles in an Adversarial Context: Improving Natural Accuracy
Abstract
A necessary characteristic for the deployment of deep learning models in real-world applications is resistance to small adversarial perturbations while maintaining accuracy on non-malicious inputs. While robust training yields models with better adversarial accuracy than standard models, there remains a significant gap in natural accuracy between robust and non-robust models, which we aim to bridge. We consider a number of ensemble methods designed to mitigate this performance difference. Our key insight is that models trained to withstand small attacks, when ensembled, can often withstand significantly larger attacks, and this concept can in turn be leveraged to optimize natural accuracy. We consider two schemes: one that combines predictions from several randomly initialized robust models, and another that fuses features from robust and standard models.
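To make the two schemes concrete, below is a minimal PyTorch sketch of both ideas under stated assumptions: `SmallNet`, `ensemble_predict`, and `FusionHead` are hypothetical names, and the tiny MLP stands in for the adversarially trained and standard networks the abstract refers to; this is an illustrative reconstruction, not the authors' exact architecture or training procedure.

```python
# Illustrative sketch only: SmallNet is a stand-in for a robustly or
# standardly trained classifier; names and dimensions are assumptions.
import torch
import torch.nn as nn

class SmallNet(nn.Module):
    """Toy classifier standing in for a (robustly) trained model."""
    def __init__(self, in_dim=784, feat_dim=128, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        return self.head(self.features(x))

def ensemble_predict(models, x):
    """Scheme 1: average softmax probabilities over several
    independently (randomly) initialized robust models."""
    with torch.no_grad():
        probs = torch.stack([m(x).softmax(dim=-1) for m in models])
    return probs.mean(dim=0).argmax(dim=-1)

class FusionHead(nn.Module):
    """Scheme 2: fuse features from a robust and a standard model,
    then classify from the concatenated representation."""
    def __init__(self, robust, standard, feat_dim=128, num_classes=10):
        super().__init__()
        self.robust, self.standard = robust, standard
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, x):
        fused = torch.cat(
            [self.robust.features(x), self.standard.features(x)], dim=-1
        )
        return self.classifier(fused)

# Usage: freshly initialized nets stand in for trained ones.
models = [SmallNet() for _ in range(3)]
x = torch.randn(4, 784)
print(ensemble_predict(models, x))          # scheme 1
fusion = FusionHead(SmallNet(), SmallNet())
print(fusion(x).argmax(dim=-1))             # scheme 2
```

In practice the fusion head would be trained on top of frozen robust and standard backbones; that training loop is omitted here for brevity.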
- Publication: arXiv e-prints
- Pub Date: February 2020
- DOI: 10.48550/arXiv.2002.11572
- arXiv: arXiv:2002.11572
- Bibcode: 2020arXiv200211572S
- Keywords: Statistics - Machine Learning; Computer Science - Cryptography and Security; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Machine Learning
- E-Print: 5 pages, accepted to ICLR 2020 Workshop on Towards Trustworthy ML: Rethinking Security and Privacy for ML