Better the Devil you Know: An Analysis of Evasion Attacks using Out-of-Distribution Adversarial Examples
Abstract
A large body of recent work has investigated the phenomenon of evasion attacks using adversarial examples for deep learning systems, where the addition of norm-bounded perturbations to the test inputs leads to incorrect output classification. Previous work has investigated this phenomenon in closed-world systems where training and test inputs follow a pre-specified distribution. However, real-world implementations of deep learning applications, such as autonomous driving and content classification are likely to operate in the open-world environment. In this paper, we demonstrate the success of open-world evasion attacks, where adversarial examples are generated from out-of-distribution inputs (OOD adversarial examples). In our study, we use 11 state-of-the-art neural network models trained on 3 image datasets of varying complexity. We first demonstrate that state-of-the-art detectors for out-of-distribution data are not robust against OOD adversarial examples. We then consider 5 known defenses for adversarial examples, including state-of-the-art robust training methods, and show that against these defenses, OOD adversarial examples can achieve up to 4$\times$ higher target success rates compared to adversarial examples generated from in-distribution data. We also take a quantitative look at how open-world evasion attacks may affect real-world systems. Finally, we present the first steps towards a robust open-world machine learning system.
- Publication:
-
arXiv e-prints
- Pub Date:
- May 2019
- DOI:
- 10.48550/arXiv.1905.01726
- arXiv:
- arXiv:1905.01726
- Bibcode:
- 2019arXiv190501726S
- Keywords:
-
- Computer Science - Machine Learning;
- Computer Science - Cryptography and Security;
- Computer Science - Computer Vision and Pattern Recognition;
- Statistics - Machine Learning
- E-Print:
- 18 pages, 5 figures, 9 tables