Attacking the Madry Defense Model with $L_1$-based Adversarial Examples
Abstract
The Madry Lab recently hosted a competition designed to test the robustness of their adversarially trained MNIST model. Attacks were constrained to perturb each pixel of the input image by at most a scaled $L_\infty$ distortion of $\epsilon = 0.3$. This discourages the use of attacks that are not optimized for the $L_\infty$ distortion metric. Our experimental results demonstrate that, by relaxing the $L_\infty$ constraint of the competition, the elastic-net attack to deep neural networks (EAD) can generate transferable adversarial examples which, despite their high average $L_\infty$ distortion, have minimal visual distortion. These results call into question the use of $L_\infty$ as the sole measure of visual distortion, and further demonstrate the power of EAD at generating robust adversarial examples.
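The gap between the $L_\infty$ and $L_1$ metrics that the abstract relies on can be illustrated with a small sketch. It is not the EAD attack and uses no data from the paper: the flat mid-gray "image", the choice of 10 modified pixels, and the helper name `distortions` are assumptions made purely for illustration. Only the bound $\epsilon = 0.3$ comes from the abstract.

```python
def distortions(original, adversarial):
    """Return the (L1, L2, Linf) norms of the perturbation adversarial - original."""
    delta = [a - o for o, a in zip(original, adversarial)]
    l1 = sum(abs(d) for d in delta)
    l2 = sum(d * d for d in delta) ** 0.5
    linf = max(abs(d) for d in delta)
    return l1, l2, linf


EPSILON = 0.3  # the competition's L_inf bound, as stated in the abstract

# Hypothetical 28x28 image flattened to 784 pixels, all mid-gray (assumption).
original = [0.5] * 784

# Sparse perturbation: only 10 pixels change, but each by 0.5 (assumption).
sparse = list(original)
for i in range(10):
    sparse[i] += 0.5

# Dense perturbation: every pixel changes by exactly EPSILON.
dense = [p + EPSILON for p in original]

sparse_l1, sparse_l2, sparse_linf = distortions(original, sparse)
dense_l1, dense_l2, dense_linf = distortions(original, dense)

# The sparse perturbation breaks the L_inf bound yet has a far smaller L1 norm.
print("sparse: L1=%.2f  L2=%.2f  Linf=%.2f" % (sparse_l1, sparse_l2, sparse_linf))
print("dense:  L1=%.2f  L2=%.2f  Linf=%.2f" % (dense_l1, dense_l2, dense_linf))
print("sparse violates L_inf bound:", sparse_linf > EPSILON)
```

Under these assumptions, the sparse perturbation would be ruled out by an $L_\infty$ constraint even though it alters only a handful of pixels, while the dense perturbation satisfies the constraint but changes every pixel. This is the kind of mismatch that motivates asking whether $L_\infty$ alone captures visual distortion.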
- Publication: arXiv e-prints
- Pub Date: October 2017
- DOI: 10.48550/arXiv.1710.10733
- arXiv: arXiv:1710.10733
- Bibcode: 2017arXiv171010733S
- Keywords: Statistics - Machine Learning; Computer Science - Cryptography and Security; Computer Science - Machine Learning
- E-Print: Accepted to ICLR 2018 Workshops