There are No Bit Parts for Sign Bits in Black-Box Attacks
Abstract
We present a black-box adversarial attack algorithm that sets a new state of the art in model evasion rate and query efficiency under the $\ell_\infty$ and $\ell_2$ metrics, where only loss-oracle access to the model is available. On two public black-box attack challenges, the algorithm achieves the highest evasion rate, surpassing all of the submitted attacks. Similar performance is observed on a model that is secure against substitute-model attacks. For standard models trained on the MNIST, CIFAR10, and IMAGENET datasets, averaged over the datasets and metrics, the algorithm is 3.8x less failure-prone and spends in total 2.5x fewer queries than the current state-of-the-art attacks combined, given a budget of 10,000 queries per attack attempt. Notably, it requires no hyperparameter tuning or any data/time-dependent prior. The algorithm exploits a new approach, namely sign-based rather than magnitude-based gradient estimation, which shifts the estimation problem from continuous to binary black-box optimization. Using three properties of the directional derivative, we examine three approaches to adversarial attacks. This yields a superior algorithm that breaks a standard MNIST model using just 12 queries on average.
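To make the idea of sign-based estimation concrete, the following is a minimal sketch of searching over sign bits with only loss-oracle queries. It is an illustration under stated assumptions, not the algorithm as specified by the authors: the abstract does not describe the search procedure, so the greedy chunk-flipping strategy, the function name `estimate_gradient_sign`, the parameter `eps`, and the linear toy loss are all hypothetical choices made here.

```python
import numpy as np


def estimate_gradient_sign(loss, x, eps, max_queries=10_000):
    """Search for a sign vector s in {-1, +1}^n that increases loss(x + eps * s).

    Hypothetical illustration: flips chunks of sign bits, halving the chunk
    size each round, and keeps a flip only if the queried loss goes up.
    Only loss values are used, never gradients.
    """
    n = x.size
    s = np.ones(n)                      # initial guess: every sign bit is +1
    best = loss(x + eps * s)
    queries = 1
    chunk = n
    while chunk >= 1 and queries < max_queries:
        for start in range(0, n, chunk):
            if queries >= max_queries:
                break
            s[start:start + chunk] *= -1      # tentatively flip this chunk
            value = loss(x + eps * s)
            queries += 1
            if value > best:
                best = value                  # keep the flip
            else:
                s[start:start + chunk] *= -1  # revert the flip
        chunk //= 2                           # refine to smaller chunks
    return s, queries


# Toy loss oracle (an assumption): linear in the input, so the true
# gradient sign is simply sign(w). The attacker never sees w directly.
rng = np.random.default_rng(0)
w = rng.normal(size=8)
x = np.zeros(8)


def toy_loss(z):
    return float(w @ z)


s, queries = estimate_gradient_sign(toy_loss, x, eps=0.1)
```

For this linear toy loss, the final round tests each bit individually, so the recovered `s` matches `np.sign(w)`. Real models have non-linear losses, so a search of this kind would give no such guarantee there.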
- Publication: arXiv e-prints
- Pub Date: February 2019
- DOI: 10.48550/arXiv.1902.06894
- arXiv: arXiv:1902.06894
- Bibcode: 2019arXiv190206894A
- Keywords: Computer Science - Machine Learning; Computer Science - Cryptography and Security; Statistics - Machine Learning
- E-Print: Added results of Ensemble Adv Learning. ICML template