There are No Bit Parts for Sign Bits in Black-Box Attacks

doi:10.48550/arXiv.1902.06894

We present a black-box adversarial attack algorithm which sets new state-of-the-art model evasion rates for query efficiency in the $\ell_\infty$ and $\ell_2$ metrics, where only loss-oracle access to the model is available. On two public black-box attack challenges, the algorithm achieves the highest evasion rate, surpassing all of the submitted attacks. Similar performance is observed on a model that is secure against substitute-model attacks. For standard models trained on the MNIST, CIFAR10, and IMAGENET datasets, averaged over the datasets and metrics, the algorithm is 3.8x less failure-prone and spends in total 2.5x fewer queries than the current state-of-the-art attacks combined, given a budget of 10,000 queries per attack attempt. Notably, it requires no hyperparameter tuning or any data/time-dependent prior. The algorithm exploits a new approach, namely sign-based rather than magnitude-based gradient estimation. This shifts the estimation from continuous to binary black-box optimization. With three properties of the directional derivative, we examine three approaches to adversarial attacks. This yields a superior algorithm breaking a standard MNIST model using just 12 queries on average!
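To make the sign-based idea concrete, the following is a minimal sketch of a search over sign bits that uses only loss-oracle queries. It is an illustration under stated assumptions, not the authors' implementation: the function name `sign_flip_search`, the block-halving schedule, and the toy linear `loss` are all hypothetical choices made for this example. The sketch starts from an all-positive sign vector, flips blocks of sign bits, and keeps a flip only when the queried loss increases.

```python
def sign_flip_search(loss, x, eps, max_queries):
    """Greedy block-wise sign search using only loss-oracle queries.

    loss: hypothetical black-box oracle mapping a list of floats to a float.
    Returns (perturbed input, number of oracle queries used).
    """
    n = len(x)
    signs = [1.0] * n                          # initial guess: all-positive signs

    def perturb(s):
        # Move each coordinate by eps in the direction of its sign bit.
        return [xi + eps * si for xi, si in zip(x, s)]

    best = loss(perturb(signs))
    queries = 1
    chunk = n                                  # block size, halved after each pass
    while chunk >= 1 and queries < max_queries:
        for start in range(0, n, chunk):
            if queries >= max_queries:
                break
            trial = signs[:]
            for i in range(start, min(start + chunk, n)):
                trial[i] = -trial[i]           # flip every sign bit in this block
            value = loss(perturb(trial))
            queries += 1
            if value > best:                   # keep the flip only if the loss rises
                best, signs = value, trial
        chunk //= 2
    return perturb(signs), queries


# Toy oracle (an assumption for illustration): a linear loss with hidden gradient g.
g = [0.5, -1.0, 2.0, -0.3, 0.1, -0.7, 1.5, -2.2]

def loss(z):
    return sum(gi * zi for gi, zi in zip(g, z))

adv, used = sign_flip_search(loss, [0.0] * len(g), eps=0.1, max_queries=100)
recovered = [1 if a > 0 else -1 for a in adv]
print(recovered, used)
```

On this toy linear loss the recovered signs match the signs of the hidden gradient, since the final pass tests each coordinate individually. A real model's loss is not linear, so the sketch only shows why estimating sign bits turns the problem into a binary search rather than a continuous estimation.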

Publication:

arXiv e-prints

Pub Date:

February 2019

DOI:

10.48550/arXiv.1902.06894

arXiv:

arXiv:1902.06894

Bibcode:

2019arXiv190206894A

Keywords:

Computer Science - Machine Learning;
Computer Science - Cryptography and Security;
Statistics - Machine Learning

E-Print:

Added results of Ensemble Adv Learning. ICML template

NASA/ADS
