Learning Universal Adversarial Perturbations with Generative Models

doi:10.48550/arXiv.1708.05207

Neural networks are known to be vulnerable to adversarial examples: inputs that have been intentionally perturbed to remain visually similar to the source input but cause a misclassification. It was recently shown that, given a dataset and a classifier, there exist so-called universal adversarial perturbations, each a single perturbation that causes a misclassification when applied to any input. In this work, we introduce universal adversarial networks, generative networks capable of fooling a target classifier when their generated output is added to a clean sample from a dataset. We show that this technique improves on known universal adversarial attacks.
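To make the notion of a universal perturbation concrete, the following is a minimal sketch, not the paper's method: it does not train a generative network. It only shows one fixed perturbation being added to every clean sample and the resulting fraction of changed predictions. The toy linear classifier, the names `predict`, `scale_to_linf`, and `fooling_rate`, the synthetic data, and the L-infinity bound `eps` are all assumptions made for illustration.

```python
import random

def predict(weights, x):
    """Toy linear classifier: return the index of the highest-scoring class."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    return max(range(len(scores)), key=scores.__getitem__)

def scale_to_linf(delta, eps):
    """Rescale delta so that its largest absolute entry equals eps."""
    largest = max(abs(d) for d in delta)
    return [eps * d / largest for d in delta] if largest > 0 else list(delta)

def fooling_rate(weights, inputs, delta):
    """Fraction of inputs whose predicted label changes when delta is added."""
    changed = 0
    for x in inputs:
        x_adv = [x_i + d_i for x_i, d_i in zip(x, delta)]
        if predict(weights, x_adv) != predict(weights, x):
            changed += 1
    return changed / len(inputs)

# Hypothetical two-class classifier: class 0 if x[0] > x[1], else class 1.
WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]

rng = random.Random(0)
# Synthetic clean samples, all in class 0, with margin x[0] - x[1] in [0.1, 0.5].
inputs = []
for _ in range(100):
    base = rng.uniform(0.0, 1.0)
    inputs.append([base + rng.uniform(0.1, 0.5), base])

# One perturbation shared by every input, bounded in the L-infinity norm.
delta = scale_to_linf([-1.0, 1.0], eps=0.3)
rate = fooling_rate(WEIGHTS, inputs, delta)
print(f"fooling rate with a single shared perturbation: {rate:.2f}")
```

In this toy setting the shared perturbation shifts every margin by 0.6, which exceeds the largest clean margin of 0.5, so every sample changes class. A generative approach as described in the abstract would replace the hand-picked direction `[-1.0, 1.0]` with the output of a learned network.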

Publication:

arXiv e-prints

Pub Date:

August 2017

DOI:

10.48550/arXiv.1708.05207

arXiv:

arXiv:1708.05207

Bibcode:

2017arXiv170805207H

Keywords:

Computer Science - Cryptography and Security;
Computer Science - Machine Learning;
Statistics - Machine Learning
