Purifying Adversarial Perturbation with Adversarially Trained Auto-encoders

doi:10.48550/arXiv.1905.10729

Machine learning models are vulnerable to adversarial examples. Iterative adversarial training has shown promising results against strong white-box attacks. However, adversarial training is very expensive, and this costly procedure must be repeated for every model that needs protection. In this paper, we propose applying the iterative adversarial training scheme to an external auto-encoder, which, once trained, can be used to protect other models directly. We empirically show that our model outperforms other purification-based methods against white-box attacks and transfers well to directly protect other base models with different architectures.
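The idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not specify the architecture, the loss, or the attack settings, so everything below is an assumption made for illustration. A single linear map `W` stands in for the auto-encoder, a fixed logistic classifier `(w, b)` plays the protected base model, the attack is a PGD-style signed-gradient loop inside an L-infinity ball, and the purifier is trained on the classifier's cross-entropy loss. The names `pgd_attack` and `train_purifier` and all hyperparameters are hypothetical.

```python
import math

def sign(v):
    return (v > 0) - (v < 0)

def matvec(W, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def predict(W, w, b, x):
    """Class-1 probability of the frozen classifier applied to the purified input W @ x."""
    z = matvec(W, x)
    return sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)

def pgd_attack(W, w, b, x, y, eps, step, iters):
    """White-box iterative attack on the composite model (purifier + classifier)."""
    x_adv = list(x)
    for _ in range(iters):
        err = predict(W, w, b, x_adv) - y
        # Cross-entropy gradient w.r.t. the raw input: err * W^T w.
        grad = [err * sum(W[i][j] * w[i] for i in range(len(w)))
                for j in range(len(x))]
        # Ascend the loss with a signed step, then project back into the eps-ball.
        x_adv = [xa + step * sign(g) for xa, g in zip(x_adv, grad)]
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

def train_purifier(data, w, b, eps=0.3, step=0.1, iters=5, lr=0.1, epochs=20):
    """Adversarially train only the purifier W; the classifier (w, b) stays frozen."""
    d = len(w)
    W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # start at identity
    for _ in range(epochs):
        for x, y in data:
            x_adv = pgd_attack(W, w, b, x, y, eps, step, iters)
            err = predict(W, w, b, x_adv) - y
            # Gradient of the loss w.r.t. W[i][j] is err * w[i] * x_adv[j].
            for i in range(d):
                for j in range(d):
                    W[i][j] -= lr * err * w[i] * x_adv[j]
    return W

# Toy data (assumed): two well-separated classes in 2D.
w, b = [1.0, 1.0], 0.0
data = [([1.0, 0.8], 1), ([0.9, 1.2], 1), ([-1.0, -0.7], 0), ([-1.1, -0.9], 0)]
W = train_purifier(data, w, b)
```

Because only `W` is updated, the trained purifier could in principle be placed in front of a different classifier without retraining it, which is the kind of transfer the abstract describes.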

Publication: arXiv e-prints

Pub Date: May 2019

DOI: 10.48550/arXiv.1905.10729

arXiv: arXiv:1905.10729

Bibcode: 2019arXiv190510729L

Keywords: Computer Science - Machine Learning; Computer Science - Cryptography and Security; Computer Science - Computer Vision and Pattern Recognition; Statistics - Machine Learning
