Deep k-NN Defense against Clean-label Data Poisoning Attacks
Abstract
Targeted clean-label data poisoning is a type of adversarial attack on machine learning systems in which an adversary injects a few correctly labeled, minimally perturbed samples into the training data, causing a model to misclassify a particular test sample during inference. Although defenses have been proposed for general poisoning attacks, no reliable defense for clean-label attacks has been demonstrated, despite the attacks' effectiveness and realistic applications. In this work, we propose a simple yet highly effective Deep k-NN defense against both feature collision and convex polytope clean-label attacks on the CIFAR-10 dataset. We demonstrate that our proposed strategy detects over 99% of poisoned examples in both attacks and removes them without compromising model performance. Additionally, through ablation studies, we derive simple guidelines for selecting the value of k as well as for applying the Deep k-NN defense to real-world datasets with class imbalance. Our proposed defense shows that current clean-label poisoning attack strategies can be neutralized, and it serves as a strong yet simple-to-implement baseline against which to test future clean-label poisoning attacks. Our code is available at https://github.com/neeharperi/DeepKNNDefense.
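Concretely, the detection step described above amounts to a neighborhood-conformity test in deep feature space: a training point is kept only if its label agrees with the plurality label of its k nearest neighbors, computed on the feature representations of a trained network. The sketch below illustrates that idea under those assumptions; the function name `deep_knn_filter`, the scikit-learn neighbor search, and the default value of k are illustrative choices, not taken from the authors' repository.

```python
# A minimal sketch of the Deep k-NN filtering idea, assuming `features`
# holds deep-feature activations (e.g., penultimate-layer outputs) of a
# trained network and `labels` holds the corresponding training labels.
# Names and the default k are hypothetical, not from the authors' code.
from collections import Counter

import numpy as np
from sklearn.neighbors import NearestNeighbors


def deep_knn_filter(features: np.ndarray, labels: np.ndarray, k: int = 50) -> np.ndarray:
    """Return a boolean mask that is True for training points whose label
    matches the plurality label among their k nearest feature-space neighbors."""
    # Query k + 1 neighbors because each point is its own nearest neighbor.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    _, neighbor_idx = nn.kneighbors(features)

    keep = np.empty(len(labels), dtype=bool)
    for i, neighbors in enumerate(neighbor_idx[:, 1:]):  # drop the self-match
        plurality_label, _ = Counter(labels[neighbors].tolist()).most_common(1)[0]
        # A clean-label poison tends to sit among points of the target class
        # in feature space, so its own label loses the neighbor vote.
        keep[i] = labels[i] == plurality_label
    return keep


# Usage: drop the flagged points, then retrain the model on the filtered set.
# mask = deep_knn_filter(train_features, train_labels, k=50)
# clean_features, clean_labels = train_features[mask], train_labels[mask]
```

Flagged examples are simply removed before retraining; per the abstract, the choice of k and the handling of class imbalance are the subject of the paper's ablation studies.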
- Publication: arXiv e-prints
- Pub Date: September 2019
- DOI: 10.48550/arXiv.1909.13374
- arXiv: arXiv:1909.13374
- Bibcode: 2019arXiv190913374P
- Keywords: Computer Science - Machine Learning; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Neural and Evolutionary Computing
- E-Print: Accepted to the ECCV 2020 Workshop on Adversarial Robustness in the Real World (AROW). First three authors contributed equally.