Backdoors in Neural Models of Source Code

doi:10.48550/arXiv.2006.06841

Deep neural networks are vulnerable to a range of adversaries. Backdoors are a particularly pernicious class of vulnerability: model predictions diverge in the presence of subtle triggers in the input. An attacker can implant a backdoor by poisoning the training data so that the model yields a desired target prediction on triggered inputs. We study backdoors in the context of deep learning for source code. (1) We define a range of backdoor classes for source-code tasks and show how to poison a dataset to install such backdoors. (2) We adapt and improve recent algorithms from robust statistics for our setting, showing that backdoors leave a spectral signature in the learned representation of source code, thus enabling detection of poisoned data. (3) We conduct a thorough evaluation on different architectures and languages, showing the ease of injecting backdoors and our ability to eliminate them.
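The abstract names two technical ingredients: poisoning a dataset with a trigger, and detecting poisoned examples through a spectral signature. Both can be illustrated with a minimal Python sketch. This is not the authors' implementation, and every name in it is hypothetical. It assumes the following:

- The dataset is a list of (Python function source, label) pairs for a task such as method-name prediction.
- The trigger is a fixed dead-code statement (`TRIGGER`) inserted at the top of the function body.
- `TARGET` is an attacker-chosen label.
- `reps` is a matrix of learned representations, one row per example, taken from some trained model.

The detector is the basic spectral-signatures scoring of Tran et al. (2018). It is not the adapted and improved variant the abstract refers to, and it requires NumPy.

```python
import random

import numpy as np

# Hypothetical fixed trigger: the guarded raise never runs, because
# random() returns values in [0, 1) and so is never below 0.
TRIGGER = 'if random() < 0:\n    raise Exception("fail")'
TARGET = "create entry"  # hypothetical attacker-chosen target label


def poison(dataset, rate, seed=0):
    """Insert TRIGGER into a `rate` fraction of (code, label) pairs and
    relabel those pairs with TARGET. Assumes each snippet starts with a
    one-line `def ...:` header followed by a body indented by 4 spaces."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(dataset)), int(rate * len(dataset))))
    out = []
    for i, (code, label) in enumerate(dataset):
        if i in chosen:
            header, _, body = code.partition("\n")
            # Indent every trigger line by one level to sit inside the body.
            trigger = "    " + TRIGGER.replace("\n", "\n    ")
            code, label = f"{header}\n{trigger}\n{body}", TARGET
        out.append((code, label))
    return out, chosen


def spectral_scores(reps):
    """Outlier score per example: squared projection of the centred
    representation onto the top right singular vector."""
    reps = np.asarray(reps, dtype=float)
    centred = reps - reps.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return (centred @ vt[0]) ** 2


def filter_suspicious(reps, eps, k=1.5):
    """Indices of the round(k * eps * n) highest-scoring examples, where eps
    is an assumed upper bound on the fraction of poisoned data."""
    scores = spectral_scores(reps)
    n_remove = round(k * eps * len(scores))
    return set(np.argsort(scores)[::-1][:n_remove].tolist())
```

Poisoned examples share a trigger, so their representations tend to shift along a common direction. That direction dominates the top singular vector of the centred representation matrix, which gives poisoned examples unusually large scores. In Tran et al.'s recipe, scoring is done separately for the examples of each label, and about 1.5ε of them are removed, hence the default `k=1.5`.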

Publication:

arXiv e-prints

Pub Date:

June 2020

DOI:

10.48550/arXiv.2006.06841

arXiv:

arXiv:2006.06841

Bibcode:

2020arXiv200606841R

Keywords:

Computer Science - Machine Learning;
Computer Science - Cryptography and Security;
Statistics - Machine Learning

E-Print:

doi:10.1109/ICPR56361.2022.9956690
