Gradient-Based Adversarial and Out-of-Distribution Detection
Abstract
We propose to utilize gradients for detecting adversarial and out-of-distribution samples. We introduce confounding labels (labels that differ from the normal labels seen during training) into gradient generation to probe the effective expressivity of neural networks. Gradients capture the amount of change a model requires to properly represent a given input, providing insight into the representational power the model has established through its architecture and training data. Because the confounding label replaces the ground-truth label, gradient generation during inference requires no ground-truth labels. We show that our gradient-based approach captures anomalies in inputs based on the effective expressivity of the model, requires no hyperparameter tuning or additional processing, and outperforms state-of-the-art methods for adversarial and out-of-distribution detection.
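To make the idea concrete, below is a minimal sketch of gradient-based anomaly scoring with a confounding label, assuming a trained PyTorch classifier. The all-ones confounding label, the loss choice, and the function name are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch (assumptions noted above): score an input by the magnitude
# of gradients produced when the network is asked to match a confounding
# label: here an all-ones vector, which differs from every one-hot label
# seen during training.
import torch
import torch.nn.functional as F

def gradient_anomaly_score(model: torch.nn.Module, x: torch.Tensor) -> float:
    """Return a scalar anomaly score for a single input batch `x`."""
    model.zero_grad()
    logits = model(x)                      # shape: (batch, num_classes)
    confounding = torch.ones_like(logits)  # assumed confounding label design
    loss = F.binary_cross_entropy_with_logits(logits, confounding)
    loss.backward()
    # Aggregate the squared L2 norms of all parameter gradients; larger
    # gradients suggest the model needs more adjustment to represent x.
    total = sum(p.grad.pow(2).sum() for p in model.parameters()
                if p.grad is not None)
    return total.sqrt().item()
```

In practice one would calibrate a threshold on held-out in-distribution data; inputs whose scores exceed it are flagged as adversarial or out-of-distribution.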
- Publication:
- arXiv e-prints
- Pub Date:
- June 2022
- DOI:
- 10.48550/arXiv.2206.08255
- arXiv:
- arXiv:2206.08255
- Bibcode:
- 2022arXiv220608255L
- Keywords:
- Computer Science - Machine Learning;
- Computer Science - Computer Vision and Pattern Recognition
- E-Print:
- International Conference on Machine Learning (ICML) Workshop on New Frontiers in Adversarial Machine Learning, July 2022