A Vulnerability of Attribution Methods Using Pre-Softmax Scores
Abstract
We discuss a vulnerability in a category of attribution methods used to explain the outputs of convolutional neural networks acting as classifiers. Networks of this type are known to be vulnerable to adversarial attacks, in which imperceptible perturbations of the input may alter the model's outputs. In contrast, here we focus on the effects that small modifications to the model may have on the attribution method without altering the model's outputs.
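As a rough illustration of how a change to a model can leave its outputs intact while altering an attribution computed from pre-softmax scores, the sketch below relies on one well-known fact: softmax is unchanged when the same scalar is added to every pre-softmax score. Everything else is an assumption made for illustration and is not taken from the paper: the toy linear classifier, the names `W`, `v`, and `logits_modified`, and the scale `10.0` are hypothetical, and the construction for convolutional networks may differ.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D vector of scores.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # hypothetical linear classifier: logits = W @ x
x = rng.normal(size=4)        # hypothetical input
v = rng.normal(size=4)        # hypothetical direction defining the shift

def logits_original(x):
    return W @ x

def logits_modified(x):
    # Add the same input-dependent scalar to every pre-softmax score.
    # Softmax is shift-invariant, so the class probabilities do not change.
    return W @ x + 10.0 * (v @ x)

p_orig = softmax(logits_original(x))
p_mod = softmax(logits_modified(x))

# Gradient of the pre-softmax score of class c with respect to the input,
# the quantity a gradient-based attribution would use in this toy setting.
c = 0
grad_orig = W[c]
grad_mod = W[c] + 10.0 * v

print("same probabilities:", np.allclose(p_orig, p_mod))
print("same pre-softmax gradient:", np.allclose(grad_orig, grad_mod))
```

In this toy setting the two models are indistinguishable by their post-softmax outputs, yet any attribution built from gradients of the pre-softmax score differs between them.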
- Publication: arXiv e-prints
- Pub Date: July 2023
- DOI: 10.48550/arXiv.2307.03305
- arXiv: arXiv:2307.03305
- Bibcode: 2023arXiv230703305L
- Keywords: Computer Science - Machine Learning; Computer Science - Artificial Intelligence; 68T07; I.2.m
- E-Print: 7 pages, 5 figures