Sound Explanation for Trustworthy Machine Learning
Abstract
We take a formal approach to the explainability problem of machine learning systems. We argue against the practice of interpreting black-box models by attributing scores to input components, because attribution-based interpretation has inherently conflicting goals. We prove that no attribution algorithm simultaneously satisfies specificity, additivity, completeness, and baseline invariance. We then formalize the concept of sound explanation, which has been informally adopted in prior work. A sound explanation provides sufficient information to causally explain the predictions made by a system. Finally, we present feature selection as a sound explanation for cancer prediction models, applied to cultivate trust among clinicians.
- Publication: arXiv e-prints
- Pub Date: June 2023
- DOI:
- arXiv: arXiv:2306.06134
- Bibcode: 2023arXiv230606134J
- Keywords: Computer Science - Machine Learning; Computer Science - Artificial Intelligence