In Defence of Post-hoc Explainability
Abstract
The widespread adoption of machine learning in scientific research has created a fundamental tension between model opacity and scientific understanding. Whilst some advocate for intrinsically interpretable models, we introduce Computational Interpretabilism (CI) as a philosophical framework for post-hoc interpretability in scientific AI. Drawing parallels with human expertise, where post-hoc rationalisation coexists with reliable performance, CI establishes that scientific knowledge emerges through structured model interpretation when properly bounded by empirical validation. Through mediated understanding and bounded factivity, we demonstrate how post-hoc methods achieve epistemically justified insights without requiring complete mechanical transparency, resolving tensions between model complexity and scientific comprehension.
- Publication: arXiv e-prints
- Pub Date: December 2024
- arXiv: arXiv:2412.17883
- Bibcode: 2024arXiv241217883O
- Keywords: Computer Science - Machine Learning; Computer Science - Artificial Intelligence
- E-Print: Presented at the Interpretable AI: Past, Present, and Future Workshop at NeurIPS 2024 (non-archival)