The Intriguing Properties of Model Explanations
Abstract
Linear approximations to the decision boundary of a complex model have become one of the most popular tools for interpreting predictions. In this paper, we study such linear explanations, which are either produced post-hoc by a few recent methods or generated along with predictions by contextual explanation networks (CENs). We focus on two questions: (i) whether linear explanations are always consistent or can be misleading, and (ii) whether and how explanations affect the performance of the model when they are integrated into the prediction process. Our analysis sheds more light on certain properties of explanations produced by different methods and suggests that learning models that explain and predict jointly is often advantageous.
- Publication: arXiv e-prints
- Pub Date: January 2018
- DOI: 10.48550/arXiv.1801.09808
- arXiv: arXiv:1801.09808
- Bibcode: 2018arXiv180109808A
- Keywords: Computer Science - Machine Learning; Computer Science - Artificial Intelligence
- E-Print: Interpretable ML Symposium, NIPS 2017