Post-hoc Interpretability for Neural NLP: A Survey

doi:10.48550/arXiv.2108.04840

Neural networks for NLP are becoming increasingly complex and widespread, and there is growing concern about whether these models can be used responsibly. Explaining models helps to address safety and ethical concerns and is essential for accountability. Interpretability serves to provide these explanations in terms that are understandable to humans. Post-hoc methods, in addition, provide explanations after a model is learned and are generally model-agnostic. This survey categorizes how recent post-hoc interpretability methods communicate explanations to humans, discusses each method in depth, and examines how the methods are validated, since validation is a common concern.

Publication: arXiv e-prints

Pub Date: August 2021

DOI: 10.48550/arXiv.2108.04840

arXiv: arXiv:2108.04840

Bibcode: 2021arXiv210804840M

Keywords: Computer Science - Computation and Language; Computer Science - Machine Learning; Computer Science - Neural and Evolutionary Computing

E-Print: ACM Comput. Surv. 55, 8, Article 155 (December 2022)
