Explainability in Neural Networks for Natural Language Processing Tasks
Abstract
Neural networks are widely regarded as black-box models, which makes their inner workings difficult to understand, especially in natural language processing (NLP) applications. To address this opacity, model explanation techniques such as Local Interpretable Model-Agnostic Explanations (LIME) have emerged as tools for gaining insight into the behavior of these complex systems. This study uses LIME to interpret a multi-layer perceptron (MLP) neural network trained on a text classification task. By analyzing the contribution of individual features to model predictions, the LIME approach enhances interpretability and supports informed decision-making. Although LIME is effective at providing localized explanations, it has limitations in capturing global patterns and feature interactions. This research highlights the strengths and shortcomings of LIME and proposes directions for future work toward more comprehensive interpretability in neural NLP models.
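As an illustration of the local explanation idea described in the abstract, the following is a minimal sketch of a LIME-style procedure for text, not the authors' implementation. It assumes only NumPy. The classifier `black_box_proba` is a hypothetical stand-in for the trained MLP, and the function name `lime_text_sketch`, the distance measure (fraction of removed tokens), and the kernel width are illustrative choices rather than settings taken from the paper or from the `lime` library.

```python
import numpy as np

# Hypothetical stand-in for the trained MLP: any function that maps a list of
# token lists to positive-class probabilities could be used instead.
WORD_SCORES = {"great": 2.0, "good": 1.0, "boring": -2.0, "bad": -1.0}

def black_box_proba(token_lists):
    """Return a positive-class probability for each tokenized text."""
    scores = [sum(WORD_SCORES.get(t, 0.0) for t in tokens) for tokens in token_lists]
    return 1.0 / (1.0 + np.exp(-np.array(scores)))

def lime_text_sketch(tokens, predict_fn, n_samples=500, kernel_width=0.75,
                     ridge=1e-3, seed=0):
    """Fit a proximity-weighted linear surrogate around one tokenized text.

    Returns a list of (token, weight) pairs in the original token order.
    """
    rng = np.random.default_rng(seed)
    d = len(tokens)
    # Binary masks: 1 keeps a token, 0 removes it. Row 0 is the original text.
    masks = rng.integers(0, 2, size=(n_samples, d))
    masks[0, :] = 1
    perturbed = [[t for t, keep in zip(tokens, m) if keep] for m in masks]
    y = predict_fn(perturbed)
    # Proximity weights: exponential kernel on the fraction of removed tokens
    # (an assumed, simplified distance).
    distance = 1.0 - masks.mean(axis=1)
    w = np.exp(-(distance ** 2) / kernel_width ** 2)
    # Weighted ridge regression with an unpenalized intercept, in closed form.
    X = np.hstack([np.ones((n_samples, 1)), masks])
    XtW = X.T * w
    reg = ridge * np.eye(d + 1)
    reg[0, 0] = 0.0
    coef = np.linalg.solve(XtW @ X + reg, XtW @ y)
    return list(zip(tokens, coef[1:]))

if __name__ == "__main__":
    text = "the plot was great but the ending felt boring".split()
    for token, weight in lime_text_sketch(text, black_box_proba):
        print(f"{token:>8s} {weight:+.3f}")
```

Each weight is the surrogate's estimate of how much keeping that token shifts the predicted probability near this one input. Because the surrogate is linear and fitted locally, it cannot represent interactions between tokens or behavior across the whole dataset, which is the limitation the abstract notes.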
- Publication: arXiv e-prints
- Pub Date: December 2024
- arXiv: arXiv:2412.18036
- Bibcode: 2024arXiv241218036M
- Keywords: Computer Science - Computation and Language; Computer Science - Artificial Intelligence