Switched linear projections for neural network interpretability
Abstract
We introduce switched linear projections for expressing the activity of a neuron in a deep neural network in terms of a single linear projection in the input space. The method works by isolating the active subnetwork, a series of linear transformations, that determine the entire computation of the network for a given input instance. With these projections we can decompose activity in any hidden layer into patterns detected in a given input instance. We also propose that in ReLU networks it is instructive and meaningful to examine patterns that deactivate the neurons in a hidden layer, something that is implicitly ignored by the existing interpretability methods tracking solely the active aspect of the network's computation.
- Publication:
-
arXiv e-prints
- Pub Date:
- September 2019
- DOI:
- 10.48550/arXiv.1909.11275
- arXiv:
- arXiv:1909.11275
- Bibcode:
- 2019arXiv190911275S
- Keywords:
-
- Computer Science - Machine Learning;
- Statistics - Machine Learning