Switched linear projections for neural network interpretability

doi:10.48550/arXiv.1909.11275

We introduce switched linear projections for expressing the activity of a neuron in a deep neural network in terms of a single linear projection in the input space. The method works by isolating the active subnetwork, a series of linear transformations that determines the entire computation of the network for a given input instance. With these projections we can decompose the activity in any hidden layer into patterns detected in a given input instance. We also propose that in ReLU networks it is instructive and meaningful to examine the patterns that deactivate the neurons in a hidden layer, something that existing interpretability methods implicitly ignore because they track only the active aspect of the network's computation.
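The idea of an input-dependent linear map can be illustrated with a small sketch. It is not the authors' implementation; it assumes a fully connected ReLU network given as a list of hypothetical `(W, b)` pairs (weights as lists of rows). The bias is kept as a separate offset because the abstract does not say how the paper treats it. For a fixed input `x`, the ReLU on/off pattern selects the active subnetwork. Composing the surviving linear maps gives, for every neuron `j`, a row `P[j]` and offset `c[j]` with `z[j] == P[j] · x + c[j]`. Inactive neurons keep their projections as well, so the patterns that push a neuron below zero can be inspected too. The function name `switched_projections` is illustrative only.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(W: Matrix, v: Sequence[float]) -> Vector:
    """Dense matrix-vector product W @ v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def matmul(W: Matrix, A: Matrix) -> Matrix:
    """Dense matrix product W @ A."""
    cols = list(zip(*A))
    return [[sum(w * a for w, a in zip(row, col)) for col in cols] for row in W]


def switched_projections(
    layers: Sequence[Tuple[Matrix, Vector]], x: Sequence[float]
) -> List[Tuple[Matrix, Vector, Vector]]:
    """For each layer return (P, c, z) such that z == P @ x + c for this x.

    Hypothetical helper: P[j], c[j] is the linear projection (plus offset)
    in input space that reproduces neuron j's pre-activation for input x.
    """
    n = len(x)
    # Effective map from the input to the current post-activation layer.
    A: Matrix = [[1.0 if i == k else 0.0 for k in range(n)] for i in range(n)]
    c: Vector = [0.0] * n
    h: Vector = list(x)
    out = []
    for W, b in layers:
        P = matmul(W, A)                                   # W composed with the active map
        off = [s + bi for s, bi in zip(matvec(W, c), b)]   # accumulated offset
        z = [s + bi for s, bi in zip(matvec(W, h), b)]     # ordinary forward pass
        out.append((P, off, z))
        # ReLU switch: neurons with z <= 0 drop out of the active subnetwork.
        mask = [zj > 0 for zj in z]
        A = [row if m else [0.0] * n for row, m in zip(P, mask)]
        c = [cj if m else 0.0 for cj, m in zip(off, mask)]
        h = [zj if m else 0.0 for zj, m in zip(z, mask)]
    return out


if __name__ == "__main__":
    layers = [([[1.0, -1.0], [0.5, 2.0]], [0.0, -1.0]), ([[1.0, 1.0]], [0.5])]
    for depth, (P, off, z) in enumerate(switched_projections(layers, [0.0, 1.0])):
        print(depth, P, off, z)
```

With input `[0.0, 1.0]`, the first hidden neuron has projection `[1.0, -1.0]`, so its pre-activation is `-1.0` and it is switched off. The output neuron's projection then reduces to `[0.5, 2.0]` with offset `-0.5`, which reproduces its pre-activation `1.5` exactly.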

Publication: arXiv e-prints
Pub Date: September 2019
DOI: 10.48550/arXiv.1909.11275
arXiv: arXiv:1909.11275
Bibcode: 2019arXiv190911275S
Keywords: Computer Science - Machine Learning; Statistics - Machine Learning
