Interpreting Neural Network Judgments via Minimal, Stable, and Symbolic Corrections

doi:10.48550/arXiv.1802.07384

We present a new algorithm to generate minimal, stable, and symbolic corrections to an input that will cause a neural network with ReLU activations to change its output. We argue that such a correction is a useful way to provide feedback to a user when the network's output is different from a desired output. Our algorithm generates such a correction by solving a series of linear constraint satisfaction problems. The technique is evaluated on three neural network models: one predicting whether an applicant will pay a mortgage, one predicting whether a first-order theorem can be proved efficiently by a solver using certain heuristics, and the final one judging whether a drawing is an accurate rendition of a canonical drawing of a cat.
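The abstract does not spell out the linear constraints, so the following is only an illustrative sketch of the idea it relies on, not the paper's algorithm. Once the on/off pattern of the ReLU units is fixed, the network is an affine function of its input, which is why changing the output within that pattern reduces to linear constraints. The toy weights, the function names, and the choice of an L2-minimal single-point correction are all assumptions made for illustration; a full method would also have to search over activation patterns and return a region rather than one point.

```python
# Hypothetical toy network: 2 inputs -> 2 hidden ReLU units -> 1 output score.
# All weights are made up for illustration; none come from the paper.
W1 = [[1.0, -1.0], [0.5, 1.0]]
b1 = [0.0, -1.0]
w2 = [1.0, -2.0]
b2 = -0.5


def pre_activations(x):
    """Hidden-layer values before the ReLU is applied."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]


def score(x):
    """Network output; a positive score is treated as the desired judgment."""
    hidden = [max(0.0, z) for z in pre_activations(x)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def activation_pattern(x):
    """Which ReLU units are active (True) at input x."""
    return [z > 0.0 for z in pre_activations(x)]


def affine_piece(pattern):
    """Return (a, c) such that score(x) == a.x + c wherever `pattern` holds."""
    a = [0.0] * len(W1[0])
    c = b2
    for active, row, b, w in zip(pattern, W1, b1, w2):
        if active:  # inactive units contribute nothing to the output
            a = [ai + w * r for ai, r in zip(a, row)]
            c += w * b
    return a, c


def minimal_l2_correction(x, margin=1e-6):
    """Smallest L2 shift that lifts the affine piece at x up to `margin`.

    Returns None if the affine piece is constant and cannot be changed.
    The caller must still check that the shifted point keeps the same
    activation pattern; otherwise the affine piece no longer applies.
    """
    a, c = affine_piece(activation_pattern(x))
    value = sum(ai * xi for ai, xi in zip(a, x)) + c
    if value >= margin:
        return [0.0] * len(x)
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0.0:
        return None
    step = (margin - value) / norm_sq
    return [step * ai for ai in a]


x = [0.2, 0.1]  # hypothetical input that the toy network scores negatively
delta = minimal_l2_correction(x)
corrected = [xi + di for xi, di in zip(x, delta)]
same_region = activation_pattern(corrected) == activation_pattern(x)
print(score(x), score(corrected), same_region)
```

The final check on `same_region` matters: the projection is only valid while the corrected input stays inside the region where the original activation pattern holds, and those region boundaries are themselves linear inequalities on the input.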

Publication: arXiv e-prints

Pub Date: February 2018

DOI: 10.48550/arXiv.1802.07384

arXiv: arXiv:1802.07384

Bibcode: 2018arXiv180207384Z

Keywords: Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Statistics - Machine Learning; 68T01

E-Print: 24 pages
