Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation
Abstract
Black-box risk scoring models permeate our lives, yet are typically proprietary or opaque. We propose Distill-and-Compare, a model distillation and comparison approach to audit such models. To gain insight into black-box models, we treat them as teachers, training transparent student models to mimic the risk scores assigned by black-box models. We compare the student model trained with distillation to a second un-distilled transparent model trained on ground-truth outcomes, and use differences between the two models to gain insight into the black-box model. Our approach can be applied in a realistic setting, without probing the black-box model API. We demonstrate the approach on four public data sets: COMPAS, Stop-and-Frisk, Chicago Police, and Lending Club. We also propose a statistical test to determine if a data set is missing key features used to train the black-box model. Our test finds that the ProPublica data is likely missing key feature(s) used in COMPAS.
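The distill-then-compare step can be illustrated with a minimal sketch. It makes several assumptions. Ordinary least-squares linear models stand in for the transparent models, chosen only for brevity; the abstract does not fix the model class. The helper names (`solve`, `fit_linear`) are hypothetical. The data are synthetic: the hypothetical black box secretly relies on a feature `x2`, while the ground-truth outcomes depend only on `x1`.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w


def fit_linear(rows: Sequence[Sequence[float]], target: Sequence[float]) -> List[float]:
    """Hypothetical transparent model: OLS via the normal equations.

    Returns [intercept, w1, w2, ...].
    """
    design = [[1.0, *row] for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, target)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic audit data (hypothetical): features x1, x2 on a 10x10 grid.
rows = [(float(x1), float(x2)) for x1 in range(10) for x2 in range(10)]

# Risk scores from the black box, observed but not inspectable; it secretly uses x2.
black_box_scores = [0.15 * x1 + 0.05 * x2 for x1, x2 in rows]

# Ground-truth outcomes depend only on x1.
outcomes = [1.0 if x1 >= 5 else 0.0 for x1, _ in rows]

# Student: transparent model distilled from the black-box risk scores.
student = fit_linear(rows, black_box_scores)
# Un-distilled transparent model trained on ground-truth outcomes.
outcome_model = fit_linear(rows, outcomes)

# Compare the two models feature by feature.
for name, s, o in zip(["intercept", "x1", "x2"], student, outcome_model):
    print(f"{name:>9}: student={s:+.4f} outcome={o:+.4f} diff={s - o:+.4f}")
```

In this toy, the student places weight 0.05 on `x2` while the outcome model places essentially zero weight on it. That disagreement is the kind of signal the comparison is meant to surface. Fitting the outcome model by least squares on 0/1 labels (a linear probability model) keeps both models on roughly comparable scales here. With real risk scores, such as decile scores, the two models' outputs would likely need to be aligned before their differences are interpreted.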
- Publication: arXiv e-prints
- Pub Date: October 2017
- DOI: 10.48550/arXiv.1710.06169
- arXiv: arXiv:1710.06169
- Bibcode: 2017arXiv171006169T
- Keywords: Statistics - Machine Learning; Computer Science - Artificial Intelligence; Computer Science - Machine Learning
- E-Print: Camera-ready version for AAAI/ACM AIES 2018. Data and pseudocode at https://github.com/shftan/auditblackbox. Previously titled "Detecting Bias in Black-Box Models Using Transparent Model Distillation". A short version was presented at the NIPS 2017 Symposium on Interpretable Machine Learning.