Multi-Differential Fairness Auditor for Black Box Classifiers
Abstract
Machine learning algorithms are increasingly involved in sensitive decision-making process with adversarial implications on individuals. This paper presents mdfa, an approach that identifies the characteristics of the victims of a classifier's discrimination. We measure discrimination as a violation of multi-differential fairness. Multi-differential fairness is a guarantee that a black box classifier's outcomes do not leak information on the sensitive attributes of a small group of individuals. We reduce the problem of identifying worst-case violations to matching distributions and predicting where sensitive attributes and classifier's outcomes coincide. We apply mdfa to a recidivism risk assessment classifier and demonstrate that individuals identified as African-American with little criminal history are three-times more likely to be considered at high risk of violent recidivism than similar individuals but not African-American.
- Publication:
-
arXiv e-prints
- Pub Date:
- March 2019
- DOI:
- 10.48550/arXiv.1903.07609
- arXiv:
- arXiv:1903.07609
- Bibcode:
- 2019arXiv190307609G
- Keywords:
-
- Computer Science - Machine Learning;
- Statistics - Machine Learning