Multi-Differential Fairness Auditor for Black Box Classifiers

doi:10.48550/arXiv.1903.07609

Machine learning algorithms are increasingly involved in sensitive decision-making processes with adverse implications for individuals. This paper presents mdfa, an approach that identifies the characteristics of the victims of a classifier's discrimination. We measure discrimination as a violation of multi-differential fairness, a guarantee that a black box classifier's outcomes do not leak information about the sensitive attributes of a small group of individuals. We reduce the problem of identifying worst-case violations to matching distributions and predicting where sensitive attributes and the classifier's outcomes coincide. We apply mdfa to a recidivism risk assessment classifier and demonstrate that individuals identified as African-American with little criminal history are three times more likely to be considered at high risk of violent recidivism than similar individuals who are not identified as African-American.
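To make the "three times more likely" comparison concrete, the following is a minimal sketch of the kind of quantity such an audit reports: the ratio of positive-outcome rates between protected and non-protected individuals inside one subgroup. It is not the mdfa algorithm itself, which searches for worst-case subgroups; here the subgroup is fixed by hand. The function name `outcome_rate_ratio`, the `priors` feature, and the toy records are all hypothetical and chosen only for illustration.

```python
from math import log


def outcome_rate_ratio(records, in_subgroup):
    """Ratio of positive-outcome rates (protected / non-protected) in one subgroup.

    records: iterable of (features, sensitive, outcome), with sensitive and
             outcome in {0, 1}. This layout is an assumption of the sketch.
    in_subgroup: hypothetical predicate on features defining the audited subgroup.
    """
    counts = {0: [0, 0], 1: [0, 0]}  # sensitive value -> [positives, total]
    for features, sensitive, outcome in records:
        if in_subgroup(features):
            counts[sensitive][0] += outcome
            counts[sensitive][1] += 1

    rates = {}
    for sensitive, (positives, total) in counts.items():
        if total == 0 or positives == 0:
            raise ValueError("subgroup too small to estimate a ratio")
        rates[sensitive] = positives / total
    return rates[1] / rates[0]


# Synthetic toy data, invented for this example (not from the paper).
records = (
    [({"priors": 0}, 1, 1)] * 3 + [({"priors": 0}, 1, 0)] * 7    # protected, few priors
    + [({"priors": 0}, 0, 1)] * 1 + [({"priors": 0}, 0, 0)] * 9  # non-protected, few priors
    + [({"priors": 5}, 1, 1)] * 5 + [({"priors": 5}, 0, 1)] * 5  # outside the subgroup
)

ratio = outcome_rate_ratio(records, lambda f: f["priors"] == 0)
log_ratio = abs(log(ratio))  # a log-scale view of the same disparity
print(f"rate ratio: {ratio:.2f}, |log ratio|: {log_ratio:.2f}")
```

In this toy subgroup the protected individuals receive the positive outcome at roughly three times the rate of the others. A ratio near 1 (log ratio near 0) would indicate that, within the subgroup, the outcome reveals little about the sensitive attribute.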

Publication:

arXiv e-prints

Pub Date:

March 2019

DOI:

10.48550/arXiv.1903.07609

arXiv:

arXiv:1903.07609

Bibcode:

2019arXiv190307609G

Keywords:

Computer Science - Machine Learning;
Statistics - Machine Learning

NASA/ADS
