Certified Data Removal from Machine Learning Models

doi:10.48550/arXiv.1911.03030

Good data stewardship requires removal of data at the request of the data's owner. This raises the question of whether and how a trained machine-learning model, which implicitly stores information about its training data, should be affected by such a removal request. Is it possible to "remove" data from a machine-learning model? We study this problem by defining certified removal: a very strong theoretical guarantee that a model from which data is removed cannot be distinguished from a model that never observed the data to begin with. We develop a certified-removal mechanism for linear classifiers and empirically study learning settings in which this mechanism is practical.

Publication: arXiv e-prints

Pub Date: November 2019

DOI: 10.48550/arXiv.1911.03030

arXiv: arXiv:1911.03030

Bibcode: 2019arXiv191103030G

Keywords: Computer Science - Machine Learning; Statistics - Machine Learning

E-Print: Accepted to ICML 2020
