Privacy via the Johnson-Lindenstrauss Transform

doi:10.48550/arXiv.1204.2606

Suppose that party A collects private information about its users, where each user's data is represented as a bit vector. Suppose that party B has a proprietary data mining algorithm, such as clustering or nearest-neighbor search, that requires estimating the distance between users. We ask whether party A can publish some information about each user so that B can estimate the distance between users without being able to infer any private bit of a user. Our method involves projecting each user's representation into a random, lower-dimensional space via a sparse Johnson-Lindenstrauss transform and then adding Gaussian noise to each entry of the lower-dimensional representation. We show that the method preserves differential privacy, where the more privacy is desired, the larger the variance of the Gaussian noise. Further, we show how to approximate the true distances between users using only the lower-dimensional, perturbed data. Finally, we consider other perturbation methods such as randomized response and draw comparisons to sketch-based methods. While the goal of releasing user-specific data to third parties is broader than preserving distances, this work shows that privacy-preserving distance computation is an achievable goal.
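The project-then-perturb idea described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the projection uses Achlioptas-style sparse sign entries rather than the paper's specific sparse transform, the noise scale `sigma` is a free parameter (its calibration to a given privacy level is not shown), and the function names `make_projection`, `publish`, and `estimate_sq_distance` are hypothetical. The estimator subtracts the expected noise contribution, 2·k·sigma², from the squared distance between two published vectors.

```python
import math
import random


def make_projection(d, k, rng):
    """Return a d x k random matrix with entries sqrt(3/k) * {+1, 0, -1},
    drawn with probabilities 1/6, 2/3, 1/6 (an assumed sparse construction)."""
    scale = math.sqrt(3.0 / k)
    return [[scale * rng.choice((1, 0, 0, 0, 0, -1)) for _ in range(k)]
            for _ in range(d)]


def publish(bits, proj, sigma, rng):
    """Project a user's bit vector to k dimensions, then add N(0, sigma^2)
    noise to every coordinate. Only this noisy vector would be released."""
    k = len(proj[0])
    out = [0.0] * k
    for i, b in enumerate(bits):
        if b:  # only set bits contribute to the projection
            row = proj[i]
            for j in range(k):
                out[j] += row[j]
    return [y + rng.gauss(0.0, sigma) for y in out]


def estimate_sq_distance(pu, pv, sigma):
    """Estimate the squared Euclidean distance between two original vectors
    from their published versions. The difference of two independent noise
    vectors adds 2 * k * sigma^2 in expectation, so that amount is removed."""
    k = len(pu)
    raw = sum((a - b) ** 2 for a, b in zip(pu, pv))
    return raw - 2 * k * sigma ** 2
```

In this sketch, party A would generate one projection with `make_projection(d, k, random.Random(seed))`, call `publish` once per user, and release only the resulting vectors together with `sigma`; party B would then call `estimate_sq_distance` on any pair. For bit vectors the squared Euclidean distance equals the Hamming distance. A single estimate can be noisy (and even negative) when `sigma` is large relative to the true distance.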

Publication:

arXiv e-prints

Pub Date:

April 2012

DOI:

10.48550/arXiv.1204.2606

arXiv:

arXiv:1204.2606

Bibcode:

2012arXiv1204.2606K

Keywords:

Computer Science - Data Structures and Algorithms;
Computer Science - Computers and Society;
Computer Science - Databases;
Computer Science - Social and Information Networks;
K.4.1;
F.2;
H.3.5;
G.3;
I.5.3;
H.3.3;
H.2.8;
E.1;
G.1.3

E-Print:

24 pages

NASA/ADS
