PrivacyPreserving Distributed Learning in the Analog Domain
Abstract
We consider the critical problem of distributed learning over data while keeping it private from the computational servers. The stateoftheart approaches to this problem rely on quantizing the data into a finite field, so that the cryptographic approaches for secure multiparty computing can then be employed. These approaches, however, can result in substantial accuracy losses due to fixedpoint representation of the data and computation overflows. To address these critical issues, we propose a novel algorithm to solve the problem when data is in the analog domain, e.g., the field of real/complex numbers. We characterize the privacy of the data from both informationtheoretic and cryptographic perspectives, while establishing a connection between the two notions in the analog domain. More specifically, the wellknown connection between the distinguishing security (DS) and the mutual information security (MIS) metrics is extended from the discrete domain to the continues domain. This is then utilized to bound the amount of information about the data leaked to the servers in our protocol, in terms of the DS metric, using wellknown results on the capacity of singleinput multipleoutput (SIMO) channel with correlated noise. It is shown how the proposed framework can be adopted to do computation tasks when data is represented using floatingpoint numbers. We then show that this leads to a fundamental tradeoff between the privacy level of data and accuracy of the result. As an application, we also show how to train a machine learning model while keeping the data as well as the trained model private. Then numerical results are shown for experiments on the MNIST dataset. Furthermore, experimental advantages are shown comparing to fixedpoint implementations over finite fields.
 Publication:

arXiv eprints
 Pub Date:
 July 2020
 arXiv:
 arXiv:2007.08803
 Bibcode:
 2020arXiv200708803S
 Keywords:

 Computer Science  Machine Learning;
 Computer Science  Cryptography and Security;
 Computer Science  Distributed;
 Parallel;
 and Cluster Computing;
 Computer Science  Information Theory;
 Statistics  Machine Learning