Federated Principal Component Analysis
Abstract
We present a federated, asynchronous, and $(\varepsilon, \delta)$-differentially private algorithm for PCA in the memory-limited setting. Our algorithm incrementally computes local model updates using a streaming procedure and adaptively estimates its $r$ leading principal components when only $\mathcal{O}(dr)$ memory is available, with $d$ being the dimensionality of the data. We guarantee differential privacy via an input-perturbation scheme in which the covariance matrix of a dataset $\mathbf{X} \in \mathbb{R}^{d \times n}$ is perturbed with a non-symmetric random Gaussian matrix with variance in $\mathcal{O}\left(\left(\frac{d}{n}\right)^2 \log d \right)$, thus improving upon the state-of-the-art. Furthermore, contrary to previous federated or distributed algorithms for PCA, our algorithm is also invariant to permutations in the incoming data, which provides robustness against straggler or failed nodes. Numerical simulations show that, while using limited memory, our algorithm exhibits performance that closely matches or outperforms traditional non-federated algorithms, and in the absence of communication latency, it exhibits attractive horizontal scalability.
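The input-perturbation scheme described above can be illustrated with a minimal sketch: form the empirical covariance, add non-symmetric Gaussian noise whose standard deviation scales on the order of $(d/n)\sqrt{\log d}$, and extract the $r$ leading components from the perturbed matrix. This is an assumption-laden illustration, not the paper's actual algorithm — in particular, the function name `dp_pca_input_perturbation` and the $(\varepsilon, \delta)$-dependent constant in the noise scale are hypothetical, and the paper's streaming, memory-limited procedure is replaced here by a plain batch eigendecomposition.

```python
import numpy as np

def dp_pca_input_perturbation(X, r, epsilon, delta, rng=None):
    """Hypothetical sketch of input-perturbation DP-PCA.

    X : (d, n) data matrix with d features and n samples.
    r : number of leading principal components to return.
    """
    rng = np.random.default_rng() if rng is None else rng
    d, n = X.shape

    # Empirical covariance of the (assumed centered) data.
    C = (X @ X.T) / n

    # Noise scale on the order of (d/n) * sqrt(log d), matching the
    # abstract's O((d/n)^2 log d) variance; the (epsilon, delta) factor
    # below follows the standard Gaussian-mechanism form and is an
    # assumption, not the paper's exact calibration.
    sigma = (d / n) * np.sqrt(np.log(d)) \
        * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon

    # Non-symmetric Gaussian perturbation, as stated in the abstract.
    N = rng.normal(0.0, sigma, size=(d, d))
    C_priv = C + N

    # Leading r components of the perturbed (non-symmetric) matrix;
    # a general eigendecomposition is used, keeping the real parts.
    eigvals, eigvecs = np.linalg.eig(C_priv)
    order = np.argsort(-eigvals.real)
    return eigvecs[:, order[:r]].real
```

In a federated deployment, each node would run such a perturbation locally before communicating updates; the batch computation here only conveys the privacy mechanism, not the $\mathcal{O}(dr)$-memory streaming estimator.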
 Publication:

arXiv e-prints
 Pub Date:
 July 2019
 arXiv:
 arXiv:1907.08059
 Bibcode:
 2019arXiv190708059G
 Keywords:

 Computer Science - Machine Learning;
 Computer Science - Information Theory;
 Statistics - Machine Learning
 E-Print:
 36 pages, 13 figures, 1 table. Accepted for publication at Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada