Decentralized Entropic Optimal Transport for Distributed Distribution Comparison
Abstract
Distributed distribution comparison aims to measure the distance between the distributions whose data are scattered across different agents in a distributed system and cannot even be shared directly among the agents. In this study, we propose a novel decentralized entropic optimal transport (DEOT) method, which provides a communication-efficient and privacy-preserving solution to this problem with theoretical guarantees. In particular, we design a mini-batch randomized block-coordinate descent (MRBCD) scheme to optimize the DEOT distance in its dual form. The dual variables are scattered across different agents and updated locally and iteratively with limited communications among partial agents. The kernel matrix involved in the gradients of the dual variables is estimated by a decentralized kernel approximation method, in which each agent only needs to approximate and store a sub-kernel matrix by one-shot communication and without sharing raw data. Besides computing entropic Wasserstein distance, we show that the proposed MRBCD scheme and kernel approximation method also apply to entropic Gromov-Wasserstein distance. We analyze our method's communication complexity and, under mild assumptions, provide a theoretical bound for the approximation error caused by the convergence error, the estimated kernel, and the mismatch between the storage and communication protocols. In addition, we discuss the trade-off between the precision of the EOT distance and the strength of privacy protection when implementing our method. Experiments on synthetic data and real-world distributed domain adaptation tasks demonstrate the effectiveness of our method.
- Publication:
-
arXiv e-prints
- Pub Date:
- January 2023
- DOI:
- arXiv:
- arXiv:2301.12065
- Bibcode:
- 2023arXiv230112065W
- Keywords:
-
- Computer Science - Machine Learning;
- Statistics - Machine Learning