Robust Distributed Bayesian Learning with Stragglers via Consensus Monte Carlo
Abstract
This paper studies distributed Bayesian learning in a setting encompassing a central server and multiple workers by focusing on the problem of mitigating the impact of stragglers. The standard one-shot, or embarrassingly parallel, Bayesian learning protocol known as consensus Monte Carlo (CMC) is generalized by proposing two straggler-resilient solutions based on grouping and coding. Two main challenges in designing straggler-resilient algorithms for CMC are the need to estimate the statistics of the workers' outputs across multiple shots, and the joint non-linear post-processing of the outputs of the workers carried out at the server. This is in stark contrast to other distributed settings like gradient coding, which only require the per-shot sum of the workers' outputs. The proposed methods, referred to as Group-based CMC (G-CMC) and Coded CMC (C-CMC), leverage redundant computing at the workers in order to enable the estimation of global posterior samples at the server based on partial outputs from the workers. Simulation results show that C-CMC may outperform G-CMC for a small number of workers, while G-CMC is generally preferable for a larger number of workers.
- Publication:
-
arXiv e-prints
- Pub Date:
- December 2021
- DOI:
- 10.48550/arXiv.2112.09794
- arXiv:
- arXiv:2112.09794
- Bibcode:
- 2021arXiv211209794H
- Keywords:
-
- Computer Science - Machine Learning;
- Electrical Engineering and Systems Science - Signal Processing
- E-Print:
- Accepted for publication in IEEE GLOBECOM 2022