ReBoot: Distributed statistical learning via refitting Bootstrap samples
Abstract
In this paper, we study a one-shot distributed learning algorithm via refitting bootstrap samples, which we refer to as ReBoot. Given the local models that are fit on multiple independent subsamples, ReBoot refits a new model on the union of the bootstrap samples drawn from these local models. The whole procedure requires only one round of communication of model parameters. Theoretically, we analyze the statistical rate of ReBoot for generalized linear models (GLM) and noisy phase retrieval, which represent convex and non-convex problems respectively. In both cases, ReBoot provably achieves the full-sample statistical rate whenever the subsample size is not too small. In particular, we show that the systematic bias of ReBoot, the error that is independent of the number of subsamples, is $O(n^{-2})$ in GLM, where $n$ is the subsample size. This rate is sharper than that of model parameter averaging and its variants, implying the higher tolerance of ReBoot with respect to data splits to maintain the full-sample rate. Simulation studies exhibit the statistical advantage of ReBoot over competing methods, including averaging and CSL (Communication-efficient Surrogate Likelihood) with up to two rounds of gradient communication. Finally, we propose FedReBoot, an iterative version of ReBoot, to aggregate convolutional neural networks for image classification, which exhibits substantial superiority over FedAvg within early rounds of communication.
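The abstract's one-shot procedure (fit local models, draw a bootstrap sample from each, refit on the union) can be sketched as follows for plain linear regression. This is a minimal illustrative sketch, not the paper's implementation: the function name `reboot_linear`, the i.i.d. standard-normal covariate model, and the assumption that each local model is summarized by its coefficients and a noise scale are all simplifying choices made here for concreteness.

```python
import numpy as np

def reboot_linear(local_coefs, local_sigmas, n_boot, d, rng):
    """One-shot ReBoot-style aggregation sketch for linear regression.

    Each local machine communicates only its fitted coefficients (and here,
    a noise scale). The aggregator draws a synthetic bootstrap sample from
    each local model and refits a single global model on the pooled sample.
    Covariates are drawn i.i.d. N(0, I) purely for illustration.
    """
    X_parts, y_parts = [], []
    for beta_k, sigma_k in zip(local_coefs, local_sigmas):
        X = rng.standard_normal((n_boot, d))           # synthetic covariates
        y = X @ beta_k + sigma_k * rng.standard_normal(n_boot)  # sample from local model
        X_parts.append(X)
        y_parts.append(y)
    X_pool = np.vstack(X_parts)
    y_pool = np.concatenate(y_parts)
    # Refit one global model on the union of bootstrap samples (OLS here).
    beta_hat, *_ = np.linalg.lstsq(X_pool, y_pool, rcond=None)
    return beta_hat

# Usage: fit K local OLS models on independent subsamples, then aggregate.
rng = np.random.default_rng(0)
d, K, n = 3, 5, 2000
beta_true = np.array([1.0, -2.0, 0.5])
local_coefs = []
for _ in range(K):
    X = rng.standard_normal((n, d))
    y = X @ beta_true + 0.1 * rng.standard_normal(n)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    local_coefs.append(b)
beta_reboot = reboot_linear(local_coefs, [0.1] * K, n, d, rng)
```

Note that only the K coefficient vectors cross the network, matching the abstract's single round of model-parameter communication; the GLM and phase-retrieval cases analyzed in the paper replace the OLS refit with the corresponding M-estimation step.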
Publication: arXiv e-prints
Pub Date: July 2022
arXiv: arXiv:2207.09098
Bibcode: 2022arXiv220709098W
Keywords: Statistics - Methodology; Mathematics - Statistics Theory