Device Scheduling and Update Aggregation Policies for Asynchronous Federated Learning
Abstract
Federated Learning (FL) is a newly emerged decentralized machine learning (ML) framework that combines on-device local training with server-based model synchronization to train a centralized ML model over distributed nodes. In this paper, we propose an asynchronous FL framework with periodic aggregation to eliminate the straggler issue in FL systems. For the proposed model, we investigate several device scheduling and update aggregation policies and compare their performances when the devices have heterogeneous computation capabilities and training data distributions. From the simulation results, we conclude that the scheduling and aggregation design for asynchronous FL can be rather different from the synchronous case. For example, a norm-based significance-aware scheduling policy might not be efficient in an asynchronous FL setting, and an appropriate "age-aware" weighting design for the model aggregation can greatly improve the learning performance of such systems.
- Publication:
-
arXiv e-prints
- Pub Date:
- July 2021
- DOI:
- 10.48550/arXiv.2107.11415
- arXiv:
- arXiv:2107.11415
- Bibcode:
- 2021arXiv210711415H
- Keywords:
-
- Computer Science - Machine Learning;
- Computer Science - Distributed;
- Parallel;
- and Cluster Computing;
- Computer Science - Information Theory;
- Electrical Engineering and Systems Science - Signal Processing
- E-Print:
- 5 pages, 4 figures, accepted in 22nd IEEE international workshop on signal processing advances in wireless communications (SPAWC 2021)