Randomized Block-Diagonal Preconditioning for Parallel Learning

doi:10.48550/arXiv.2006.13591

We study preconditioned gradient-based optimization methods in which the preconditioning matrix has block-diagonal form. This structural constraint makes the update computation block-separable, so it can be parallelized across multiple independent tasks. Our main contribution is to demonstrate that the convergence of these methods can be significantly improved by a randomization technique that corresponds to repartitioning coordinates across tasks during the optimization procedure. We provide a theoretical analysis that accurately characterizes the expected convergence gains of repartitioning and validate our findings empirically on various traditional machine learning tasks. From an implementation perspective, block-separable models are well suited for parallelization and, when shared memory is available, randomization can be implemented very efficiently on top of existing methods to improve convergence.
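The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm or experimental setup: it assumes a small quadratic objective f(x) = 0.5 x^T A x - b^T x with a symmetric positive definite A, uses the diagonal blocks of A as the preconditioner, and takes a conservative step size of 1/K for K blocks (an assumption made here because it guarantees descent on this objective). All names (`solve`, `block_step`, `make_blocks`, `run`) are hypothetical. Each block's update depends only on that block's coordinates of the gradient, so in a parallel setting each block could be handled by a separate task; repartitioning amounts to reshuffling which coordinates belong to which block before each step.

```python
import random

def solve(M, r):
    """Solve M z = r by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(r)
    aug = [row[:] + [r[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for i in range(c + 1, n):
            f = aug[i][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[i][j] -= f * aug[c][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = s / aug[i][i]
    return z

def objective(A, b, x):
    """f(x) = 0.5 x^T A x - b^T x (assumed toy objective)."""
    n = len(x)
    quad = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    return 0.5 * quad - sum(b[i] * x[i] for i in range(n))

def block_step(A, b, x, blocks):
    """One preconditioned gradient step; the preconditioner is block-diagonal in A."""
    n = len(x)
    g = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
    step = 1.0 / len(blocks)  # assumed conservative step size: 1 / (number of blocks)
    new_x = x[:]
    for blk in blocks:  # blocks are independent, so this loop could run in parallel
        sub = [[A[i][j] for j in blk] for i in blk]
        d = solve(sub, [g[i] for i in blk])
        for k, i in enumerate(blk):
            new_x[i] -= step * d[k]
    return new_x

def make_blocks(order, num_blocks):
    """Split a coordinate ordering into equally sized blocks (assumes divisibility)."""
    m = len(order) // num_blocks
    return [order[k * m:(k + 1) * m] for k in range(num_blocks)]

def run(A, b, num_blocks, steps, repartition, seed=0):
    """Run the method with a fixed partition or a fresh random partition per step."""
    rng = random.Random(seed)
    n = len(b)
    order = list(range(n))
    x = [0.0] * n
    for _ in range(steps):
        if repartition:
            rng.shuffle(order)  # reassign coordinates to blocks at random
        x = block_step(A, b, x, make_blocks(order, num_blocks))
    return objective(A, b, x)

# Toy problem with strongly coupled coordinates (values chosen for illustration only).
n = 8
A = [[1.0 if i == j else 0.9 for j in range(n)] for i in range(n)]
b = [float(i + 1) for i in range(n)]

f_fixed = run(A, b, num_blocks=4, steps=50, repartition=False)
f_random = run(A, b, num_blocks=4, steps=50, repartition=True)
print("fixed partition:  ", f_fixed)
print("random partition: ", f_random)
```

Comparing the two printed objective values on a given problem shows how repartitioning affects progress for the same number of steps. The sketch makes no claim about the size of the gain, which the paper characterizes in expectation.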

Publication:

arXiv e-prints

Pub Date:

June 2020

DOI:

10.48550/arXiv.2006.13591

arXiv:

arXiv:2006.13591

Bibcode:

2020arXiv200613591M

Keywords:

Computer Science - Machine Learning;
Computer Science - Distributed, Parallel, and Cluster Computing;
Statistics - Machine Learning

E-Print:

Improvement in Theorem 3 compared to the ICML 2020 version

NASA/ADS
