Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks via Nonlinear Multigrid
Abstract
A Multigrid Full Approximation Storage (FAS) algorithm for solving Deep Residual Networks is developed to enable layer-parallel training of neural networks and concurrent execution of computational kernels on GPUs. This work demonstrates a 10.2x speedup over traditional layer-wise model parallelism techniques using the same number of compute units.
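The abstract names the technique without showing its structure, so the following is a minimal two-level sketch of a FAS-style cycle applied to forward propagation through a toy residual network. It is not the authors' implementation. It assumes a scalar layer `u + h * tanh(w*u + b)`, a coarse grid built by keeping every `m`-th layer with step size `m*h`, and F-relaxation only. All function names (`layer_step`, `f_relax`, `fas_two_level_cycle`, `layer_parallel_forward`) are hypothetical. The code runs serially on the CPU; in a layer-parallel setting the per-interval loop in `f_relax` is the part that could be executed concurrently.

```python
import math


def layer_step(u, w, b, h):
    """One residual layer, read as a forward-Euler step of size h (toy scalar model)."""
    return u + h * math.tanh(w * u + b)


def serial_forward(u0, weights, biases, h):
    """Reference: propagate through all layers one after another."""
    states = [u0]
    for w, b in zip(weights, biases):
        states.append(layer_step(states[-1], w, b, h))
    return states


def f_relax(states, weights, biases, h, m):
    """Recompute the fine points inside each coarse interval from its coarse point.

    Each interval depends only on its own coarse point, so the outer loop
    is the part that could run in parallel across layers.
    """
    for start in range(0, len(weights), m):
        for n in range(start, start + m - 1):
            states[n + 1] = layer_step(states[n], weights[n], biases[n], h)


def fas_two_level_cycle(states, weights, biases, h, m):
    """One two-level FAS cycle; assumes len(weights) is a multiple of m and m >= 2."""
    n_coarse = len(weights) // m
    f_relax(states, weights, biases, h, m)

    # FAS right-hand side: fine propagation into each coarse point minus the
    # coarse propagator applied to the current coarse-point values.
    tau = []
    for j in range(1, n_coarse + 1):
        last = j * m - 1          # last fine layer of interval j
        first = (j - 1) * m       # layer reused by the coarse propagator
        fine_end = layer_step(states[last], weights[last], biases[last], h)
        coarse = layer_step(states[first], weights[first], biases[first], m * h)
        tau.append(fine_end - coarse)

    # Sequential solve on the coarse grid; the full approximation is stored,
    # so the coarse solution overwrites the coarse points directly.
    v = states[0]
    for j in range(1, n_coarse + 1):
        first = (j - 1) * m
        v = layer_step(v, weights[first], biases[first], m * h) + tau[j - 1]
        states[j * m] = v

    f_relax(states, weights, biases, h, m)
    return states


def layer_parallel_forward(u0, weights, biases, h, m, n_cycles):
    """Iterate FAS cycles starting from a constant initial guess."""
    states = [u0] * (len(weights) + 1)
    for _ in range(n_cycles):
        fas_two_level_cycle(states, weights, biases, h, m)
    return states
```

With F-relaxation only, each cycle of this sketch makes at least one more coarse point agree with the serial result, so running as many cycles as there are coarse intervals reproduces serial forward propagation up to rounding. In practice far fewer cycles would be used and the iteration stopped on a residual tolerance.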
- Publication:
- arXiv e-prints
- Pub Date:
- July 2020
- DOI:
- 10.48550/arXiv.2007.07336
- arXiv:
- arXiv:2007.07336
- Bibcode:
- 2020arXiv200707336K
- Keywords:
- Computer Science - Machine Learning;
- Computer Science - Distributed, Parallel, and Cluster Computing;
- Computer Science - Performance;
- Statistics - Machine Learning
- E-Print:
- 7 pages, 6 figures, 27 citations. Accepted to 2020 IEEE High Performance Extreme Computing Conference - Outstanding Paper Award