Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability

doi:10.48550/arXiv.1609.06870

This paper presents a theoretical analysis and practical evaluation of the main bottlenecks that stand in the way of a scalable distributed solution for training Deep Neural Networks (DNNs). The results show that the current state-of-the-art approach, data-parallelized Stochastic Gradient Descent (SGD), is quickly turning into a vastly communication-bound problem. In addition, we present simple but fixed theoretical constraints that prevent effective scaling of DNN training beyond only a few dozen nodes. This leads to poor scalability of DNN training in most practical scenarios.

Publication: arXiv e-prints

Pub Date: September 2016

DOI: 10.48550/arXiv.1609.06870

arXiv: arXiv:1609.06870

Bibcode: 2016arXiv160906870K

Keywords: Computer Science - Computer Vision and Pattern Recognition

Abstract