Exploitation of Stragglers in Coded Computation
Abstract
In cloud computing systems, slow processing nodes, often referred to as "stragglers", can significantly extend the computation time. Recent results have shown that error correction coding can be used to reduce the effect of stragglers. In this work we introduce a scheme that, in addition to using error correction to distribute mixed jobs across nodes, is also able to exploit the work completed by all nodes, including stragglers. We first consider vector-matrix multiplication and apply maximum distance separable (MDS) codes to small blocks of sub-matrices. The worker nodes process their blocks sequentially and transmit each per-block partial result to the master as soon as it is completed. Sub-blocking makes the completion process more continuous, which allows us to exploit the work of a much broader spectrum of processors and reduces computation time. We then apply this technique to matrix-matrix multiplication using product codes. In this case, we show that the order in which sub-tasks are computed is a new degree of design freedom that can be exploited to reduce computation time further. We propose a novel approach to analyzing the finishing time, which differs from typical order statistics. Simulation results show that the expected computation time decreases by a factor of at least two compared to previous methods.
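The sub-blocking idea can be illustrated with a small simulation. This is a minimal sketch under our own assumptions, not the authors' construction: we compute A·x (the vector-matrix product in transposed form), treat each row of A as one block, and apply a single real-valued (n·L, k·L) Vandermonde MDS code across all blocks, so that any k·L completed block products let the master decode. Worker speeds are modeled as a hypothetical fixed time per block, and the names `encode`, `solve` and `run` are illustrative. Exact `Fraction` arithmetic is used because real-valued Vandermonde systems are badly conditioned in floating point.

```python
from fractions import Fraction


def encode(A, n, L):
    """Encode the m rows of A into n*L coded rows (assumed Vandermonde MDS code):
    coded row i = sum_j alpha_i**j * A[j], with alpha_i = i + 1.
    Worker w receives the L consecutive coded rows w*L, ..., w*L + L - 1."""
    m, cols = len(A), len(A[0])
    coded = []
    for i in range(n * L):
        alpha = Fraction(i + 1)
        coded.append([sum(alpha ** j * A[j][c] for j in range(m)) for c in range(cols)])
    return [coded[w * L:(w + 1) * L] for w in range(n)]


def solve(M, b):
    """Solve M y = b exactly by Gauss-Jordan elimination over the rationals."""
    size = len(M)
    aug = [list(M[r]) + [b[r]] for r in range(size)]
    for col in range(size):
        piv = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(size):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * p for a, p in zip(aug[r], aug[col])]
    return [aug[r][size] / aug[r][r] for r in range(size)]


def run(A, x, n, k, L, block_time):
    """Simulate n workers that process their L coded blocks in order and stream every
    per-block result to the master. block_time[w] is a hypothetical fixed time per
    block for worker w. Returns (finish time, decoded A x, blocks used per worker)."""
    m = k * L
    assert len(A) == m, "A must have k*L rows (one row per block in this sketch)"
    shards = encode(A, n, L)
    # Block b of worker w completes at (b + 1) * block_time[w]: strictly block-by-block.
    events = sorted(((b + 1) * block_time[w], w, b) for w in range(n) for b in range(L))
    received, used = [], [0] * n
    for t, w, b in events:
        value = sum(a * xi for a, xi in zip(shards[w][b], x))
        received.append((Fraction(w * L + b + 1), value))  # (alpha_i, coded product)
        used[w] += 1
        if len(received) == m:  # MDS property: any m = k*L coded products determine A x
            V = [[alpha ** j for j in range(m)] for alpha, _ in received]
            return t, solve(V, [v for _, v in received]), used
    raise RuntimeError("fewer than k*L blocks were completed")


A = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 2], [3, 0]]
t, y, used = run(A, x=[2, 3], n=3, k=2, L=3, block_time=[1, 2, 3])
print(t, [int(v) for v in y], used)  # 4 [2, 3, 5, 7, 8, 6] [3, 2, 1]
```

In this toy run the slowest worker still contributes one block, and the master decodes at time 4. Under the same timing model, a (3, 2) MDS code without sub-blocking would give each worker one 3-row coded half of A, so the workers would finish at times 3, 6 and 9 and the master would have to wait until time 6 for the second one.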
- Publication: arXiv e-prints
- Pub Date: June 2018
- DOI: 10.48550/arXiv.1806.10253
- arXiv: arXiv:1806.10253
- Bibcode: 2018arXiv180610253K
- Keywords: Computer Science - Information Theory
- E-Print: IEEE Int. Symp. Inf. Theory (ISIT) 2018