Non-Asymptotic Delay Bounds for Multi-Server Systems with Synchronization Constraints
Abstract
Multi-server systems have received increasing attention with important implementations such as Google MapReduce, Hadoop, and Spark. Common to these systems are a fork operation, where jobs are first divided into tasks that are processed in parallel, and a later join operation, where completed tasks wait until the results of all tasks of a job can be combined and the job leaves the system. The synchronization constraint of the join operation makes the analysis of fork-join systems challenging, and few explicit results are known. In this work, we model fork-join systems using a max-plus server model that enables us to derive statistical bounds on waiting and sojourn times for general arrival and service time processes. We contribute end-to-end delay bounds for multi-stage fork-join networks that grow as $\mathcal{O}(h \ln k)$ for $h$ fork-join stages, each with $k$ parallel servers. We perform a detailed comparison of different multi-server configurations and highlight their pros and cons. We also include an analysis of single-queue fork-join systems that are non-idling and achieve a fundamental performance gain, and compare these results to both simulation and a live Spark system.
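The fork and join operations described in the abstract can be illustrated with a minimal simulation sketch. It is not the paper's model or code: it assumes a single fork-join stage in which each of the $k$ servers has its own first-come-first-served queue, and it uses the standard max-plus recursion for such a queue (a task departs at the maximum of its arrival time and the previous departure, plus its service time). The function names `fork_join_sojourn_times` and `mean_sojourn`, the exponential interarrival and service times, and the rates `lam` and `mu` are all hypothetical choices made for illustration.

```python
import random


def fork_join_sojourn_times(arrivals, service_times):
    """Sojourn times of jobs in one fork-join stage (illustrative sketch).

    arrivals:      non-decreasing list of job arrival times
    service_times: one list per job holding the k task service times
    """
    k = len(service_times[0])
    last_departure = [0.0] * k  # previous task departure at each server
    sojourn = []
    for arrival, tasks in zip(arrivals, service_times):
        for i in range(k):
            # Max-plus recursion: a task starts once it has arrived and
            # its server has finished the previous task.
            last_departure[i] = max(arrival, last_departure[i]) + tasks[i]
        # Join: the job leaves when its slowest task has finished.
        sojourn.append(max(last_departure) - arrival)
    return sojourn


def mean_sojourn(k, n_jobs=5000, lam=0.5, mu=1.0, seed=1):
    """Mean sojourn time under assumed exponential arrivals and services."""
    rng = random.Random(seed)
    t = 0.0
    arrivals = []
    services = []
    for _ in range(n_jobs):
        t += rng.expovariate(lam)  # assumed Poisson job arrivals
        arrivals.append(t)
        services.append([rng.expovariate(mu) for _ in range(k)])
    times = fork_join_sojourn_times(arrivals, services)
    return sum(times) / len(times)


if __name__ == "__main__":
    # Print how the simulated mean sojourn time changes with k.
    for k in (1, 2, 4, 8, 16):
        print(k, round(mean_sojourn(k), 3))
```

Running the script prints the simulated mean sojourn time for several values of $k$, which can be compared informally against the logarithmic growth in $k$ stated in the abstract. The sketch covers neither multi-stage networks nor the single-queue non-idling variant.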
- Publication: arXiv e-prints
- Pub Date: October 2016
- DOI: 10.48550/arXiv.1610.06309
- arXiv: arXiv:1610.06309
- Bibcode: 2016arXiv161006309F
- Keywords: Computer Science - Performance; Computer Science - Distributed, Parallel, and Cluster Computing
- E-Print: arXiv admin note: text overlap with arXiv:1512.08354