Templating Shuffles
Abstract
Cloud data centers are evolving fast. At the same time, today's large-scale data analytics applications require non-trivial performance tuning that is often specific to the applications, workloads, and data center infrastructure. We propose TeShu, which makes network shuffling an extensible, unified service layer common to all data analytics applications. Since the optimal shuffle depends on myriad factors, TeShu introduces parameterized shuffle templates, instantiated through accurate and efficient sampling, which enables TeShu to dynamically adapt to different application workloads and data center layouts. Our preliminary experimental results show that TeShu efficiently enables shuffling optimizations that improve performance and adapt to a variety of data center network scenarios.
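To make "parameterized shuffle template instantiated by sampling" concrete, here is a minimal illustrative sketch. It is not TeShu's API or algorithm. `ShuffleTemplate`, `pre_combine`, `estimate_reduction`, `instantiate`, and the 0.5 threshold are all hypothetical. The template has one parameter: whether to aggregate records with equal keys before sending them across the network. A random sample of the input estimates how much that aggregation would shrink the data, and that estimate picks the parameter value.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ShuffleTemplate:
    """Hypothetical parameterized shuffle: hash-partition (key, value) pairs,
    optionally pre-combining equal keys (summing values) before partitioning."""
    num_partitions: int
    pre_combine: bool  # template parameter chosen at instantiation time

    def run(self, records):
        if self.pre_combine:
            # Local aggregation: one record per key, fewer bytes on the wire.
            combined = {}
            for k, v in records:
                combined[k] = combined.get(k, 0) + v
            records = list(combined.items())
        partitions = [[] for _ in range(self.num_partitions)]
        for k, v in records:
            partitions[hash(k) % self.num_partitions].append((k, v))
        return partitions


def estimate_reduction(records, sample_rate, rng):
    """Estimate (records after combining) / (records before) from a Bernoulli
    sample. Small samples contain fewer duplicates, so this errs on the
    conservative side (it tends to overestimate the remaining fraction)."""
    sample = [r for r in records if rng.random() < sample_rate]
    if not sample:
        return 1.0  # no evidence: assume combining does not help
    distinct_keys = len({k for k, _ in sample})
    return distinct_keys / len(sample)


def instantiate(records, num_partitions, sample_rate=0.1, threshold=0.5, seed=0):
    """Pick the template parameter from a sample (hypothetical policy)."""
    rng = random.Random(seed)
    ratio = estimate_reduction(records, sample_rate, rng)
    return ShuffleTemplate(num_partitions, pre_combine=ratio < threshold)
```

With many duplicate keys, e.g. `instantiate([(i % 10, 1) for i in range(10_000)], 4)`, the sample shows few distinct keys and the template is instantiated with `pre_combine=True`; with all-unique keys it stays `False`. A real system would sample richer signals (data sizes, skew, network topology) and expose more parameters; this sketch only shows the shape of the idea.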
- Publication: arXiv e-prints
- Pub Date: July 2022
- arXiv: arXiv:2207.10746
- Bibcode: 2022arXiv220710746Z
- Keywords: Computer Science - Distributed, Parallel, and Cluster Computing; 68M14; C.2.4
- E-Print: The technical report of TeShu, which has been accepted at CIDR 2023