SAF: Simulated Annealing Fair Scheduling for Hadoop Yarn Clusters
Abstract
Apache introduced YARN as the next generation of the Hadoop framework, providing resource management and a central platform for delivering consistent data governance tools across Hadoop clusters. Hadoop YARN supports multiple processing frameworks, such as MapReduce, for different types of data and works with different scheduling policies, including the FIFO, Capacity, and Fair schedulers. DRF (Dominant Resource Fairness) is the best option for multi-type resource allocation; it converges to fairness over the short term, without considering historical information. However, DRF's performance is still unsatisfactory because of the trade-off between fairness and resource utilization. To address this problem, we propose Simulated Annealing Fair scheduling (SAF), a long-term fair resource allocation scheme that aims for both fairness and strong performance in terms of resource utilization and MakeSpan. We introduce a new parameter, entropy, which indicates the disorder in the fairness of the resources allocated across the whole cluster. We implemented SAF as a pluggable scheduler in a Hadoop YARN cluster and evaluated it with standard MapReduce benchmarks in the YARN Scheduler Load Simulator (SLS) and the CloudSim Plus simulation framework. The results from both simulation tools support our claim: compared to DRF, SAF significantly increases the resource utilization of YARN clusters and decreases MakeSpan to an appropriate level.
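As a rough illustration of the quantities the abstract refers to, the sketch below computes each user's dominant share, the quantity DRF equalizes, and then a normalized Shannon entropy over those shares as one possible way to score how evenly the cluster is divided. The entropy formula, the function names, and the sample cluster figures are assumptions made for illustration only; the abstract does not give the paper's actual entropy definition, and this is not the SAF scheduler itself.

```python
import math


def dominant_share(allocation, capacity):
    """Largest fraction of any single resource type held by one user."""
    return max(allocation[r] / capacity[r] for r in capacity)


def normalized_entropy(shares):
    """Shannon entropy of the normalized shares, scaled to [0, 1].

    1.0 means all shares are equal; lower values mean a more uneven split.
    Illustrative measure only, not the entropy defined in the paper.
    """
    total = sum(shares)
    if total == 0 or len(shares) < 2:
        return 1.0
    probs = [s / total for s in shares if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(shares))


# Hypothetical cluster and allocations, chosen only for the example.
capacity = {"vcores": 100, "memory_gb": 400}
allocations = {
    "user_a": {"vcores": 20, "memory_gb": 40},   # CPU-dominant
    "user_b": {"vcores": 10, "memory_gb": 120},  # memory-dominant
}

shares = {user: dominant_share(a, capacity) for user, a in allocations.items()}
evenness = normalized_entropy(list(shares.values()))
print(shares, round(evenness, 3))
```

A scheduler that searches over candidate allocations, as simulated annealing does, could use a score of this kind as part of its objective, but how SAF actually combines fairness and utilization is described in the paper rather than here.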
- Publication: arXiv e-prints
- Pub Date: August 2020
- DOI: 10.48550/arXiv.2008.12586
- arXiv: arXiv:2008.12586
- Bibcode: 2020arXiv200812586G
- Keywords: Computer Science - Distributed, Parallel, and Cluster Computing
- E-Print: 28 pages, 8 sections, 7 figures