HyperSched: Dynamic Resource Reallocation for Model Development on a Deadline

doi:10.48550/arXiv.2001.02338

Prior research in resource scheduling for machine learning training workloads has largely focused on minimizing job completion times. Commonly, these model training workloads collectively search over a large number of parameter values that control the learning process in a hyperparameter search. It is preferable to identify and maximally provision the best-performing hyperparameter configuration (trial) to achieve the highest accuracy result as soon as possible. To optimally trade off evaluating multiple configurations against training the most promising ones by a fixed deadline, we design and build HyperSched, a dynamic application-level resource scheduler that tracks, identifies, and preferentially allocates resources to the best-performing trials to maximize accuracy by the deadline. HyperSched leverages three properties of a hyperparameter search workload overlooked in prior work (trial disposability, progressively identifiable rankings among different configurations, and space-time constraints) to outperform standard hyperparameter search algorithms across a variety of benchmarks.
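The idea of shifting resources from exploration to the best-performing trial as a deadline approaches can be illustrated with a small Python sketch. This is not the HyperSched algorithm, whose details are not given in the abstract; it is a toy heuristic under stated assumptions. The function name `reallocate`, its parameters, and the linear "keep fewer trials as time runs out" rule are all hypothetical choices made for illustration.

```python
def reallocate(scores, total_gpus, time_left, deadline):
    """Toy deadline-aware allocation (illustrative only, not HyperSched).

    scores: dict mapping trial id -> latest observed accuracy.
    Returns a dict mapping trial id -> number of GPUs assigned;
    0 means the trial is dropped (trials are treated as disposable).
    """
    if not scores:
        return {}
    # Fraction of the time budget remaining, clamped to [0, 1].
    frac = max(0.0, min(1.0, time_left / deadline))
    # Assumed rule: explore widely early, narrow to one trial near the deadline.
    keep = max(1, min(total_gpus, round(len(scores) * frac)))
    # Rank trials by their intermediate accuracy, best first.
    ranked = sorted(scores, key=scores.get, reverse=True)
    alloc = {trial: 0 for trial in scores}
    for trial in ranked[:keep]:
        alloc[trial] = 1
    # GPUs freed by dropped trials go to the current best trial.
    alloc[ranked[0]] += total_gpus - keep
    return alloc
```

With four trials and four GPUs, calling `reallocate` at the start of the budget gives every trial one GPU, while calling it near the deadline assigns all four GPUs to the highest-scoring trial. The sketch touches the three properties the abstract names only loosely: low-ranked trials are dropped, rankings come from intermediate scores, and both the GPU count and the deadline are fixed.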

Publication:

arXiv e-prints

Pub Date:

January 2020

DOI:

10.48550/arXiv.2001.02338

arXiv:

arXiv:2001.02338

Bibcode:

2020arXiv200102338L

Keywords:

Computer Science - Distributed, Parallel, and Cluster Computing;
Computer Science - Machine Learning
