Faster Boosting with Smaller Memory

doi:10.48550/arXiv.1901.09047

State-of-the-art implementations of boosting, such as XGBoost and LightGBM, can process large training sets extremely fast. However, this performance requires enough memory to hold two to three times the size of the training set. This paper presents an alternative approach to implementing boosted trees that achieves a significant speedup over XGBoost and LightGBM, especially when memory is small. The approach combines three techniques: early stopping, effective sample size, and stratified sampling. Our experiments demonstrate a 10-100x speedup over XGBoost when the training data is too large to fit in memory.
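
The abstract names effective sample size without defining it. As an illustration only, the sketch below uses the standard (Kish) definition for a weighted sample, n_eff = (sum of weights)^2 / (sum of squared weights). This is an assumption about what the term means here; the abstract does not say which definition the paper uses, and the function name `effective_sample_size` is hypothetical.

```python
def effective_sample_size(weights):
    """Kish effective sample size of a weighted sample.

    Assumption: this is the textbook definition, not necessarily
    the exact quantity used in the paper.
    """
    total = sum(weights)
    sum_sq = sum(w * w for w in weights)
    if sum_sq == 0:
        # Empty sample or all-zero weights: no effective examples.
        return 0.0
    return total * total / sum_sq


# Equal weights: every example counts fully.
uniform = effective_sample_size([1.0, 1.0, 1.0, 1.0])
# One example carries all the weight: the sample behaves like a single example.
skewed = effective_sample_size([1.0, 0.0, 0.0, 0.0])
```

Under this definition, a sample whose weights have become highly skewed has a small effective size, which is one plausible signal that an in-memory sample should be refreshed.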

Publication:

arXiv e-prints

Pub Date:

January 2019

DOI:

10.48550/arXiv.1901.09047

arXiv:

arXiv:1901.09047

Bibcode:

2019arXiv190109047A

Keywords:

Computer Science - Machine Learning;
Statistics - Machine Learning

E-Print:

NeurIPS 2019
