Underbagging Nearest Neighbors for Imbalanced Classification
Abstract
In this paper, we propose an ensemble learning algorithm called \textit{underbagging $k$-nearest neighbors} (\textit{underbagging $k$NN}) for imbalanced classification problems. On the theoretical side, by developing a new learning theory analysis, we show that with properly chosen parameters, i.e., the number of nearest neighbors $k$, the expected subsample size $s$, and the number of bagging rounds $B$, optimal convergence rates for underbagging $k$NN can be achieved under mild assumptions w.r.t.~the arithmetic mean (AM) of recalls. Moreover, we show that with a relatively small $B$, the expected subsample size $s$ can be much smaller than the number of training data $n$ at each bagging round, and the number of nearest neighbors $k$ can be reduced simultaneously, especially when the data are highly imbalanced; this leads to substantially lower time complexity and roughly the same space complexity. On the practical side, we conduct numerical experiments that verify the theoretical benefits of the underbagging technique, demonstrating the promising AM performance and efficiency of our proposed algorithm.
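The abstract describes an ensemble that, in each of $B$ bagging rounds, draws a class-balanced subsample (under-sampling the majority class down to roughly the minority-class size) and applies a $k$NN vote on that subsample, aggregating predictions across rounds. The following is a minimal, hypothetical sketch of that scheme; the function name, the brute-force distance computation, and the choice to fix the per-class subsample size at the minority-class count are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np
from collections import Counter

def underbagging_knn_predict(X_train, y_train, X_test, k=3, B=10, seed=0):
    """Hypothetical sketch of underbagging kNN: at each of B rounds, draw a
    class-balanced subsample (under-sampling the majority class), run a
    brute-force k-NN majority vote on it, then aggregate the B predictions
    per test point by a final majority vote."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y_train, return_counts=True)
    # Assumed choice: subsample each class down to the minority-class size.
    s_per_class = counts.min()
    round_preds = []
    for _ in range(B):
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y_train == c), size=s_per_class, replace=False)
            for c in classes
        ])
        Xs, ys = X_train[idx], y_train[idx]
        # Brute-force squared Euclidean distances: (n_test, n_subsample).
        d = ((X_test[:, None, :] - Xs[None, :, :]) ** 2).sum(axis=-1)
        nn = np.argsort(d, axis=1)[:, :k]  # indices of the k nearest neighbors
        round_preds.append(
            np.array([Counter(ys[row]).most_common(1)[0][0] for row in nn])
        )
    votes = np.stack(round_preds)  # (B, n_test)
    return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])
```

Because each round trains on only about $s$ points rather than all $n$, the per-round neighbor search is cheaper, which is the source of the time-complexity savings the abstract highlights.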
Publication: arXiv e-prints
Pub Date: September 2021
arXiv: arXiv:2109.00531
Bibcode: 2021arXiv210900531H
Keywords: Statistics - Machine Learning; Computer Science - Machine Learning