A Note on KL-UCB+ Policy for the Stochastic Bandit

doi:10.48550/arXiv.1903.07839

Honda, Junya

This note considers the classic setting of the stochastic K-armed bandit problem. For this problem, the KL-UCB policy is known to achieve the asymptotically optimal regret bound, and the KL-UCB+ policy empirically performs better than the KL-UCB policy, although no regret bound has been known for the original form of the KL-UCB+ policy. This note demonstrates that a simple proof of the asymptotic optimality of the KL-UCB+ policy can be given by the same techniques as those used in the analyses of other known policies.

Publication:

arXiv e-prints

Pub Date:

March 2019

DOI:

10.48550/arXiv.1903.07839

arXiv:

arXiv:1903.07839

Bibcode:

2019arXiv190307839H

Keywords:

Computer Science - Machine Learning;
Statistics - Machine Learning

E-Print:

6 pages, corrected typos

NASA/ADS
