Regret Analysis of the Anytime Optimally Confident UCB Algorithm

Lattimore, Tor

I introduce and analyse an anytime version of the Optimally Confident UCB (OCUCB) algorithm designed for minimising the cumulative regret in finite-armed stochastic bandits with subgaussian noise. The new algorithm is simple, intuitive (in hindsight) and comes with the strongest finite-time regret guarantees for a horizon-free algorithm so far. I also show a finite-time lower bound that nearly matches the upper bound.

Publication: arXiv e-prints

Pub Date: March 2016

DOI: 10.48550/arXiv.1603.08661

arXiv: arXiv:1603.08661

Bibcode: 2016arXiv160308661L

Keywords: Computer Science - Machine Learning; Mathematics - Statistics Theory; Statistics - Machine Learning

E-Print: 16 pages
