Regret Analysis of the Anytime Optimally Confident UCB Algorithm
Abstract
I introduce and analyse an anytime version of the Optimally Confident UCB (OCUCB) algorithm designed for minimising the cumulative regret in finite-armed stochastic bandits with subgaussian noise. The new algorithm is simple, intuitive (in hindsight) and comes with the strongest finite-time regret guarantees for a horizon-free algorithm so far. I also show a finite-time lower bound that nearly matches the upper bound.
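To illustrate what "anytime" (horizon-free) means in the abstract, here is a minimal sketch of a textbook-style UCB index policy, not the OCUCB algorithm itself, whose index this record does not state. The function name `anytime_ucb`, the unit-variance Gaussian rewards, and the bonus `sqrt(2 log t / pulls)` are all assumptions chosen for illustration. The point is only that the confidence bonus depends on the current round `t` and never on the total number of rounds.

```python
import math
import random


def anytime_ucb(means, rounds, seed=0):
    """Illustrative anytime UCB on unit-variance Gaussian arms (NOT OCUCB)."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # sum of observed rewards per arm
    best = max(means)
    regret = 0.0       # cumulative (pseudo-)regret against the best arm

    for t in range(1, rounds + 1):
        if t <= k:
            # Pull every arm once so that each empirical mean is defined.
            arm = t - 1
        else:
            # Assumed index: empirical mean plus a bonus that uses the
            # current round t only, so no horizon is needed in advance.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = rng.gauss(means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]

    return counts, regret


if __name__ == "__main__":
    counts, regret = anytime_ucb([0.0, 1.0], rounds=500, seed=1)
    print(counts, regret)
```

Because the loop never reads `rounds` inside the index, the same policy can be stopped at any time, which is the property a horizon-free regret guarantee refers to.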
- Publication: arXiv e-prints
- Pub Date: March 2016
- DOI: 10.48550/arXiv.1603.08661
- arXiv: arXiv:1603.08661
- Bibcode: 2016arXiv160308661L
- Keywords: Computer Science - Machine Learning; Mathematics - Statistics Theory; Statistics - Machine Learning
- E-Print: 16 pages