Bounded regret in stochastic multiarmed bandits
Abstract
We study the stochastic multiarmed bandit problem when one knows the value $\mu^{(\star)}$ of an optimal arm, as well as a positive lower bound on the smallest positive gap $\Delta$. We propose a new randomized policy that attains a regret {\em uniformly bounded over time} in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows $\Delta$, and bounded regret of order $1/\Delta$ is not possible if one only knows $\mu^{(\star)}$.
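For reference, the quantities named in the abstract can be written out as follows. This is a sketch using the standard bandit definitions. The notation $K$ (number of arms), $\mu_i$ (mean reward of arm $i$), $I_t$ (arm pulled at round $t$), $\Delta_i$ and $R_n$ is assumed here and does not appear in the abstract itself.

$$
\mu^{(\star)} = \max_{1 \le i \le K} \mu_i, \qquad
\Delta_i = \mu^{(\star)} - \mu_i, \qquad
\Delta = \min_{i \,:\, \Delta_i > 0} \Delta_i,
$$
$$
R_n = n\,\mu^{(\star)} - \mathbb{E}\sum_{t=1}^{n} \mu_{I_t}.
$$

Under these definitions, a regret uniformly bounded over time means that $\sup_{n \ge 1} R_n \le C$ for some constant $C$ that may depend on the arm means but not on the horizon $n$.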
 Publication:

arXiv e-prints
 Pub Date:
 February 2013
 arXiv:
 arXiv:1302.1611
 Bibcode:
 2013arXiv1302.1611B
 Keywords:

 Mathematics - Statistics Theory;
 Computer Science - Machine Learning;
 Statistics - Machine Learning;
 62L05