Batched bandit problems

Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. We propose a simple policy, and show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.

Publication: arXiv e-prints
Pub Date: May 2015
DOI: 10.48550/arXiv.1505.00369
arXiv: arXiv:1505.00369
Bibcode: 2015arXiv150500369P
Keywords: Mathematics - Statistics Theory
E-Print: Published at http://dx.doi.org/10.1214/15-AOS1381 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
