Pure Exploration with Multiple Correct Answers
Abstract
We determine the sample complexity of pure exploration bandit problems with multiple good answers. We derive a lower bound using a new game equilibrium argument. We show how continuity and convexity properties of single-answer problems ensure that the Track-and-Stop algorithm has asymptotically optimal sample complexity. However, that convexity is lost when going to the multiple-answer setting. We present a new algorithm which extends Track-and-Stop to the multiple-answer case and has asymptotic sample complexity matching the lower bound.
- Publication: arXiv e-prints
- Pub Date: February 2019
- DOI: 10.48550/arXiv.1902.03475
- arXiv: arXiv:1902.03475
- Bibcode: 2019arXiv190203475D
- Keywords: Computer Science - Machine Learning; Statistics - Machine Learning