Optimal Best-Arm Identification in Bandits with Access to Offline Data

doi:10.48550/arXiv.2306.09048

Learning paradigms based purely on offline data, as well as those based solely on sequential online learning, have been well studied in the literature. In this paper, we consider combining offline data with online learning, an area less studied but of obvious practical importance. We consider the stochastic $K$-armed bandit problem, where our goal is to identify the arm with the highest mean in the presence of relevant offline data, with confidence $1-\delta$. We conduct a lower bound analysis on policies that provide such $1-\delta$ probabilistic correctness guarantees. We develop algorithms that match the lower bound on sample complexity when $\delta$ is small. Our algorithms are computationally efficient with an average per-sample acquisition cost of $\tilde{O}(K)$, and rely on a careful characterization of the optimality conditions of the lower bound problem.

Publication: arXiv e-prints

Pub Date: June 2023

DOI: 10.48550/arXiv.2306.09048

arXiv: arXiv:2306.09048

Bibcode: 2023arXiv230609048A

Keywords: Computer Science - Machine Learning; Statistics - Machine Learning

E-Print: 45 pages, 5 figures
