Weak Signal Asymptotics for Sequentially Randomized Experiments

doi:10.48550/arXiv.2101.09855

We use the lens of weak signal asymptotics to study a class of sequentially randomized experiments, including those that arise in solving multi-armed bandit problems. In an experiment with $n$ time steps, we let the mean reward gaps between actions scale on the order of $1/\sqrt{n}$ so as to preserve the difficulty of the learning task as $n$ grows. In this regime, we show that the sample paths of a class of sequentially randomized experiments -- adapted to this scaling regime and with arm selection probabilities that vary continuously with state -- converge weakly to a diffusion limit, given as the solution to a stochastic differential equation. The diffusion limit enables us to derive a refined, instance-specific characterization of the stochastic dynamics, and to obtain several insights on the regret and belief evolution of a number of sequential experiments, including Thompson sampling (but not UCB, which does not satisfy our continuity assumption). We show that all sequential experiments whose randomization probabilities have a Lipschitz-continuous dependence on the observed data suffer from sub-optimal regret performance when the reward gaps are relatively large. Conversely, we find that a version of Thompson sampling with an asymptotically uninformative prior variance achieves near-optimal instance-specific regret scaling, including with large reward gaps, but these good regret properties come at the cost of highly unstable posterior beliefs.
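The scaling regime described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's construction: it assumes two arms with unit-variance Gaussian rewards, a mean gap of `delta / sqrt(n)` with `delta >= 0`, and Gaussian Thompson sampling with a fixed prior variance. The function name `simulate_thompson` and the parameters `delta`, `prior_var`, and `seed` are hypothetical names introduced for illustration.

```python
import math
import random


def simulate_thompson(n, delta, prior_var=1.0, seed=0):
    """Two-armed Gaussian Thompson sampling with mean gap delta / sqrt(n).

    Assumes delta >= 0, so arm 1 is the (weakly) better arm.
    Returns the regret divided by sqrt(n), which lies in [0, delta].
    """
    rng = random.Random(seed)
    means = [0.0, delta / math.sqrt(n)]  # weak-signal gap (assumed setup)
    pulls = [0, 0]      # number of times each arm was chosen
    sums = [0.0, 0.0]   # cumulative reward observed per arm
    suboptimal_pulls = 0

    for _ in range(n):
        # Draw one sample from each arm's Gaussian posterior
        # (prior N(0, prior_var), known unit noise variance).
        draws = []
        for k in range(2):
            precision = 1.0 / prior_var + pulls[k]
            post_mean = sums[k] / precision
            draws.append(rng.gauss(post_mean, math.sqrt(1.0 / precision)))

        # Play the arm whose posterior draw is larger.
        arm = 0 if draws[0] > draws[1] else 1
        reward = rng.gauss(means[arm], 1.0)
        pulls[arm] += 1
        sums[arm] += reward
        if arm == 0:
            suboptimal_pulls += 1

    # Each suboptimal pull costs the mean gap delta / sqrt(n).
    regret = means[1] * suboptimal_pulls
    return regret / math.sqrt(n)
```

Calling `simulate_thompson(n, delta)` for a fixed `delta` and increasing `n` (for example 100, 1000, 10000), averaged over several seeds, gives a rough sense of how regret on the $\sqrt{n}$ scale behaves when the gap shrinks as $1/\sqrt{n}$. The paper's results concern a prior variance chosen relative to this scaling, which this sketch does not attempt to reproduce.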

Publication:

arXiv e-prints

Pub Date:

January 2021

DOI:

10.48550/arXiv.2101.09855

arXiv:

arXiv:2101.09855

Bibcode:

2021arXiv210109855K

Keywords:

Mathematics - Statistics Theory;
Computer Science - Machine Learning;
62B15;
60J70

E-Print:

Forthcoming in Management Science. An earlier draft of this paper was circulated under the title "Diffusion Asymptotics for Sequential Experiments." Xu Kuang published under a different full name in earlier versions of this manuscript. Please use X. Kuang and S. Wager when citing this paper.
