Fighting Contextual Bandits with Stochastic Smoothing

doi:10.48550/arXiv.1810.05188

We introduce a new stochastic smoothing perspective to study adversarial contextual bandit problems. We propose a general algorithm template that represents random-perturbation-based algorithms and identify several perturbation distributions that lead to strong regret bounds. Using the idea of smoothness, we provide an $O(\sqrt{T})$ zero-order bound for the vanilla algorithm and an $O(L^{*2/3}_{T})$ first-order bound for the clipped version. These bounds hold when the algorithms are used with a variety of distributions that have a bounded hazard rate. Our algorithm template includes EXP4 as a special case corresponding to the Gumbel perturbation. Our regret bounds match existing results for EXP4 without relying on the specific properties of the algorithm.
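The link between the Gumbel perturbation and EXP4 rests on a standard identity (the Gumbel-max trick): adding independent Gumbel noise to scaled negative cumulative losses and picking the maximizer yields the same sampling distribution as exponential weights. The following is a minimal sketch of that identity only, not the paper's algorithm template. The function names, the learning rate `eta`, and the example losses are hypothetical, and the contextual and bandit-feedback parts (loss estimation, expert advice) are omitted.

```python
import math
import random


def exp_weights(cum_losses, eta):
    """Exponential-weights distribution: p_i proportional to exp(-eta * L_i)."""
    base = min(cum_losses)  # subtract the minimum for numerical stability
    weights = [math.exp(-eta * (loss - base)) for loss in cum_losses]
    total = sum(weights)
    return [w / total for w in weights]


def gumbel_sample(rng):
    """Draw one standard Gumbel variate by inverse-CDF sampling."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); rng.random() lies in [0, 1)
        u = rng.random()
    return -math.log(-math.log(u))


def perturbed_leader(cum_losses, eta, rng):
    """Pick the index maximizing -eta * L_i + G_i with G_i i.i.d. Gumbel."""
    scores = [-eta * loss + gumbel_sample(rng) for loss in cum_losses]
    return max(range(len(scores)), key=scores.__getitem__)


def empirical_distribution(cum_losses, eta, n_samples, seed=0):
    """Estimate the perturbed-leader choice distribution by simulation."""
    rng = random.Random(seed)
    counts = [0] * len(cum_losses)
    for _ in range(n_samples):
        counts[perturbed_leader(cum_losses, eta, rng)] += 1
    return [c / n_samples for c in counts]


if __name__ == "__main__":
    losses = [3.0, 1.0, 2.0]  # hypothetical cumulative losses of three experts
    print("exponential weights:", exp_weights(losses, eta=1.0))
    print("Gumbel perturbation:", empirical_distribution(losses, 1.0, 20000))
```

With enough samples the two printed distributions should agree up to simulation noise. Swapping `gumbel_sample` for another noise distribution gives a different perturbation-based rule, which is the kind of variation the template is meant to cover.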

Publication:

arXiv e-prints

Pub Date:

October 2018

DOI:

10.48550/arXiv.1810.05188

arXiv:

arXiv:1810.05188

Bibcode:

2018arXiv181005188J

Keywords:

Statistics - Machine Learning;
Computer Science - Machine Learning

E-Print:

Merged into the manuscript "Online Learning via the Differential Privacy Lens," available as arXiv:1711.10019
