Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation

doi:10.48550/arXiv.2006.00475

Lattimore, Tor

We prove that the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation is at most $O(d^{2.5} \sqrt{n} \log(n))$, where $d$ is the dimension and $n$ is the number of interactions. This improves on the $O(d^{9.5} \sqrt{n} \log(n)^{7.5})$ bound of Bubeck et al. (2017). The proof is based on identifying an improved exploratory distribution for convex functions.
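
To make the size of the improvement concrete, one can divide the earlier bound by the new one, ignoring the constants hidden by the $O(\cdot)$ notation. This is an illustrative comparison under that assumption, not a statement taken from the paper:

$$\frac{d^{9.5} \sqrt{n} \log(n)^{7.5}}{d^{2.5} \sqrt{n} \log(n)} = d^{7} \log(n)^{6.5}$$

Both bounds share the same $\sqrt{n}$ rate, so the gain lies entirely in the dimension dependence and the logarithmic factor.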

Publication: arXiv e-prints

Pub Date: May 2020

DOI: 10.48550/arXiv.2006.00475

arXiv: arXiv:2006.00475

Bibcode: 2020arXiv200600475L

Keywords: Mathematics - Optimization and Control; Computer Science - Machine Learning; Statistics - Machine Learning

E-Print: To appear in Mathematical Statistics and Learning. 22 pages, 6 figures
