Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation
Abstract
We prove that the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation is at most $O(d^{2.5} \sqrt{n} \log(n))$, where $d$ is the dimension and $n$ is the number of interactions. This improves on the $O(d^{9.5} \sqrt{n} \log(n)^{7.5})$ bound of Bubeck et al. (2017). The proof is based on identifying an improved exploratory distribution for convex functions.
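As a rough illustration of the size of the improvement, the ratio of the two bounds can be computed directly, treating the constants hidden by the $O(\cdot)$ notation as comparable (an assumption made here for illustration, not a claim from the paper):

$$\frac{d^{9.5} \sqrt{n} \log(n)^{7.5}}{d^{2.5} \sqrt{n} \log(n)} = d^{7} \log(n)^{6.5}.$$

Under that assumption, the gain is a factor of $d^{7}$ in the dimension and $\log(n)^{6.5}$ in the number of interactions, while the $\sqrt{n}$ dependence is unchanged.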
- Publication: arXiv e-prints
- Pub Date: May 2020
- DOI: 10.48550/arXiv.2006.00475
- arXiv: arXiv:2006.00475
- Bibcode: 2020arXiv200600475L
- Keywords: Mathematics - Optimization and Control; Computer Science - Machine Learning; Statistics - Machine Learning
- E-Print: To appear in Mathematical Statistics and Learning. 22 pages, 6 figures