A Note on a Tight Lower Bound for MNL-Bandit Assortment Selection Models

doi:10.48550/arXiv.1709.06109

In this short note we consider a dynamic assortment planning problem under the capacitated multinomial logit (MNL) bandit model. We prove a tight lower bound on the accumulated regret that matches existing regret upper bounds for all parameters (time horizon $T$, number of items $N$ and maximum assortment capacity $K$) up to logarithmic factors. Our results close an $O(\sqrt{K})$ gap between upper and lower regret bounds from existing works.

Publication: arXiv e-prints

Pub Date: September 2017

DOI: 10.48550/arXiv.1709.06109

arXiv: arXiv:1709.06109

Bibcode: 2017arXiv170906109C

Keywords: Statistics - Machine Learning; Computer Science - Machine Learning

E-Print: Final version, 4 pages (double column)
