Coordination without communication: optimal regret in two players multi-armed bandits

doi:10.48550/arXiv.2002.07596

We consider two agents simultaneously playing the same stochastic three-armed bandit problem. The two agents cooperate but cannot communicate. We propose a strategy with no collisions at all between the players (with very high probability) and with near-optimal regret $O(\sqrt{T \log(T)})$. We also argue that the extra logarithmic term $\sqrt{\log(T)}$ should be necessary by proving a lower bound for a full-information variant of the problem.

Publication: arXiv e-prints
Pub Date: February 2020
DOI: 10.48550/arXiv.2002.07596
arXiv: arXiv:2002.07596
Bibcode: 2020arXiv200207596B
Keywords: Computer Science - Computer Science and Game Theory; Computer Science - Machine Learning; Computer Science - Multiagent Systems; Statistics - Machine Learning
E-Print: 28 pages, 5 figures. V2: minor revision
