On Distributed Cooperative Decision-Making in Multiarmed Bandits
Abstract
We study the explore-exploit tradeoff in distributed cooperative decision-making in the context of the multiarmed bandit (MAB) problem. For the distributed cooperative MAB problem, we design the cooperative UCB algorithm, which comprises two interleaved distributed processes: (i) running consensus algorithms for estimation of rewards, and (ii) upper-confidence-bound-based heuristics for selection of arms. We rigorously analyze the performance of the cooperative UCB algorithm and characterize the influence of the communication graph structure on the decision-making performance of the group.
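The abstract describes two interleaved processes: running consensus on reward statistics and UCB-style arm selection. The following is a minimal illustrative sketch of a scheme of this flavor, not the paper's exact algorithm: the consensus matrix `P`, the Gaussian reward model, and the exploration parameter `gamma` are assumptions made here for concreteness.

```python
import numpy as np

def cooperative_ucb_sketch(P, arm_means, horizon, gamma=2.0, rng=None):
    """Illustrative cooperative UCB-style simulation (assumptions, not the
    paper's exact algorithm): each agent maintains running-consensus estimates
    of per-arm cumulative rewards and pull counts, then selects the arm that
    maximizes a UCB-like index built from those estimates.

    P          : row-stochastic consensus matrix over the communication graph
    arm_means  : true mean rewards, used only to simulate Gaussian observations
    horizon    : number of decision rounds
    gamma      : exploration weight (assumed constant here)
    """
    rng = np.random.default_rng() if rng is None else rng
    M, N = P.shape[0], len(arm_means)      # number of agents, number of arms
    s_hat = np.zeros((M, N))               # consensus estimate of cumulative reward per arm
    n_hat = np.zeros((M, N))               # consensus estimate of cumulative pulls per arm
    total_reward = np.zeros(M)

    for t in range(1, horizon + 1):
        pulls = np.zeros((M, N))
        gains = np.zeros((M, N))
        for k in range(M):
            if t <= N:
                arm = t - 1                # initialization: each agent samples every arm once
            else:
                counts = np.maximum(n_hat[k], 1e-12)
                mean = s_hat[k] / counts
                bonus = np.sqrt(gamma * np.log(t) / counts)
                arm = int(np.argmax(mean + bonus))
            r = rng.normal(arm_means[arm], 1.0)   # assumed unit-variance Gaussian rewards
            pulls[k, arm] = 1.0
            gains[k, arm] = r
            total_reward[k] += r
        # running consensus step: mix neighbors' estimates, then add new local observations
        s_hat = P @ s_hat + gains
        n_hat = P @ n_hat + pulls
    return total_reward
```

One common way to instantiate `P` in consensus-based schemes is a doubly stochastic matrix derived from the communication graph (e.g., Metropolis-Hastings weights); the choice of `P` is exactly where the graph structure enters the performance of the group.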
- Publication:
- arXiv e-prints
- Pub Date:
- December 2015
- DOI:
- 10.48550/arXiv.1512.06888
- arXiv:
- arXiv:1512.06888
- Bibcode:
- 2015arXiv151206888L
- Keywords:
- Electrical Engineering and Systems Science - Systems and Control;
- Computer Science - Multiagent Systems;
- Mathematics - Optimization and Control;
- Statistics - Machine Learning
- E-Print:
- This revision provides a correction to the original paper, which appeared in the Proceedings of the 2016 European Control Conference (ECC). The second statement of Proposition 1, Theorem 1, and their proofs are new. The new Theorem 1 is used to prove the regret bounds in Theorem 2.