Optimistic Thompson Sampling for No-Regret Learning in Unknown Games

doi:10.48550/arXiv.2402.09456

This work tackles the complexities of multi-player scenarios in unknown games, where the primary challenge lies in navigating the uncertainty of the environment through bandit feedback alongside strategic decision-making. We introduce Thompson Sampling (TS)-based algorithms that exploit the information of opponents' actions and reward structures, leading to a substantial reduction in experimental budgets, achieving over tenfold improvements compared to conventional approaches. Notably, our algorithms demonstrate that, given specific reward structures, the regret bound depends logarithmically on the total action space, significantly alleviating the curse of multi-player. Furthermore, we unveil the Optimism-then-NoRegret (OTN) framework, a pioneering methodology that seamlessly incorporates our advancements with established algorithms, showcasing its utility in practical scenarios such as traffic routing and radar sensing in the real world.
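
As a rough illustration of the idea named in the abstract (form an optimistic reward estimate, then hand it to a no-regret update), the sketch below combines a Gaussian posterior sample that is never allowed to fall below its posterior mean with a Hedge (exponential weights) update. It is not the paper's algorithm. The two-player matrix game, the N(0, 1) prior, the known noise level, and every name (`posterior`, `optimistic_sample`, `hedge_probs`, `run_otn`) are assumptions made for this example only.

```python
import math
import random


def posterior(sum_obs, count, noise):
    """Gaussian posterior (mean, std) under an assumed N(0, 1) prior and known noise std."""
    precision = 1.0 + count / noise ** 2
    return (sum_obs / noise ** 2) / precision, math.sqrt(1.0 / precision)


def optimistic_sample(mean, std, rng):
    """Posterior draw that is never below the posterior mean (the optimism step)."""
    return max(mean, rng.gauss(mean, std))


def hedge_probs(log_weights):
    """Softmax of the log-weights, i.e. the Hedge sampling distribution."""
    top = max(log_weights)
    exps = [math.exp(w - top) for w in log_weights]
    total = sum(exps)
    return [e / total for e in exps]


def run_otn(reward, opponent_actions, eta=0.1, noise=0.1, seed=0):
    """Play the row player against a fixed sequence of opponent (column) actions.

    reward[a][b] is the true mean reward, unknown to the learner. The learner
    sees only the opponent's action and a noisy reward for the pair it played.
    """
    rng = random.Random(seed)
    n_a, n_b = len(reward), len(reward[0])
    sums = [[0.0] * n_b for _ in range(n_a)]
    counts = [[0] * n_b for _ in range(n_a)]
    log_w = [0.0] * n_a
    for b in opponent_actions:
        probs = hedge_probs(log_w)
        a = rng.choices(range(n_a), weights=probs)[0]
        # Bandit feedback: one noisy reward for the joint action (a, b).
        obs = reward[a][b] + rng.gauss(0.0, noise)
        sums[a][b] += obs
        counts[a][b] += 1
        # Optimism, then no-regret: build an optimistic estimate for every own
        # action against the observed opponent action, then do a Hedge update.
        for i in range(n_a):
            mean, std = posterior(sums[i][b], counts[i][b], noise)
            log_w[i] += eta * optimistic_sample(mean, std, rng)
    return hedge_probs(log_w)


if __name__ == "__main__":
    # Toy game (an assumption): row 0 dominates row 1 against both columns.
    game = [[1.0, 1.0], [0.0, 0.0]]
    print(run_otn(game, [t % 2 for t in range(500)]))
```

In this toy setup the learner observes the opponent's action each round, so it can update its estimate for every own action against that column, including actions it did not play; unplayed pairs keep a wide posterior and therefore receive larger optimistic estimates.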

Publication:

arXiv e-prints

Pub Date:

February 2024

arXiv:

arXiv:2402.09456

Bibcode:

2024arXiv240209456L

Keywords:

Computer Science - Machine Learning;
Computer Science - Artificial Intelligence;
Computer Science - Computer Science and Game Theory;
Statistics - Machine Learning
