Scalable Exploration via Ensemble++
Abstract
Scalable exploration in high-dimensional, complex environments is a significant challenge in sequential decision making, especially when utilizing neural networks. Ensemble sampling, a practical approximation of Thompson sampling, is widely adopted but often suffers performance degradation due to {ensemble coupling} in shared layer architectures, leading to reduced diversity and ineffective exploration. In this paper, we introduce Ensemble++, a novel method that addresses these challenges through architectural and algorithmic innovations. To prevent ensemble coupling, Ensemble++ decouples mean and uncertainty estimation by separating the base network and ensemble components, employs a symmetrized loss function and the stop-gradient operator. To further enhance exploration, it generates richer hypothesis spaces through random linear combinations of ensemble components using continuous index sampling. Theoretically, we prove that Ensemble++ matches the regret bounds of exact Thompson sampling in linear contextual bandits while maintaining a scalable per-step computational complexity of $\tilde{O}( \log T)$. This provides the first rigorous analysis demonstrating that ensemble sampling can be an scalable and effective approximation to Thompson Sampling, closing a key theoretical gap in exploration efficiency. Empirically, we demonstrate Ensemble++'s effectiveness in both regret minimization and computational efficiency across a range of nonlinear bandit environments, including a language-based contextual bandits where the agents employ GPT backbones. Our results highlight the capability of Ensemble++ for real-time adaptation in complex environments where computational and data collection budgets are constrained. \url{https://github.com/szrlee/Ensemble_Plus_Plus}
- Publication:
-
arXiv e-prints
- Pub Date:
- July 2024
- DOI:
- arXiv:
- arXiv:2407.13195
- Bibcode:
- 2024arXiv240713195L
- Keywords:
-
- Computer Science - Machine Learning;
- Computer Science - Artificial Intelligence;
- Computer Science - Human-Computer Interaction;
- Computer Science - Information Theory;
- Statistics - Machine Learning
- E-Print:
- 54 pages