On The Statistical Complexity of Offline Decision-Making
Abstract
We study the statistical complexity of offline decision-making with function approximation, establishing (near) minimax-optimal rates for stochastic contextual bandits and Markov decision processes. The performance limits are captured by the pseudo-dimension of the (value) function class and a new characterization of the behavior policy that \emph{strictly} subsumes all the previous notions of data coverage in the offline decision-making literature. In addition, we seek to understand the benefits of using offline data in online decision-making and show nearly minimax-optimal rates in a wide range of regimes.
- Publication:
-
arXiv e-prints
- Pub Date:
- January 2025
- arXiv:
- arXiv:2501.06339
- Bibcode:
- 2025arXiv250106339N
- Keywords:
-
- Computer Science - Machine Learning;
- Computer Science - Artificial Intelligence;
- Statistics - Machine Learning
- E-Print:
- arXiv version for the ICML'24 paper