Sample Complexity of Kernel-Based Q-Learning
Abstract
Modern reinforcement learning (RL) often faces an enormous state-action space. Existing analytical results are typically for settings with a small number of state-actions, or simple models such as linearly modeled Q-functions. To derive statistically efficient RL policies handling large state-action spaces, with more general Q-functions, some recent works have considered nonlinear function approximation using kernel ridge regression. In this work, we derive sample complexities for kernel-based Q-learning when a generative model exists. We propose a nonparametric Q-learning algorithm which finds an $\epsilon$-optimal policy in an arbitrarily large-scale discounted MDP. The sample complexity of the proposed algorithm is order-optimal with respect to $\epsilon$ and the complexity of the kernel (in terms of its information gain). To the best of our knowledge, this is the first result showing a finite sample complexity under such a general model.
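As a minimal sketch of the kernel ridge regression step such methods build on (not the paper's actual algorithm; the RBF kernel choice, function names, and parameters here are illustrative assumptions): given state-action feature vectors and their Bellman targets, one fits a nonparametric Q-function estimate in closed form.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # RBF kernel matrix between rows of X and rows of Y (illustrative choice).
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_ridge_q(X, y, lam=0.1, gamma=1.0):
    """Fit Q(s, a) by kernel ridge regression.

    X   : (n, d) array of state-action feature vectors
    y   : (n,) array of Bellman targets, e.g. r + discount * max_a' Q(s', a')
    lam : ridge regularization strength
    Returns a callable that predicts Q-values at new state-action pairs.
    """
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return lambda Xq: rbf_kernel(Xq, X, gamma) @ alpha

# Toy usage: with weak regularization, the fit interpolates the targets.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
q = kernel_ridge_q(X, y, lam=1e-6)
```

In the generative-model setting the paper considers, targets like `y` would be estimated from sampled transitions at chosen state-action pairs, and this regression step would be iterated.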
 Publication:

arXiv e-prints
 Pub Date:
 February 2023
 DOI:
 10.48550/arXiv.2302.00727
 arXiv:
 arXiv:2302.00727
 Bibcode:
 2023arXiv230200727Y
 Keywords:

 Computer Science - Machine Learning;
 Computer Science - Artificial Intelligence;
 Statistics - Machine Learning