Near-optimal Optimistic Reinforcement Learning using Empirical Bernstein Inequalities

doi:10.48550/arXiv.1905.12425

We study model-based reinforcement learning in an unknown finite communicating Markov decision process. We propose a simple algorithm that leverages a variance-based confidence interval. We show that the proposed algorithm, UCRL-V, achieves the optimal regret $\tilde{\mathcal{O}}(\sqrt{DSAT})$ up to logarithmic factors, and so our work closes a gap with the lower bound without additional assumptions on the MDP. We perform experiments in a variety of environments that validate the theoretical bounds and show that UCRL-V outperforms state-of-the-art algorithms.
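To make the idea of a variance-based confidence interval concrete, here is a minimal sketch of an empirical Bernstein radius for samples bounded in [0, 1]. It uses the standard Maurer-Pontil form as an assumption; the abstract does not state the exact interval used by UCRL-V, so the constants and the function names (`empirical_bernstein_radius`, `hoeffding_radius`) are illustrative only.

```python
import math
from statistics import variance


def empirical_bernstein_radius(samples, delta):
    """One-sided empirical Bernstein radius for samples in [0, 1].

    Assumed form (Maurer-Pontil style), not taken from the paper:
        sqrt(2 * V * ln(2/delta) / n) + 7 * ln(2/delta) / (3 * (n - 1))
    where V is the unbiased sample variance.
    """
    n = len(samples)
    if n < 2:
        # The sample variance is undefined; fall back to the trivial bound.
        return 1.0
    var = variance(samples)  # unbiased sample variance
    log_term = math.log(2.0 / delta)
    return math.sqrt(2.0 * var * log_term / n) + 7.0 * log_term / (3.0 * (n - 1))


def hoeffding_radius(n, delta):
    """Two-sided Hoeffding radius for n samples in [0, 1], for comparison."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


if __name__ == "__main__":
    # Low-variance observations: the variance-aware radius shrinks at rate 1/n.
    low_variance = [0.5] * 1000
    print(empirical_bernstein_radius(low_variance, 0.05))
    print(hoeffding_radius(len(low_variance), 0.05))
```

When the observed variance is small, the first term vanishes and the radius decays roughly as 1/n instead of 1/sqrt(n), which is the kind of tightening a variance-based interval offers over a range-based one.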

Publication:

arXiv e-prints

Pub Date:

May 2019

DOI:

10.48550/arXiv.1905.12425

arXiv:

arXiv:1905.12425

Bibcode:

2019arXiv190512425T

Keywords:

Computer Science - Machine Learning;
Computer Science - Artificial Intelligence;
Computer Science - Computer Science and Game Theory;
Statistics - Machine Learning

E-Print:

The algorithm has been simplified (no need to look at the lower bound of the reward and transitions). The proof has been significantly cleaned up. The previous "assumption" is clarified as a condition of the algorithm, well known as submodularity. The proof that the bounds satisfy submodularity has been cleaned up.
