Constrained episodic reinforcement learning in concave-convex and knapsack settings

doi:10.48550/arXiv.2006.05051

Constrained episodic reinforcement learning in concave-convex and knapsack settings

We propose an algorithm for tabular episodic reinforcement learning with constraints. We provide a modular analysis with strong theoretical guarantees for settings with concave rewards and convex constraints, and for settings with hard constraints (knapsacks). Most of the previous work in constrained reinforcement learning is limited to linear constraints, and the remaining work focuses on either the feasibility question or settings with a single episode. Our experiments demonstrate that the proposed algorithm significantly outperforms these approaches in existing constrained episodic environments.
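The abstract does not spell out the constrained setting concretely. The sketch below is a hedged illustration of the quantities such a problem is stated in terms of, not the paper's algorithm, which learns under unknown dynamics. For a fixed non-stationary policy over horizon H, it computes the expected episode reward and the expected consumption of each resource by propagating the state distribution forward. All names (`evaluate_policy`, `within_budget`, and the toy MDP) are hypothetical. A linear per-episode budget check stands in for the constraint. In a concave-convex formulation, one would instead (as an assumption here) apply a concave function to the expected reward vector and require a convex function of the expected consumption vector to be nonpositive. A hard (knapsack) constraint typically limits realized consumption summed over all episodes, which this sketch does not model.

```python
from typing import List, Sequence, Tuple


def evaluate_policy(
    init: Sequence[float],                    # initial state distribution, length S
    P: Sequence[Sequence[Sequence[float]]],   # P[s][a][s2]: transition probabilities
    r: Sequence[Sequence[float]],             # r[s][a]: scalar reward
    c: Sequence[Sequence[Sequence[float]]],   # c[s][a][k]: consumption of resource k
    policy: Sequence[Sequence[int]],          # policy[h][s]: action taken at step h
    H: int,
) -> Tuple[float, List[float]]:
    """Expected episode reward and expected per-resource consumption of a
    deterministic non-stationary policy (hypothetical helper for illustration)."""
    num_states = len(init)
    num_resources = len(c[0][0])
    mu = list(init)                           # state distribution at step h
    total_reward = 0.0
    total_cons = [0.0] * num_resources
    for h in range(H):
        next_mu = [0.0] * num_states
        for s in range(num_states):
            if mu[s] == 0.0:
                continue
            a = policy[h][s]
            total_reward += mu[s] * r[s][a]
            for k in range(num_resources):
                total_cons[k] += mu[s] * c[s][a][k]
            for s2 in range(num_states):
                next_mu[s2] += mu[s] * P[s][a][s2]
        mu = next_mu
    return total_reward, total_cons


def within_budget(cons: Sequence[float], budget: Sequence[float]) -> bool:
    """Linear constraint: expected consumption of every resource stays within its budget."""
    return all(x <= b for x, b in zip(cons, budget))


# Toy example: 2 states, 2 actions (0 = "safe", 1 = "risky"), one resource, H = 2.
init = [1.0, 0.0]
P = [[[1.0, 0.0], [0.0, 1.0]],    # action 0 moves to state 0, action 1 to state 1
     [[1.0, 0.0], [0.0, 1.0]]]
r = [[0.0, 1.0], [0.5, 2.0]]
c = [[[0.0], [1.0]], [[0.0], [1.0]]]  # only the risky action consumes the resource
budget = [1.0]
always_risky = [[1, 1], [1, 1]]
risky_then_safe = [[1, 1], [0, 0]]

for name, pol in [("always_risky", always_risky), ("risky_then_safe", risky_then_safe)]:
    reward, cons = evaluate_policy(init, P, r, c, pol, H=2)
    print(name, reward, cons, within_budget(cons, budget))
```

With a budget of one unit, `always_risky` earns an expected reward of 3.0 but consumes 2.0 units, so it is infeasible. `risky_then_safe` stays within budget at half the reward (1.5). This is the kind of reward-versus-consumption trade-off the constrained settings formalize.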

Publication:

arXiv e-prints

Pub Date:

June 2020

DOI:

10.48550/arXiv.2006.05051

arXiv:

arXiv:2006.05051

Bibcode:

2020arXiv200605051B

Keywords:

Computer Science - Machine Learning;
Computer Science - Artificial Intelligence;
Computer Science - Data Structures and Algorithms;
Statistics - Machine Learning

E-Print:

The NeurIPS 2020 version of this paper contains a small bug that leads to an incorrect dependence on H in Theorem 3.4. This version fixes it by adjusting Eq. (9), Theorem 3.4, and the relevant proofs. Changes in the main text are noted in red. Changes in the appendix are limited to Appendices B.1, B.5, and B.6 and to the statement of Lemma F.3.