An Analytical Update Rule for General Policy Optimization

doi:10.48550/arXiv.2112.02045

We present an analytical policy update rule that is independent of parametric function approximators. The rule is suitable for optimizing general stochastic policies and has a monotonic improvement guarantee. It is derived from a closed-form solution to trust-region optimization using the calculus of variations, following a new theoretical result that tightens existing bounds for policy improvement with trust-region methods. The update rule connects policy search methods and value function methods. Moreover, off-policy reinforcement learning algorithms can be derived from the update rule, since it does not require integrating over on-policy states. In addition, the update rule extends immediately to cooperative multi-agent systems when policy updates are performed by one agent at a time.

Publication: arXiv e-prints

Pub Date: December 2021

DOI: 10.48550/arXiv.2112.02045

arXiv: arXiv:2112.02045

Bibcode: 2021arXiv211202045L

Keywords: Computer Science - Artificial Intelligence
