Path Integral Policy Improvement with Covariance Matrix Adaptation

doi:10.48550/arXiv.1206.4621

There has been a recent focus in reinforcement learning on addressing continuous state and action problems by optimizing parameterized policies. PI2 is a recent example of this approach. It combines a derivation from first principles of stochastic optimal control with tools from statistical estimation theory. In this paper, we consider PI2 as a member of the wider family of methods which share the concept of probability-weighted averaging to iteratively update parameters to optimize a cost function. We compare PI2 to other members of the same family - Cross-Entropy Methods and CMAES - at the conceptual level and in terms of performance. The comparison suggests the derivation of a novel algorithm which we call PI2-CMA for "Path Integral Policy Improvement with Covariance Matrix Adaptation". PI2-CMA's main advantage is that it determines the magnitude of the exploration noise automatically.
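The "probability-weighted averaging" idea shared by this family of methods can be illustrated with a small sketch. This is not the paper's PI2-CMA algorithm: it ignores trajectories and time-dependent weighting, and it simply optimizes a hypothetical quadratic cost in a black-box fashion. The function names, the sample count, and the constant `h` are assumptions chosen for illustration only.

```python
import numpy as np


def weighted_averaging_step(mean, cov, cost_fn, rng, n_samples=20, h=10.0):
    """One update: sample parameters, weight them by cost, re-estimate mean and covariance.

    Illustrative assumption, not the paper's exact update rule.
    """
    # Explore by sampling parameter vectors around the current mean.
    samples = rng.multivariate_normal(mean, cov, size=n_samples)
    costs = np.array([cost_fn(s) for s in samples])

    # Map costs to probabilities: rescale to [0, 1], then exponentiate
    # so that low-cost samples receive large weights.
    spread = costs.max() - costs.min()
    scaled = (costs - costs.min()) / (spread + 1e-12)
    weights = np.exp(-h * scaled)
    weights /= weights.sum()

    # Probability-weighted average of the samples gives the new mean.
    new_mean = weights @ samples

    # Probability-weighted covariance of the deviations from the old mean
    # sets the exploration noise for the next iteration.
    diffs = samples - mean
    new_cov = (weights[:, None] * diffs).T @ diffs
    new_cov += 1e-10 * np.eye(len(mean))  # keep the matrix numerically valid
    return new_mean, new_cov


def quadratic_cost(theta):
    """Hypothetical stand-in cost with its minimum at the origin."""
    return float(np.sum(theta ** 2))


rng = np.random.default_rng(0)
initial_mean = np.array([2.0, -1.5])
mean, cov = initial_mean.copy(), np.eye(2)
for _ in range(30):
    mean, cov = weighted_averaging_step(mean, cov, quadratic_cost, rng)
```

In this sketch the covariance is re-estimated from the weighted samples at every iteration, which is one way to let the exploration magnitude adapt on its own rather than being fixed by hand.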

Publication:

arXiv e-prints

Pub Date:

June 2012

DOI:

10.48550/arXiv.1206.4621

arXiv:

arXiv:1206.4621

Bibcode:

2012arXiv1206.4621S

Keywords:

Computer Science - Machine Learning

E-Print:

ICML2012
