Continuous control with deep reinforcement learning

doi:10.48550/arXiv.1509.02971

Abstract:

We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.

Publication:

arXiv e-prints

Pub Date:

September 2015

DOI:

10.48550/arXiv.1509.02971

arXiv:

arXiv:1509.02971

Bibcode:

2015arXiv150902971L

Keywords:

Computer Science - Machine Learning;
Statistics - Machine Learning

E-Print:

10 pages + supplementary

