Quantum Natural Policy Gradients: Towards Sample-Efficient Reinforcement Learning

doi:10.48550/arXiv.2304.13571

Reinforcement learning is a growing field in AI with a lot of potential. An agent learns intelligent behavior automatically through trial-and-error interaction with its environment. However, this learning process is often costly. Using variational quantum circuits as function approximators can potentially reduce this cost. To this end, we propose the quantum natural policy gradient (QNPG) algorithm, a second-order gradient-based routine that takes advantage of an efficient approximation of the quantum Fisher information matrix. We experimentally demonstrate that QNPG outperforms first-order training on contextual bandit environments in convergence speed and stability, and that it reduces sample complexity. Furthermore, we provide evidence for the practical feasibility of our approach by training on a 12-qubit hardware device.
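
To make the idea of a natural policy gradient step concrete, the following is a minimal classical sketch, not the QNPG routine from the paper. It assumes a softmax policy over the arms of a single-context bandit in place of a variational quantum circuit, and it uses the exact classical Fisher information matrix of that policy in place of an approximated quantum Fisher information matrix. All names (`natural_gradient_step`, `solve`, `damping`, the reward values) are hypothetical and chosen for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def expected_reward(theta, rewards):
    """Expected reward J(theta) of the softmax policy (assumed known mean rewards)."""
    return sum(p * r for p, r in zip(softmax(theta), rewards))


def natural_gradient_step(theta, rewards, lr=0.5, damping=1e-3):
    """One update theta <- theta + lr * (F + damping * I)^{-1} grad J."""
    p = softmax(theta)
    j = sum(pi * ri for pi, ri in zip(p, rewards))
    # Exact first-order gradient for a softmax policy: dJ/dtheta_a = p_a (r_a - J).
    grad = [pi * (ri - j) for pi, ri in zip(p, rewards)]
    n = len(theta)
    # Classical Fisher matrix of the softmax policy, F = diag(p) - p p^T.
    # It is singular, so a small damping term is added (an assumption of this sketch).
    fisher = [
        [(p[a] + damping if a == b else 0.0) - p[a] * p[b] for b in range(n)]
        for a in range(n)
    ]
    direction = solve(fisher, grad)
    return [t + lr * d for t, d in zip(theta, direction)]


# Usage: three arms with assumed mean rewards, starting from a uniform policy.
rewards = [0.2, 0.5, 0.9]
theta = [0.0, 0.0, 0.0]
for _ in range(20):
    theta = natural_gradient_step(theta, rewards)
```

A first-order method would update along `grad` directly; the sketch instead preconditions it with the inverse Fisher matrix, which is the second-order ingredient the abstract refers to. In the quantum setting the preconditioner would be estimated from the circuit rather than computed in closed form.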

Publication:

arXiv e-prints

Pub Date:

April 2023

DOI:

10.48550/arXiv.2304.13571

arXiv:

arXiv:2304.13571

Bibcode:

2023arXiv230413571M

Keywords:

Quantum Physics;
Computer Science - Machine Learning

E-Print:

Accepted to the 1st International Workshop on Quantum Machine Learning: From Foundations to Applications (QML@QCE 2023), Bellevue, Washington, USA. 6 pages, 4 figures, 1 table
