Continuous-Time Mean-Variance Portfolio Selection: A Reinforcement Learning Framework

doi:10.48550/arXiv.1904.11392

We approach the continuous-time mean-variance (MV) portfolio selection problem with reinforcement learning (RL). The problem is to achieve the best tradeoff between exploration and exploitation, and it is formulated as an entropy-regularized, relaxed stochastic control problem. We prove that the optimal feedback policy for this problem must be Gaussian, with time-decaying variance. We then establish connections between the entropy-regularized MV problem and the classical MV problem, including their solvability equivalence and the convergence of the former to the latter as the exploration weighting parameter decays to zero. Finally, we prove a policy improvement theorem, based on which we devise an implementable RL algorithm. In our simulations, this algorithm outperforms both an adaptive-control-based method and a deep-neural-network-based algorithm by a large margin.
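The Gaussian, time-decaying policy described above can be illustrated numerically. This is a minimal sketch under stated assumptions, not the authors' RL algorithm. It assumes a Black-Scholes market with one risky asset, discounted wealth dynamics dx = sigma*u*(rho dt + dW) with Sharpe ratio rho = (mu - r)/sigma, an entropy (exploration) weight lam, a target terminal mean z, and a Lagrange multiplier w. Under these assumptions, minimizing the entropy-regularized Hamiltonian over action densities gives a Gibbs measure that is Gaussian with mean -rho*V_x/(sigma*V_xx) and variance lam/(sigma^2*V_xx). Substituting the quadratic value function (x - w)^2 * exp(-rho^2 (T - t)) yields the policy N(-(rho/sigma)(x - w), lam/(2 sigma^2) * exp(rho^2 (T - t))). Its variance shrinks as t approaches T and vanishes as lam goes to 0, which recovers a deterministic classical feedback rule. All function names and parameter values below are hypothetical.

```python
import numpy as np


def lagrange_multiplier(x0, z, rho, T):
    """Hypothetical helper: multiplier w making E[X_T] equal the target z.

    Assumes the policy's drift is -rho**2 * (x - w), so that
    E[X_T] = w + (x0 - w) * exp(-rho**2 * T).
    """
    g = np.exp(rho**2 * T)
    return (z * g - x0) / (g - 1.0)


def gaussian_policy(t, x, w, rho, sigma, lam, T):
    """Mean and variance of the assumed optimal Gaussian feedback policy."""
    mean = -(rho / sigma) * (x - w)
    var = lam / (2.0 * sigma**2) * np.exp(rho**2 * (T - t))  # decays as t -> T
    return mean, var


def improved_policy(V_x, V_xx, rho, sigma, lam):
    """Gaussian minimizer of the entropy-regularized Hamiltonian, given the
    first and second wealth derivatives of a value function (needs V_xx > 0)."""
    return -rho * V_x / (sigma * V_xx), lam / (sigma**2 * V_xx)


def simulate_terminal_wealth(x0, z, rho, sigma, lam, T,
                             n_steps=200, n_paths=20_000, seed=0):
    """Euler-Maruyama on dx = sigma*u*(rho dt + dW), with each action u
    sampled from the Gaussian policy (exploration) at every time step."""
    rng = np.random.default_rng(seed)
    w = lagrange_multiplier(x0, z, rho, T)
    dt = T / n_steps
    x = np.full(n_paths, float(x0))
    for k in range(n_steps):
        mean, var = gaussian_policy(k * dt, x, w, rho, sigma, lam, T)
        u = mean + np.sqrt(var) * rng.standard_normal(n_paths)  # sampled action
        dW = np.sqrt(dt) * rng.standard_normal(n_paths)         # Brownian increment
        x = x + sigma * u * (rho * dt + dW)
    return x


if __name__ == "__main__":
    x_T = simulate_terminal_wealth(x0=1.0, z=1.4, rho=0.5, sigma=0.2, lam=0.1, T=1.0)
    print("Monte Carlo mean of X_T:", x_T.mean())
    print("Monte Carlo variance of X_T:", x_T.var())
```

With these illustrative parameters, the Monte Carlo mean of terminal wealth should land close to the target z = 1.4, because w is calibrated against the policy's mean-reverting drift toward w. Feeding the quadratic value function's derivatives into `improved_policy` reproduces `gaussian_policy`. That is the fixed-point behavior a policy improvement step should show at the optimum.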

Publication:

arXiv e-prints

Pub Date:

April 2019

DOI:

10.48550/arXiv.1904.11392

arXiv:

arXiv:1904.11392

Bibcode:

2019arXiv190411392W

Keywords:

Quantitative Finance - Portfolio Management;
Computer Science - Computational Engineering, Finance, and Science;
Computer Science - Machine Learning;
Mathematics - Optimization and Control;
91G10

E-Print:

39 pages, 5 figures
