Continuous-Time Mean-Variance Portfolio Selection: A Reinforcement Learning Framework
Abstract
We approach continuous-time mean-variance (MV) portfolio selection with reinforcement learning (RL). The problem is to achieve the best tradeoff between exploration and exploitation, and is formulated as an entropy-regularized, relaxed stochastic control problem. We prove that the optimal feedback policy for this problem must be Gaussian, with time-decaying variance. We then establish connections between the entropy-regularized MV problem and the classical MV problem, including the equivalence of their solvability and the convergence of the former to the latter as the exploration weighting parameter decays to zero. Finally, we prove a policy improvement theorem, based on which we devise an implementable RL algorithm. In our simulations, the algorithm outperforms both an adaptive-control-based method and a deep-neural-network-based algorithm by a large margin.
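To make the "Gaussian, with time-decaying variance" statement concrete, the following is a minimal sketch of sampling from such a policy. The abstract does not give the closed form, so the parametrization here is an assumption for illustration only: the mean is taken to be linear in the gap between wealth `x` and a target level `w`, and the variance is taken to be proportional to the exploration weight `lam` and to decay exponentially as time `t` approaches the horizon `T`. The symbols `rho` (a Sharpe-ratio-like parameter), `sigma` (volatility), `w`, and all function names are hypothetical.

```python
import math
import random


def policy_variance(t, T, rho, sigma, lam):
    """Assumed exploration variance: shrinks as t approaches the horizon T.

    The functional form is an illustrative assumption, not quoted from
    the abstract. With lam == 0 the variance is zero (no exploration).
    """
    return lam / (2.0 * sigma ** 2) * math.exp(rho ** 2 * (T - t))


def policy_mean(x, w, rho, sigma):
    """Assumed policy mean: linear in the gap between wealth x and target w."""
    return -(rho / sigma) * (x - w)


def sample_action(t, x, w, T, rho, sigma, lam, rng=random):
    """Draw one action (amount invested in the risky asset) from the Gaussian policy."""
    mean = policy_mean(x, w, rho, sigma)
    std = math.sqrt(policy_variance(t, T, rho, sigma, lam))
    return rng.gauss(mean, std)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the decay.
    T, rho, sigma, lam = 1.0, 0.5, 0.2, 0.1
    for t in (0.0, 0.5, 1.0):
        print(t, policy_variance(t, T, rho, sigma, lam))
```

Under these assumptions, setting `lam` to zero removes the randomness and the sampled action coincides with the policy mean, which mirrors the stated convergence to the classical MV problem as the exploration weight vanishes.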
- Publication:
- arXiv e-prints
- Pub Date:
- April 2019
- DOI:
- 10.48550/arXiv.1904.11392
- arXiv:
- arXiv:1904.11392
- Bibcode:
- 2019arXiv190411392W
- Keywords:
- Quantitative Finance - Portfolio Management;
- Computer Science - Computational Engineering, Finance, and Science;
- Computer Science - Machine Learning;
- Mathematics - Optimization and Control;
- 91G10
- E-Print:
- 39 pages, 5 figures