Reinforcement Learning with Dynamic Convex Risk Measures
Abstract
We develop an approach for solving time-consistent risk-sensitive stochastic optimization problems using model-free reinforcement learning (RL). Specifically, we assume agents assess the risk of a sequence of random variables using dynamic convex risk measures. We employ a time-consistent dynamic programming principle to determine the value of a particular policy, and develop policy gradient update rules that aid in obtaining optimal policies. We further develop an actor-critic-style algorithm using neural networks to optimize over policies. Finally, we demonstrate the performance and flexibility of our approach by applying it to three optimization problems: statistical arbitrage trading strategies, financial hedging, and obstacle-avoidance robot control.
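To make the risk-sensitive objective concrete, the following is a minimal sketch (not the paper's method) of a policy-gradient step on a static convex risk measure. It uses conditional value-at-risk (CVaR) as an illustrative example of a convex risk measure, a one-step Gaussian policy, and a toy quadratic loss; the function names, the toy loss `(a - 2)^2`, and all parameters are assumptions for illustration only.

```python
import numpy as np

def cvar(losses, alpha=0.9):
    """CVaR_alpha: mean of the losses at or above the alpha-quantile (VaR)."""
    var = np.quantile(losses, alpha)
    return losses[losses >= var].mean()

def cvar_policy_gradient_step(theta, alpha=0.9, lr=0.05, n_samples=4000, seed=0):
    """One score-function gradient step on CVaR_alpha of a toy one-step loss.

    Policy: action a ~ N(theta, 1).  Loss: (a - 2)^2, so the risk-neutral and
    risk-sensitive optima coincide at theta = 2 in this toy problem.
    Uses the Rockafellar-Uryasev form CVaR = min_c c + E[(L - c)^+]/(1 - alpha),
    whose gradient at c = VaR is E[(L - VaR)^+ * score]/(1 - alpha),
    with score = d/d theta log N(a; theta, 1) = (a - theta).
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(theta, 1.0, size=n_samples)
    loss = (a - 2.0) ** 2
    var = np.quantile(loss, alpha)
    grad = np.mean(np.maximum(loss - var, 0.0) * (a - theta)) / (1.0 - alpha)
    return theta - lr * grad  # descend the estimated CVaR gradient
```

The dynamic, time-consistent setting of the paper instead composes one-step risk measures backward in time via a dynamic programming recursion; this sketch only illustrates the per-step risk-sensitive gradient idea.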
- Publication: arXiv e-prints
- Pub Date: December 2021
- DOI: 10.48550/arXiv.2112.13414
- arXiv: arXiv:2112.13414
- Bibcode: 2021arXiv211213414C
- Keywords: Computer Science - Machine Learning; Quantitative Finance - Computational Finance; Quantitative Finance - Mathematical Finance; Quantitative Finance - Risk Management; Quantitative Finance - Trading and Market Microstructure
- E-Print: 26 pages, 9 figures