Risk-Sensitive Reinforcement Learning
Abstract
The classic objective in a reinforcement learning (RL) problem is to find a policy that minimizes, in expectation, a long-run objective such as the infinite-horizon cumulative discounted or long-run average cost. In many practical applications, optimizing the expected value alone is not sufficient, and it may be necessary to include a risk measure in the optimization process, either in the objective or as a constraint. Various risk measures have been proposed in the literature, e.g., variance, exponential utility, percentile performance, chance constraints, value-at-risk (quantile), conditional value-at-risk, coherent risk measures, prospect theory and its later enhancement, cumulative prospect theory. In this article, we focus on the combination of risk criteria and reinforcement learning in a constrained optimization framework, i.e., a setting where the goal is to find a policy that optimizes the usual objective of infinite-horizon discounted/average cost, while ensuring that an explicit risk constraint is satisfied. We introduce the risk-constrained RL framework, cover popular risk measures based on variance, conditional value-at-risk, and chance constraints, and present a template for a risk-sensitive RL algorithm. Next, we study risk-sensitive RL with the objective of minimizing risk in an unconstrained framework, and cover cumulative prospect theory and coherent risk measures as special cases. We survey some of the recent work on this topic, covering problems encompassing discounted cost, average cost, and stochastic shortest path settings, together with the aforementioned risk measures, in constrained as well as unconstrained frameworks. This non-exhaustive survey is aimed at giving a flavor of the challenges involved in solving risk-sensitive RL problems, and outlining some potential future research directions.
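As an illustration of two of the risk measures named in the abstract, the following is a minimal sketch that estimates value-at-risk and conditional value-at-risk from a sample of cumulative costs. It is not taken from the survey: the function name `empirical_var_cvar`, the level `alpha`, and the simple order-statistic estimator are assumptions chosen for illustration, and costs are treated as "higher is worse".

```python
import numpy as np

def empirical_var_cvar(costs, alpha=0.95):
    """Return (VaR, CVaR) at level alpha for a sample of costs.

    Hypothetical helper for illustration; assumes 0 < alpha < 1
    and that larger costs are worse.
    """
    costs = np.sort(np.asarray(costs, dtype=float))
    n = costs.size
    # VaR: smallest sample value v whose empirical CDF satisfies P(cost <= v) >= alpha.
    k = int(np.ceil(alpha * n)) - 1
    var = costs[k]
    # CVaR via the Rockafellar-Uryasev form: VaR + E[(cost - VaR)^+] / (1 - alpha).
    cvar = var + np.mean(np.maximum(costs - var, 0.0)) / (1.0 - alpha)
    return var, cvar

if __name__ == "__main__":
    # Simulated cumulative costs of some fixed policy (synthetic data).
    rng = np.random.default_rng(0)
    sample = rng.exponential(scale=1.0, size=10_000)
    var, cvar = empirical_var_cvar(sample, alpha=0.95)
    print(f"VaR_0.95 = {var:.3f}, CVaR_0.95 = {cvar:.3f}")
```

In a risk-constrained setting of the kind the abstract describes, an estimate like `cvar` would be compared against a user-chosen threshold while the expected cost is optimized; how that comparison is folded into a policy update is left to the survey itself.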
 Publication:

arXiv e-prints
 Pub Date:
 October 2018
 arXiv:
 arXiv:1810.09126
 Bibcode:
 2018arXiv181009126P
 Keywords:

 Computer Science - Machine Learning;
 Mathematics - Optimization and Control;
 Statistics - Machine Learning