Minimizing the Outage Probability in a Markov Decision Process

Standard Markov decision process (MDP) and reinforcement learning algorithms optimize the policy with respect to the expected gain. We propose an algorithm that optimizes an alternative objective: the probability that the gain is greater than a given value. The algorithm can be seen as an extension of the value iteration algorithm. We also show how the proposed algorithm could be generalized to use neural networks, similarly to the deep Q-learning extension of Q-learning.
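To make the objective concrete, the following is a minimal sketch of a value-iteration-style recursion that maximizes the probability of reaching a target gain. It is not the paper's algorithm: it uses the standard trick of augmenting the state with the gain still required, and it assumes a finite horizon, integer rewards, and a hypothetical one-state toy MDP (`TRANSITIONS`, `success_probability`, and the action names are all illustrative).

```python
from functools import lru_cache

# Hypothetical toy MDP, for illustration only.
# TRANSITIONS[state][action] = [(probability, next_state, reward), ...]
TRANSITIONS = {
    0: {
        "safe": [(1.0, 0, 1)],                # always gain 1
        "risky": [(0.5, 0, 3), (0.5, 0, 0)],  # gain 3 or 0, expected gain 1.5
    }
}


def success_probability(state, target, horizon):
    """Return (max probability that the total gain >= target, best first action).

    The recursion runs over the augmented state (state, remaining, steps_left),
    where `remaining` is the gain still needed to reach the target.
    """

    @lru_cache(maxsize=None)
    def value(s, remaining, steps_left):
        if steps_left == 0:
            # Terminal step: success if and only if the target has been reached.
            return (1.0 if remaining <= 0 else 0.0), None
        best_p, best_a = -1.0, None
        for action, outcomes in TRANSITIONS[s].items():
            # Bellman backup on success probabilities instead of expected gain.
            p = sum(
                prob * value(s2, remaining - r, steps_left - 1)[0]
                for prob, s2, r in outcomes
            )
            if p > best_p:
                best_p, best_a = p, action
        return best_p, best_a

    return value(state, target, horizon)


if __name__ == "__main__":
    print(success_probability(0, 2, 2))
    print(success_probability(0, 3, 2))
```

In this toy example, with a target of 2 over two steps the recursion selects the safe action first, since two safe steps reach the target with certainty, even though the risky action has the higher expected gain. Raising the target to 3 makes the risky action preferable, because the safe action alone can no longer reach the target.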

Publication: arXiv e-prints

Pub Date: February 2023

DOI: 10.48550/arXiv.2302.14714

arXiv: arXiv:2302.14714

Bibcode: 2023arXiv230214714C

Keywords: Computer Science - Machine Learning

E-Print: Accepted at the Information Theory Workshop (ITW) 2023
