Markov decision processes: on the convergence of the Monte-Carlo first visit algorithm
Abstract
We consider the Monte-Carlo first visit algorithm, whose goal is to find the optimal control in a Markov decision process with a finite state space and a finite number of possible actions. We show that it converges when the discount factor is smaller than $1/2$.
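For readers unfamiliar with the method, the following is a minimal sketch of a textbook first-visit Monte Carlo control loop, not necessarily the exact variant analysed in the paper. The exploring starts, the greedy policy improvement, the truncation of episodes at a fixed `horizon`, the `step` simulator interface, and the two-state `toy_step` example are all assumptions made for illustration. The default discount factor 0.4 is chosen only because it lies below $1/2$.

```python
import random
from collections import defaultdict


def first_visit_mc_control(states, actions, step, gamma=0.4,
                           episodes=500, horizon=50, seed=0):
    """Illustrative first-visit Monte Carlo control with exploring starts.

    `step(state, action, rng)` is an assumed simulator interface returning
    (reward, next_state). Returns are truncated after `horizon` steps.
    """
    rng = random.Random(seed)
    q = defaultdict(float)    # running mean of first-visit returns per (s, a)
    counts = defaultdict(int)  # number of first visits per (s, a)
    policy = {s: actions[0] for s in states}

    for _ in range(episodes):
        # Exploring start: random initial state and action.
        s, a = rng.choice(states), rng.choice(actions)
        trajectory = []
        for _ in range(horizon):
            r, s_next = step(s, a, rng)
            trajectory.append((s, a, r))
            s = s_next
            a = policy[s]  # follow the current greedy policy afterwards

        # Time index of the first occurrence of each (state, action) pair.
        first = {}
        for t, (s_t, a_t, _) in enumerate(trajectory):
            first.setdefault((s_t, a_t), t)

        # Backward pass: accumulate discounted returns, update on first visits.
        g = 0.0
        for t in range(len(trajectory) - 1, -1, -1):
            s_t, a_t, r_t = trajectory[t]
            g = r_t + gamma * g
            if first[(s_t, a_t)] == t:
                counts[(s_t, a_t)] += 1
                q[(s_t, a_t)] += (g - q[(s_t, a_t)]) / counts[(s_t, a_t)]
                # Greedy improvement at the visited state.
                policy[s_t] = max(actions, key=lambda b: q[(s_t, b)])
    return policy, dict(q)


def toy_step(state, action, rng):
    """Hypothetical deterministic MDP: action 0 stays, action 1 switches state.

    Reward 1 is paid only for staying in state 1.
    """
    reward = 1.0 if (state == 1 and action == 0) else 0.0
    next_state = state if action == 0 else 1 - state
    return reward, next_state
```

Calling `first_visit_mc_control([0, 1], [0, 1], toy_step)` on this toy problem should produce a policy that moves from state 0 to state 1 and then stays there. The estimate for staying in state 1 should be close to $1/(1-\gamma)$.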
- Publication: arXiv e-prints
- Pub Date: January 2025
- arXiv: arXiv:2501.08800
- Bibcode: 2025arXiv250108800D
- Keywords: Mathematics - Probability; Mathematics - Optimization and Control