Autonomous exploration for navigating in non-stationary CMPs
Abstract
We consider a setting in which the objective is to learn to navigate in a controlled Markov process (CMP) where transition probabilities may abruptly change. For this setting, we propose a performance measure called exploration steps, which counts the time steps at which the learner lacks sufficient knowledge to navigate its environment efficiently. We devise a learning meta-algorithm, MNM, and prove an upper bound on the exploration steps in terms of the number of changes.
- Publication: arXiv e-prints
- Pub Date: October 2019
- DOI: 10.48550/arXiv.1910.08446
- arXiv: arXiv:1910.08446
- Bibcode: 2019arXiv191008446G
- Keywords: Computer Science - Machine Learning; Statistics - Machine Learning