Learning from Humans as an I-POMDP
Abstract
The interactive partially observable Markov decision process (I-POMDP) is a recently developed framework that extends the POMDP to the multi-agent setting by including models of other agents in the state space. This paper argues for formulating the problem of an agent learning interactively from a human teacher as an I-POMDP, in which the agent \emph{programming} to be learned is captured by random variables in the agent's state space, all \emph{signals} from the human teacher are treated as observed random variables, and the human teacher, modeled as a distinct agent, is explicitly represented in the agent's state space. The main benefits of this approach are: (i) a principled action selection mechanism, (ii) a principled belief update mechanism, (iii) support for the most common teacher \emph{signals}, and (iv) the anticipated production of complex beneficial interactions. The paper presents the proposed formulation, its benefits, and several open questions.
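The belief update mentioned in the abstract can be illustrated with a small discrete Bayes filter. This is only a sketch under strong simplifying assumptions, not the paper's formulation: the hidden \emph{programming} is reduced to a single categorical variable, the teacher model is a hypothetical noisy-feedback likelihood with a made-up `accuracy` parameter, and the nested agent models of a full I-POMDP are omitted. All names (`teacher_signal_likelihood`, `update_belief`, the program labels) are invented for illustration.

```python
from typing import Dict


def teacher_signal_likelihood(signal: str, action: str, program: str,
                              accuracy: float = 0.8) -> float:
    """Hypothetical teacher model P(signal | action, program).

    Assumes the teacher says "reward" with probability `accuracy` when the
    agent's action matches the intended program, and "punish" otherwise.
    """
    correct = action == program
    if signal == "reward":
        return accuracy if correct else 1.0 - accuracy
    return 1.0 - accuracy if correct else accuracy


def update_belief(belief: Dict[str, float], action: str, signal: str,
                  accuracy: float = 0.8) -> Dict[str, float]:
    """Bayes rule: posterior(program) is proportional to likelihood * prior."""
    unnormalized = {
        program: prior * teacher_signal_likelihood(signal, action, program, accuracy)
        for program, prior in belief.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("signal has zero probability under every hypothesis")
    return {program: weight / total for program, weight in unnormalized.items()}


# Uniform prior over three candidate programs (illustrative labels).
belief = {"fetch": 1 / 3, "sit": 1 / 3, "roll": 1 / 3}
# The agent tries "sit" and the teacher responds with "reward".
belief = update_belief(belief, action="sit", signal="reward")
```

After one rewarded "sit" action, probability mass shifts toward the "sit" hypothesis while the other hypotheses keep some weight, reflecting the assumed teacher noise.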
- Publication: arXiv e-prints
- Pub Date: April 2012
- DOI: 10.48550/arXiv.1204.0274
- arXiv: arXiv:1204.0274
- Bibcode: 2012arXiv1204.0274W
- Keywords: Computer Science - Robotics; Computer Science - Artificial Intelligence