Learning from Humans as an I-POMDP

doi:10.48550/arXiv.1204.0274

The interactive partially observable Markov decision process (I-POMDP) is a recently developed framework that extends the POMDP to the multi-agent setting by including agent models in the state space. This paper argues for formulating the problem of an agent learning interactively from a human teacher as an I-POMDP, in which the agent programming to be learned is captured by random variables in the agent's state space, all signals from the human teacher are treated as observed random variables, and the human teacher, modeled as a distinct agent, is explicitly represented in the agent's state space. The main benefits of this approach are (i) a principled action selection mechanism, (ii) a principled belief update mechanism, (iii) support for the most common teacher signals, and (iv) the anticipated production of complex beneficial interactions. The proposed formulation, its benefits, and several open questions are presented.

Publication:

arXiv e-prints

Pub Date:

April 2012

DOI:

10.48550/arXiv.1204.0274

arXiv:

arXiv:1204.0274

Bibcode:

2012arXiv1204.0274W

Keywords:

Computer Science - Robotics;
Computer Science - Artificial Intelligence
