Versatile Offline Imitation from Observations and Examples via Regularized State-Occupancy Matching

We propose State Matching Offline DIstribution Correction Estimation (SMODICE), a novel and versatile regression-based offline imitation learning (IL) algorithm derived via state-occupancy matching. We show that the SMODICE objective admits a simple optimization procedure through an application of Fenchel duality, as well as an analytic solution in tabular MDPs. Without requiring access to expert actions, SMODICE can be applied to three offline IL settings: (i) imitation from observations (IfO), (ii) IfO with a dynamics- or morphologically mismatched expert, and (iii) example-based reinforcement learning, which we show can be formulated as a state-occupancy matching problem. We extensively evaluate SMODICE on both gridworld environments and high-dimensional offline benchmarks. Our results demonstrate that SMODICE is effective in all three problem settings and significantly outperforms the prior state of the art.
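To make the notion of a state occupancy in a tabular MDP concrete, the sketch below computes the discounted state-occupancy distribution of a policy and compares two such distributions. It is not the SMODICE algorithm; it only illustrates the quantity that state-occupancy matching operates on. The three-state chain, the function names, the discount factor, and the use of KL divergence as the comparison measure are all assumptions made for illustration.

```python
import numpy as np


def state_occupancy(P, pi, mu0, gamma):
    """Discounted state occupancy d(s) = (1 - gamma) * sum_t gamma^t Pr(s_t = s).

    P:   (S, A, S) array, P[s, a, t] = probability of moving from s to t under a.
    pi:  (S, A) array, pi[s, a] = probability of taking action a in state s.
    mu0: (S,) initial state distribution.
    """
    # State-to-state transitions under pi: P_pi[s, t] = sum_a pi[s, a] * P[s, a, t].
    P_pi = np.einsum("sa,sat->st", pi, P)
    n = P.shape[0]
    # d solves the linear flow equation d = (1 - gamma) * mu0 + gamma * P_pi^T d.
    return np.linalg.solve(np.eye(n) - gamma * P_pi.T, (1.0 - gamma) * mu0)


def kl(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions, clipped to avoid log(0)."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return float(np.sum(p * np.log(p / q)))


# Hypothetical 3-state chain: action 0 moves left, action 1 moves right,
# and moves past either end leave the state unchanged.
S, A, gamma = 3, 2, 0.9
P = np.zeros((S, A, S))
for s in range(S):
    P[s, 0, max(s - 1, 0)] = 1.0
    P[s, 1, min(s + 1, S - 1)] = 1.0
mu0 = np.array([1.0, 0.0, 0.0])

pi_expert = np.tile([0.0, 1.0], (S, 1))   # assumed "expert": always move right
pi_uniform = np.full((S, A), 0.5)         # candidate policy: uniform random

d_expert = state_occupancy(P, pi_expert, mu0, gamma)
d_uniform = state_occupancy(P, pi_uniform, mu0, gamma)
print("expert occupancy: ", d_expert)
print("uniform occupancy:", d_uniform)
print("KL(uniform || expert):", kl(d_uniform, d_expert))
```

Note that the comparison uses only states, never expert actions, which is why an occupancy-based objective of this kind suits the observation-only settings listed in the abstract.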

Publication: arXiv e-prints

Pub Date: February 2022

DOI: 10.48550/arXiv.2202.02433

arXiv: arXiv:2202.02433

Bibcode: 2022arXiv220202433M

Keywords: Computer Science - Machine Learning; Computer Science - Artificial Intelligence

E-Print: ICML 2022. Project website: https://sites.google.com/view/smodice/home
