Distributional Reinforcement Learning for Energy-Based Sequential Models

doi:10.48550/arXiv.1912.08517

Global Autoregressive Models (GAMs) are a recent proposal [Parshakova et al., CoNLL 2019] for exploiting global properties of sequences for data-efficient learning of seq2seq models. In the first phase of training, an Energy-Based Model (EBM) over sequences is derived. This EBM has high representational power, but is unnormalized and cannot be directly exploited for sampling. To address this issue, [Parshakova et al., CoNLL 2019] proposes a distillation technique, which can only be applied under limited conditions. By relating this problem to Policy Gradient techniques in RL, but from a distributional rather than an optimization perspective, we propose a general approach applicable to any sequential EBM. Its effectiveness is illustrated on GAM-based experiments.
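
The abstract gives no algorithmic detail, so the following is only a toy sketch of one possible reading of the "distributional" view: rather than maximizing a reward, an autoregressive policy is fitted to the normalized distribution implied by an unnormalized EBM, using importance-weighted log-likelihood gradients. Everything here is an illustrative assumption and not taken from the paper: the energy function, the uniform proposal, the tabular policy, and all names (`unnormalized_score`, `train_policy`, `total_variation`).

```python
import math
import random
from itertools import product

SEQ_LEN = 3  # toy binary sequences of length 3 (illustrative assumption)

def unnormalized_score(x):
    """Toy EBM score P(x) = exp(-energy(x)). The energy penalizes a global
    property (number of ones differing from 2). Hypothetical, not from the paper."""
    return math.exp(-2.0 * abs(sum(x) - 2))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def policy_prob(theta, x):
    """Probability of sequence x under a tabular autoregressive policy:
    one Bernoulli logit per prefix."""
    prob = 1.0
    for t in range(SEQ_LEN):
        p_one = sigmoid(theta[x[:t]])
        prob *= p_one if x[t] == 1 else 1.0 - p_one
    return prob

def train_policy(steps=2000, batch=32, lr=0.2, seed=0):
    rng = random.Random(seed)
    # One logit for every prefix of length 0, 1, ..., SEQ_LEN - 1.
    theta = {prefix: 0.0
             for t in range(SEQ_LEN)
             for prefix in product((0, 1), repeat=t)}
    q = 1.0 / 2 ** SEQ_LEN  # fixed uniform proposal (a simplifying assumption)
    for _ in range(steps):
        samples = [tuple(rng.randint(0, 1) for _ in range(SEQ_LEN))
                   for _ in range(batch)]
        # Importance weights P(x) / q(x); self-normalizing removes the unknown Z.
        weights = [unnormalized_score(x) / q for x in samples]
        total = sum(weights)
        grad = {prefix: 0.0 for prefix in theta}
        for x, w in zip(samples, weights):
            for t in range(SEQ_LEN):
                prefix = x[:t]
                # d/d(logit) of log pi(x_t | prefix) for a Bernoulli logit.
                grad[prefix] += (w / total) * (x[t] - sigmoid(theta[prefix]))
        for prefix in theta:
            theta[prefix] += lr * grad[prefix]  # ascend the weighted log-likelihood
    return theta

def total_variation(theta):
    """Exact distance between the policy and the normalized EBM (by enumeration)."""
    xs = list(product((0, 1), repeat=SEQ_LEN))
    z = sum(unnormalized_score(x) for x in xs)
    return 0.5 * sum(abs(unnormalized_score(x) / z - policy_prob(theta, x))
                     for x in xs)

theta = train_policy()
tv = total_variation(theta)
print(f"total variation distance to the normalized EBM: {tv:.3f}")
```

In this sketch the weight P(x)/q(x) plays the role that a reward plays in a standard policy-gradient update, but the fixed point is a policy matching the EBM's distribution rather than one concentrated on the highest-scoring sequence. Enumeration is feasible only because the toy space has eight sequences; it is used here purely to check the result.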

Publication:

arXiv e-prints

Pub Date:

December 2019

DOI:

10.48550/arXiv.1912.08517

arXiv:

arXiv:1912.08517

Bibcode:

2019arXiv191208517P

Keywords:

Computer Science - Machine Learning;
Statistics - Machine Learning

E-Print:

OptRL workshop (Optimization Foundations for Reinforcement Learning) at NeurIPS 2019
