Adversarial Advantage Actor-Critic Model for Task-Completion Dialogue Policy Learning

doi:10.48550/arXiv.1710.11277

This paper presents a new method, adversarial advantage actor-critic (Adversarial A2C), which significantly improves the efficiency of dialogue policy learning in task-completion dialogue systems. Inspired by generative adversarial networks (GANs), we train a discriminator to distinguish responses/actions generated by dialogue agents from those produced by experts. We then incorporate the discriminator as an additional critic in the advantage actor-critic (A2C) framework, encouraging the dialogue agent to explore the state-action space within regions where it takes actions similar to those of the experts. Experimental results in a movie-ticket booking domain show that the proposed Adversarial A2C accelerates policy exploration.
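
The idea of using a discriminator as a second critic can be illustrated with a small sketch. The abstract does not state how the two signals are combined, so everything below is an assumption made for illustration: the function names, the log-probability bonus, and the `weight` mixing coefficient are hypothetical and are not taken from the paper.

```python
import math


def td_advantage(reward, value_s, value_next, gamma=0.99, done=False):
    """One-step advantage estimate from the usual critic: r + gamma * V(s') - V(s)."""
    bootstrap = 0.0 if done else gamma * value_next
    return reward + bootstrap - value_s


def discriminator_bonus(expert_prob, eps=1e-8):
    """Hypothetical shaping term log D(s, a).

    expert_prob stands for the discriminator's estimated probability that the
    state-action pair came from an expert. The bonus is 0 when the
    discriminator is certain the action is expert-like, and negative otherwise.
    """
    clipped = min(max(expert_prob, eps), 1.0)
    return math.log(clipped)


def adversarial_advantage(reward, value_s, value_next, expert_prob,
                          weight=0.5, gamma=0.99, done=False):
    """Assumed combination: critic advantage plus a weighted discriminator bonus."""
    base = td_advantage(reward, value_s, value_next, gamma=gamma, done=done)
    return base + weight * discriminator_bonus(expert_prob)


if __name__ == "__main__":
    # The same transition scored with an expert-like and a non-expert-like action.
    for p in (0.9, 0.1):
        adv = adversarial_advantage(1.0, 0.5, 1.0, expert_prob=p, gamma=0.9)
        print(f"expert_prob={p}: advantage={adv:.3f}")
```

In this sketch, actions the discriminator rates as expert-like keep most of their critic advantage, while actions rated as unlike the experts are penalized, which nudges the policy gradient toward the expert-like regions of the state-action space described above.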

Publication: arXiv e-prints

Pub Date: October 2017

DOI: 10.48550/arXiv.1710.11277

arXiv: arXiv:1710.11277

Bibcode: 2017arXiv171011277P

Keywords: Computer Science - Computation and Language; Computer Science - Artificial Intelligence; Computer Science - Machine Learning

E-Print: 5 pages, 3 figures, ICASSP 2018
