Modeling ASR Ambiguity for Dialogue State Tracking Using Word Confusion Networks

doi:10.48550/arXiv.2002.00768

Spoken dialogue systems typically use a list of top-N ASR hypotheses for inferring the semantic meaning and tracking the state of the dialogue. However, ASR graphs, such as confusion networks (confnets), provide a compact representation of a richer hypothesis space than a top-N ASR list. In this paper, we study the benefits of using confusion networks with a state-of-the-art neural dialogue state tracker (DST). We encode the 2-dimensional confnet into a 1-dimensional sequence of embeddings using an attentional confusion network encoder, which can be used with any DST system. Our confnet encoder is plugged into the state-of-the-art 'Global-locally Self-Attentive Dialogue State Tracker' (GLAD) model for DST and obtains significant improvements in both accuracy and inference time compared to using top-N ASR hypotheses.
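The idea of collapsing a 2-dimensional confnet into a 1-dimensional sequence of embeddings can be illustrated with a minimal sketch. The abstract does not specify the attention formulation, so everything below is an assumption made for illustration: the scoring rule (dot product with a query vector plus the log of the ASR posterior), the random embeddings, and the names `make_embeddings`, `softmax`, and `encode_confnet` are hypothetical and are not taken from the paper.

```python
import math
import random


def make_embeddings(vocab, dim, seed=0):
    """Hypothetical stand-in for learned word embeddings (random vectors)."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for w in vocab}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def encode_confnet(confnet, embeddings, query):
    """Collapse each confnet bin into one vector by attending over its arcs.

    confnet: list of bins; each bin is a list of (word, posterior) arcs.
    Assumed scoring (not from the paper): dot(query, embedding) + log(posterior).
    Returns one pooled embedding per bin, i.e. a 1-dimensional sequence.
    """
    dim = len(query)
    encoded = []
    for arcs in confnet:
        scores = [
            sum(q * e for q, e in zip(query, embeddings[word]))
            + math.log(max(posterior, 1e-8))
            for word, posterior in arcs
        ]
        weights = softmax(scores)
        pooled = [
            sum(w * embeddings[word][d] for w, (word, _) in zip(weights, arcs))
            for d in range(dim)
        ]
        encoded.append(pooled)
    return encoded


# Toy confnet: three time-aligned bins of competing word hypotheses.
confnet = [
    [("cheap", 0.7), ("chip", 0.3)],
    [("restaurant", 1.0)],
    [("north", 0.6), ("not", 0.4)],
]
vocab = sorted({word for arcs in confnet for word, _ in arcs})
emb = make_embeddings(vocab, dim=4)
query = [0.0] * 4  # with a zero query the weights reduce to the ASR posteriors
seq = encode_confnet(confnet, emb, query)
```

In this sketch the output has one vector per bin, so a downstream tracker that expects a word-embedding sequence could consume it unchanged. A learned query would let the weights depart from the raw ASR posteriors.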

Publication:

arXiv e-prints

Pub Date:

February 2020

DOI:

10.48550/arXiv.2002.00768

arXiv:

arXiv:2002.00768

Bibcode:

2020arXiv200200768P

Keywords:

Computer Science - Computation and Language;
Computer Science - Machine Learning;
Computer Science - Sound;
Electrical Engineering and Systems Science - Audio and Speech Processing

E-Print:

Accepted at Interspeech-2020

NASA/ADS
