Open-Domain Dialog Evaluation using Follow-Ups Likelihood
Abstract
Automatic evaluation of open-domain dialogs remains an unsolved problem, and existing methods do not correlate strongly with human annotations. This paper presents a new automated evaluation method using follow-ups: we measure the probability that a language model will continue the conversation with a fixed set of follow-ups (e.g., "not really relevant here", "what are you trying to say"). When compared against twelve existing methods, our new evaluation achieves the highest correlation with human evaluations.
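The idea of scoring a dialog by the likelihood of fixed follow-ups can be sketched as below. This is a minimal illustration, not the paper's implementation: the function names `follow_up_score` and `toy_log_prob` are hypothetical, averaging the log-likelihoods is an assumed aggregation, and the toy unigram scorer merely stands in for a real language model so that the example runs without external dependencies.

```python
import math
from typing import Callable, Sequence

# The two example follow-ups quoted in the abstract; the full set used
# in the paper is not given here.
FOLLOW_UPS = [
    "not really relevant here",
    "what are you trying to say",
]


def follow_up_score(
    dialog: str,
    follow_ups: Sequence[str],
    log_prob: Callable[[str, str], float],
) -> float:
    """Mean log-likelihood of the follow-ups given the dialog.

    `log_prob(context, continuation)` is any scorer returning
    log P(continuation | context). Taking the mean is an assumption.
    """
    if not follow_ups:
        raise ValueError("need at least one follow-up")
    return sum(log_prob(dialog, f) for f in follow_ups) / len(follow_ups)


def toy_log_prob(context: str, continuation: str) -> float:
    """Stand-in for a language model (assumption, for illustration only).

    Scores the continuation with an add-one smoothed unigram model
    estimated from the words of the context.
    """
    tokens = context.lower().split()
    counts: dict[str, int] = {}
    for tok in tokens:
        counts[tok] = counts.get(tok, 0) + 1
    vocab_size = len(counts) + 1  # one extra slot for unseen words
    total = len(tokens)
    return sum(
        math.log((counts.get(tok, 0) + 1) / (total + vocab_size))
        for tok in continuation.lower().split()
    )


if __name__ == "__main__":
    dialog = "A: Do you like jazz? B: I had pasta for lunch."
    print(follow_up_score(dialog, FOLLOW_UPS, toy_log_prob))
```

In practice `toy_log_prob` would be replaced by a scorer backed by a pretrained dialog language model. Because the example follow-ups express confusion or irrelevance, a higher likelihood for them would plausibly indicate a weaker response.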
- Publication: arXiv e-prints
- Pub Date: September 2022
- DOI: 10.48550/arXiv.2209.05185
- arXiv: arXiv:2209.05185
- Bibcode: 2022arXiv220905185D
- Keywords: Computer Science - Computation and Language
- E-Print: Accepted at COLING 2022