Identifying Semantic Divergences in Parallel Text without Annotations

doi:10.48550/arXiv.1803.11112

Recognizing that even correct translations are not always semantically equivalent, we automatically detect meaning divergences in parallel sentence pairs with a deep neural model of bilingual semantic similarity that can be trained for any parallel corpus without any manual annotation. We show that our semantic model detects divergences more accurately than models based on surface features derived from word alignments, and that these divergences matter for neural machine translation.

Publication:

arXiv e-prints

Pub Date:

March 2018

DOI:

10.48550/arXiv.1803.11112

arXiv:

arXiv:1803.11112

Bibcode:

2018arXiv180311112V

Keywords:

Computer Science - Computation and Language

E-Print:

Accepted as a full paper to NAACL 2018
