Context-Aware Cross-Lingual Mapping

doi:10.48550/arXiv.1903.03243

Cross-lingual word vectors are typically obtained by fitting an orthogonal matrix that maps the entries of a bilingual dictionary from a source to a target vector space. Word vectors, however, are most commonly used for sentence or document-level representations that are calculated as the weighted average of word embeddings. In this paper, we propose an alternative to word-level mapping that better reflects sentence-level cross-lingual similarity. We incorporate context in the transformation matrix by directly mapping the averaged embeddings of aligned sentences in a parallel corpus. We also implement cross-lingual mapping of deep contextualized word embeddings using parallel sentences with word alignments. In our experiments, both approaches resulted in cross-lingual sentence embeddings that outperformed context-independent word mapping in sentence translation retrieval. Furthermore, the sentence-level transformation could be used for word-level mapping without loss in word translation quality.
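The sentence-level mapping described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the orthogonal matrix is fitted with the standard orthogonal Procrustes solution (via SVD), uses a uniform average in place of a weighted one, and runs on synthetic toy data. The names `sentence_embedding`, `fit_orthogonal_map`, `src_vectors`, and `tgt_vectors` are hypothetical.

```python
import numpy as np


def sentence_embedding(tokens, vectors):
    """Uniform average of the word vectors of `tokens` (a simplification
    of the weighted average mentioned in the abstract)."""
    return np.mean([vectors[t] for t in tokens], axis=0)


def fit_orthogonal_map(X, Y):
    """Orthogonal Procrustes: the orthogonal W minimizing ||X W - Y||_F.

    X and Y hold one aligned embedding per row (here: averaged sentence
    embeddings of parallel sentences)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


# Synthetic toy data (assumption): the target space is an exact rotation
# of the source space, so the fitted map can be checked against it.
rng = np.random.default_rng(0)
dim, vocab_size = 4, 20
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
src_vectors = {f"s{i}": rng.normal(size=dim) for i in range(vocab_size)}
tgt_vectors = {f"t{i}": src_vectors[f"s{i}"] @ true_rotation
               for i in range(vocab_size)}

# Toy "parallel corpus": each target sentence is a word-by-word translation.
sentence_ids = [rng.choice(vocab_size, size=5, replace=False)
                for _ in range(50)]
X = np.stack([sentence_embedding([f"s{i}" for i in ids], src_vectors)
              for ids in sentence_ids])
Y = np.stack([sentence_embedding([f"t{i}" for i in ids], tgt_vectors)
              for ids in sentence_ids])

# Fit on sentence averages, then reuse the same matrix for single words.
W = fit_orthogonal_map(X, Y)
mapped_word = src_vectors["s0"] @ W
```

Because `W` is fitted on averaged sentence embeddings but is still a single matrix over the word-vector space, it can be applied to individual word vectors as well, which mirrors the abstract's remark that the sentence-level transformation can also serve for word-level mapping. Real embeddings are only approximately related by a rotation, so the exact recovery seen on this toy data should not be expected in practice.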

Publication:

arXiv e-prints

Pub Date:

March 2019

DOI:

10.48550/arXiv.1903.03243

arXiv:

arXiv:1903.03243

Bibcode:

2019arXiv190303243A

Keywords:

Computer Science - Computation and Language

E-Print:

NAACL-HLT 2019 (short paper). 5 pages, 1 figure
