Learning Cross-Lingual Sentence Representations via a Multi-task Dual-Encoder Model

doi:10.48550/arXiv.1810.12836

A significant roadblock in multilingual neural language modeling is the lack of labeled non-English data. One potential method for overcoming this issue is learning cross-lingual text representations that can be used to transfer the performance from training on English tasks to non-English tasks, despite little to no task-specific non-English data. In this paper, we explore a natural setup for learning cross-lingual sentence representations: the dual-encoder. We provide a comprehensive evaluation of our cross-lingual representations on a number of monolingual, cross-lingual, and zero-shot/few-shot learning tasks, and also give an analysis of different learned cross-lingual embedding spaces.

Publication: arXiv e-prints

Pub Date: October 2018

DOI: 10.48550/arXiv.1810.12836

arXiv: arXiv:1810.12836

Bibcode: 2018arXiv181012836C

Keywords: Computer Science - Computation and Language

E-Print: Accepted at the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)
