Multilingual Factor Analysis
Abstract
In this work, we approach the task of learning multilingual word representations in an offline manner by fitting a generative latent variable model to a multilingual dictionary. We model equivalent words in different languages as different views of the same word, generated by a common latent variable representing their latent lexical meaning. We explore the task of alignment by querying the fitted model for multilingual embeddings, achieving competitive results across a variety of tasks. The proposed model is robust to noise in the embedding space, making it a suitable method for distributed representations learned from noisy corpora.
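The multi-view idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the model's exact form, so the linear-Gaussian parameterization below (a standard factor-analysis setup with isotropic noise) is an assumption, and every name in it (`sample_views`, `posterior_mean`, `W_list`, `mu_list`, `noise_std`) is hypothetical. Each language's embedding is generated from a shared latent vector `z`, and "querying" the model is approximated here by the posterior mean of `z` given one view.

```python
import numpy as np

def sample_views(n_words, W_list, mu_list, noise_std, rng):
    """Sample one embedding per language for each word from a shared latent z.

    Assumed model (not taken from the abstract):
        z ~ N(0, I),   x_j = W_j z + mu_j + eps_j,   eps_j ~ N(0, noise_std^2 I)
    """
    latent_dim = W_list[0].shape[1]
    z = rng.standard_normal((n_words, latent_dim))
    views = []
    for W, mu in zip(W_list, mu_list):
        eps = noise_std * rng.standard_normal((n_words, W.shape[0]))
        views.append(z @ W.T + mu + eps)
    return z, views

def posterior_mean(x, W, mu, noise_std):
    """Posterior mean E[z | x] under the assumed model, one row per word.

    E[z | x] = (W^T W + noise_std^2 I)^{-1} W^T (x - mu)
    """
    k = W.shape[1]
    M = W.T @ W + noise_std**2 * np.eye(k)
    return np.linalg.solve(M, W.T @ (x - mu).T).T
```

Under these assumptions, translation-equivalent words from two languages map to nearby points in the latent space, since `posterior_mean` applied to either view recovers approximately the same `z` when the noise is small.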
- Publication:
- arXiv e-prints
- Pub Date:
- May 2019
- DOI:
- 10.48550/arXiv.1905.05547
- arXiv:
- arXiv:1905.05547
- Bibcode:
- 2019arXiv190505547V
- Keywords:
- Computer Science - Machine Learning;
- Computer Science - Computation and Language;
- Statistics - Machine Learning
- E-Print:
- Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics