Large-Scale Machine Translation between Arabic and Hebrew: Available Corpora and Initial Results
Abstract
Machine translation between Arabic and Hebrew has so far been limited by a lack of parallel corpora, despite the political and cultural importance of this language pair. Previous work relied on manually crafted grammars or pivoting via English, both of which are unsatisfactory for building a scalable and accurate MT system. In this work, we compare standard phrase-based and neural systems on Arabic-Hebrew translation. We experiment with tokenization by external tools and sub-word modeling by character-level neural models, and show that both methods lead to improved translation performance, with a small advantage to the neural models.
- Publication: arXiv e-prints
- Pub Date: September 2016
- DOI: 10.48550/arXiv.1609.07701
- arXiv: arXiv:1609.07701
- Bibcode: 2016arXiv160907701B
- Keywords: Computer Science - Computation and Language; I.2.7
- E-Print: SeMaT 2016