HausaMT v1.0: Towards English-Hausa Neural Machine Translation

doi:10.48550/arXiv.2006.05014

Akinfaderin, Adewale

Neural Machine Translation (NMT) for low-resource languages suffers from low performance because of the lack of large amounts of parallel data and language diversity. To help address this problem, we built a baseline model for English-Hausa machine translation, which is considered a low-resource language task. The Hausa language is the second largest Afro-Asiatic language in the world after Arabic, and it is the third largest trading language across a wide swath of West African countries, after English and French. In this paper, we curated different datasets containing Hausa-English parallel corpora for our translation task. We trained baseline models and evaluated their performance using recurrent and Transformer encoder-decoder architectures with two tokenization approaches: standard word-level tokenization and Byte Pair Encoding (BPE) subword tokenization.
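The two tokenization approaches named in the abstract can be contrasted with a small sketch. The abstract does not say which tooling was used, so the code below is only a toy illustration of the general BPE idea (repeatedly merging the most frequent adjacent symbol pair), not the paper's pipeline. The function names `learn_bpe`, `merge_pair`, and `segment`, the `</w>` end-of-word marker, and the tiny English word list are all assumptions made for illustration.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merge rules from a list of words (toy sketch)."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the new merge to every word; identical results are summed.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split a word into subword units by applying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Hypothetical toy corpus, not data from the paper.
    corpus = "low lower lowest low low newer newest".split()
    merges = learn_bpe(corpus, num_merges=10)
    print("word-level:", "the lowest price".split())
    print("BPE units: ", segment("lowest", merges))
```

Word-level tokenization treats each whitespace-separated word as one vocabulary item, so a word unseen in training has no representation. BPE instead falls back to smaller learned units, which is one common reason it is tried in low-resource settings.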

Publication: arXiv e-prints

Pub Date: June 2020

DOI: 10.48550/arXiv.2006.05014

arXiv: arXiv:2006.05014

Bibcode: 2020arXiv200605014A

Keywords: Computer Science - Computation and Language; Computer Science - Machine Learning

E-Print: Accepted at 4th Widening NLP Workshop, Annual Meeting of the Association for Computational Linguistics, ACL 2020
