YAD: Leveraging T5 for Improved Automatic Diacritization of Yorùbá Text
Abstract
In this work, we present the Yorùbá automatic diacritization (YAD) benchmark dataset for evaluating Yorùbá diacritization systems. In addition, we pre-train a text-to-text transformer (T5) model for Yorùbá and show that it outperforms several multilingually trained T5 models. Lastly, we show that more data and larger models yield better diacritization for Yorùbá.
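Diacritization fits the text-to-text setting of T5 naturally: the model reads undiacritized text and generates the fully diacritized form. The abstract does not describe the exact preprocessing, so the following is only a minimal sketch, assuming training pairs are built by stripping Yorùbá tone marks (grave, acute, macron) and underdots (as in ẹ, ọ, ṣ) via Unicode NFD decomposition. The helper names and the `"diacritize: "` task prefix are hypothetical and are not taken from the paper.

```python
import unicodedata

# Combining marks assumed for Yorùbá orthography (hypothetical choice):
# U+0300 grave (low tone), U+0301 acute (high tone),
# U+0304 macron (occasional mid-tone mark), U+0323 dot below (ẹ, ọ, ṣ).
YORUBA_MARKS = {"\u0300", "\u0301", "\u0304", "\u0323"}


def strip_diacritics(text: str) -> str:
    """Remove tone marks and underdots while keeping the base letters."""
    # NFD splits precomposed letters such as "ọ" into "o" + U+0323.
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in YORUBA_MARKS)
    # Recompose whatever combining characters remain.
    return unicodedata.normalize("NFC", kept)


def make_pair(text: str, prefix: str = "diacritize: ") -> tuple[str, str]:
    """Build a hypothetical (source, target) pair for a text-to-text model."""
    target = unicodedata.normalize("NFC", text)  # canonical diacritized form
    return prefix + strip_diacritics(target), target


if __name__ == "__main__":
    print(make_pair("Yor\u00f9b\u00e1"))  # ('diacritize: Yoruba', 'Yorùbá')
```

Normalizing both sides to NFC keeps the target consistent whether the raw text stores tone marks as precomposed characters or as separate combining marks.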
- Publication: arXiv e-prints
- Pub Date: December 2024
- DOI:
- arXiv: arXiv:2412.20218
- Bibcode: 2024arXiv241220218O
- Keywords: Computer Science - Computation and Language
- E-Print: Accepted at AfricaNLP Workshop at ICLR 2024