How Many Languages Make Good Multilingual Instruction Tuning? A Case Study on BLOOM
Abstract
Instruction tuning a large language model with multiple languages can prepare it for multilingual downstream tasks. Nonetheless, it is yet to be determined whether having a handful of languages is sufficient, or whether the benefits increase with the inclusion of more. By fine-tuning large multilingual models on 1 to 52 languages, we present a case study on BLOOM to understand three pertinent factors affecting performance: the number of languages, language exposure, and similarity between training and test languages. Overall we found that 1) expanding language coverage in multilingual instruction tuning proves to be beneficial; 2) accuracy often significantly boots if the test language appears in the instruction mixture; 3) languages' genetic features correlate with cross-lingual transfer more than merely the number of language but different languages benefit to various degrees.
- Publication:
-
arXiv e-prints
- Pub Date:
- April 2024
- DOI:
- arXiv:
- arXiv:2404.04850
- Bibcode:
- 2024arXiv240404850J
- Keywords:
-
- Computer Science - Computation and Language
- E-Print:
- COLING 2025