How Many Languages Make Good Multilingual Instruction Tuning? A Case Study on BLOOM

doi:10.48550/arXiv.2404.04850

Instruction tuning a large language model with multiple languages can prepare it for multilingual downstream tasks. Nonetheless, it is yet to be determined whether having a handful of languages is sufficient, or whether the benefits increase with the inclusion of more. By fine-tuning large multilingual models on 1 to 52 languages, we present a case study on BLOOM to understand three pertinent factors affecting performance: the number of languages, language exposure, and similarity between training and test languages. Overall, we found that 1) expanding language coverage in multilingual instruction tuning proves to be beneficial; 2) accuracy is often boosted significantly if the test language appears in the instruction mixture; 3) languages' genetic features correlate with cross-lingual transfer more than merely the number of languages, but different languages benefit to varying degrees.

Publication:

arXiv e-prints

Pub Date:

April 2024

DOI:

10.48550/arXiv.2404.04850

arXiv:

arXiv:2404.04850

Bibcode:

2024arXiv240404850J

Keywords:

Computer Science - Computation and Language

E-Print:

COLING 2025
