The power of Prompts: Evaluating and Mitigating Gender Bias in MT with LLMs

doi:10.48550/arXiv.2407.18786

This paper studies gender bias in machine translation through the lens of Large Language Models (LLMs). Four widely used test sets are employed to benchmark various base LLMs, comparing their translation quality and gender bias against state-of-the-art Neural Machine Translation (NMT) models for the English to Catalan (En $\rightarrow$ Ca) and English to Spanish (En $\rightarrow$ Es) translation directions. Our findings reveal pervasive gender bias across all models, with base LLMs exhibiting a higher degree of bias than NMT models. To combat this bias, we explore prompt engineering techniques applied to an instruction-tuned LLM. We identify a prompt structure that significantly reduces gender bias, by up to 12% on the WinoMT evaluation dataset, compared to more straightforward prompts. These results significantly reduce the gender bias accuracy gap between LLMs and traditional NMT systems.

Publication:

arXiv e-prints

Pub Date:

July 2024

DOI:

10.48550/arXiv.2407.18786

arXiv:

arXiv:2407.18786

Bibcode:

2024arXiv240718786S

Keywords:

Computer Science - Computation and Language

NASA/ADS
