SCOI: Syntax-augmented Coverage-based In-context Example Selection for Machine Translation
Abstract
In-context learning (ICL) greatly improves the performance of large language models (LLMs) on various downstream tasks, but the improvement depends heavily on the quality of the demonstrations. In this work, we introduce syntactic knowledge to select better in-context examples for machine translation (MT). We propose a new strategy, Syntax-augmented COverage-based In-context example selection (SCOI), which leverages deep syntactic structure beyond conventional word matching. Specifically, we measure set-level syntactic coverage by computing the coverage of polynomial terms with the help of a simplified tree-to-polynomial algorithm, and we measure lexical coverage using word overlap. Furthermore, we devise an alternate selection approach that combines both coverage measures, taking advantage of syntactic and lexical information. We conduct experiments with two multilingual LLMs on six translation directions. Empirical results show that SCOI obtains the highest average COMET score among all learning-free methods, indicating that combining syntactic and lexical coverage helps to select better in-context examples for MT. Our code is available at https://github.com/JamyDon/SCOI.
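The alternating idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository linked above for that); it only assumes that each sentence can be mapped to a set of terms, and that selection alternates between maximizing set-level syntactic coverage and set-level lexical coverage of the test input. The names `alternate_select`, `coverage`, `lexical_terms`, and `toy_syntactic_terms` are hypothetical, and `toy_syntactic_terms` is a crude placeholder, not the simplified tree-to-polynomial algorithm.

```python
from typing import Callable, Hashable, List, Sequence, Set

# Hypothetical tiny function-word list, used only by the toy placeholder below.
FUNCTION_WORDS = {"the", "a", "on", "in", "is", "of"}


def lexical_terms(sentence: str) -> Set[Hashable]:
    """Lowercased whitespace tokens, standing in for word-overlap features."""
    return set(sentence.lower().split())


def toy_syntactic_terms(sentence: str) -> Set[Hashable]:
    """Placeholder for syntactic terms: bigrams of coarse word classes.

    Assumption: a real system would derive terms from a parse tree instead.
    """
    tags = ["F" if w in FUNCTION_WORDS else "C" for w in sentence.lower().split()]
    return set(zip(tags, tags[1:]))


def coverage(test_terms: Set[Hashable], covered: Set[Hashable]) -> float:
    """Fraction of the test input's terms covered by the selected set."""
    if not test_terms:
        return 0.0
    return len(test_terms & covered) / len(test_terms)


def alternate_select(
    test: str,
    pool: Sequence[str],
    k: int,
    syntactic_terms: Callable[[str], Set[Hashable]],
) -> List[str]:
    """Greedily pick k examples, alternating syntactic and lexical coverage."""
    extractors = [syntactic_terms, lexical_terms]
    test_sets = [extract(test) for extract in extractors]
    covered: List[Set[Hashable]] = [set(), set()]
    remaining = list(pool)
    selected: List[str] = []
    for step in range(min(k, len(remaining))):
        i = step % 2  # even steps: syntactic measure, odd steps: lexical measure
        best = max(
            remaining,
            key=lambda c: coverage(test_sets[i], covered[i] | extractors[i](c)),
        )
        selected.append(best)
        remaining.remove(best)
        # Update both covered sets so each measure stays set-level.
        for j, extract in enumerate(extractors):
            covered[j] |= extract(best)
    return selected


test_input = "the cat sat on the mat"
pool = ["a dog slept in the house", "the cat is black", "birds fly"]
demos = alternate_select(test_input, pool, k=2, syntactic_terms=toy_syntactic_terms)
```

Scoring each candidate by the coverage of the union with the already selected set, rather than by its own overlap with the test input, is what makes the measure set-level in this sketch: a candidate that only repeats already covered terms gains nothing.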
- Publication: arXiv e-prints
- Pub Date: August 2024
- arXiv: arXiv:2408.04872
- Bibcode: 2024arXiv240804872T
- Keywords: Computer Science - Computation and Language
- E-Print: EMNLP 2024 main conference long paper. 16 pages, 2 figures, 14 tables