Investigating Information Inconsistency in Multilingual Open-Domain Question Answering
Abstract
Retrieval based open-domain QA systems use retrieved documents and answer-span selection over retrieved documents to find best-answer candidates. We hypothesize that multilingual Question Answering (QA) systems are prone to information inconsistency when it comes to documents written in different languages, because these documents tend to provide a model with varying information about the same topic. To understand the effects of the biased availability of information and cultural influence, we analyze the behavior of multilingual open-domain question answering models with a focus on retrieval bias. We analyze if different retriever models present different passages given the same question in different languages on TyDi QA and XOR-TyDi QA, two multilingualQA datasets. We speculate that the content differences in documents across languages might reflect cultural divergences and/or social biases.
- Publication:
-
arXiv e-prints
- Pub Date:
- May 2022
- DOI:
- 10.48550/arXiv.2205.12456
- arXiv:
- arXiv:2205.12456
- Bibcode:
- 2022arXiv220512456P
- Keywords:
-
- Computer Science - Computation and Language;
- Computer Science - Artificial Intelligence;
- Computer Science - Information Retrieval;
- Computer Science - Machine Learning