Evaluating the Response Capabilities of Large Language Models (LLMs) on Historians' Questions
Abstract
Large Language Models (LLMs) such as ChatGPT or Bard have revolutionized information retrieval and captivated audiences with their ability to generate custom responses in record time, regardless of the topic. In this article, we assess the capabilities of various LLMs in producing reliable, comprehensive, and sufficiently relevant responses about historical facts in French. To do so, we constructed a testbed comprising numerous history-related questions of varying types, themes, and levels of difficulty. Our evaluation of the responses from ten selected LLMs reveals numerous shortcomings in both substance and form. Beyond an overall insufficient accuracy rate, we highlight uneven handling of the French language, as well as issues of verbosity and inconsistency in the responses provided by the LLMs.
- Publication: arXiv e-prints
- Pub Date: June 2024
- DOI: 10.48550/arXiv.2406.15173
- arXiv: arXiv:2406.15173
- Bibcode: 2024arXiv240615173C
- Keywords: Computer Science - Information Retrieval; Computer Science - Artificial Intelligence
- E-Print: in French language