BoostER: Leveraging Large Language Models for Enhancing Entity Resolution

doi:10.48550/arXiv.2403.06434

Entity resolution, which involves identifying and merging records that refer to the same real-world entity, is a crucial task in areas such as Web data integration. Its importance is underscored by the many duplicated and multi-version data resources on the Web. However, achieving high-quality entity resolution typically demands significant effort. Large Language Models (LLMs) such as GPT-4 have demonstrated advanced linguistic capabilities, which suggest a new paradigm for this task. In this paper, we propose a demonstration system named BoostER that examines the possibility of leveraging LLMs in the entity resolution process, revealing advantages in both ease of deployment and low cost. Our approach optimally selects a set of matching questions and poses them to LLMs for verification, then refines the distribution of entity resolution results using the LLMs' responses. This offers a promising path to high-quality entity resolution in real-world applications, especially for individuals or small companies, without the need for extensive model training or significant financial investment.
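
The select-ask-refine loop described in the abstract can be pictured with a small sketch. This is not the BoostER implementation; it is an illustration under stated assumptions: the "distribution of entity resolution results" is modeled as a probability distribution over candidate partitions of the records, a matching question asks whether two records refer to the same entity, the LLM's yes/no answer is assumed to be correct with a fixed probability `accuracy`, and one question at a time is chosen greedily by lowest expected entropy. All function names, the toy records, and the prior probabilities are hypothetical, and the LLM call itself is replaced by a hard-coded answer.

```python
from itertools import combinations
from math import log2


def partition(*clusters):
    """Build a hashable candidate resolution: a set of clusters of record ids."""
    return frozenset(frozenset(c) for c in clusters)


def same_entity(part, pair):
    """True if the candidate resolution puts both records in one cluster."""
    a, b = pair
    return any(a in cluster and b in cluster for cluster in part)


def entropy(dist):
    """Shannon entropy (bits) of a distribution over candidate resolutions."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def likelihood(part, pair, answer, accuracy):
    """Assumed noise model: the answer agrees with the truth w.p. `accuracy`."""
    return accuracy if same_entity(part, pair) == answer else 1 - accuracy


def refine(dist, pair, answer, accuracy=0.9):
    """Bayes update of the distribution given a yes/no answer about `pair`."""
    weighted = {part: p * likelihood(part, pair, answer, accuracy)
                for part, p in dist.items()}
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("answer has zero probability under this model")
    return {part: w / total for part, w in weighted.items()}


def expected_entropy(dist, pair, accuracy=0.9):
    """Entropy expected after asking about `pair`, averaged over both answers."""
    result = 0.0
    for answer in (True, False):
        p_answer = sum(p * likelihood(part, pair, answer, accuracy)
                       for part, p in dist.items())
        if p_answer > 0:
            result += p_answer * entropy(refine(dist, pair, answer, accuracy))
    return result


def select_question(dist, records, accuracy=0.9):
    """Greedy choice: the record pair whose answer is expected to help most."""
    return min(combinations(sorted(records), 2),
               key=lambda pair: expected_entropy(dist, pair, accuracy))


# Hypothetical prior over three candidate resolutions of records a, b, c.
prior = {
    partition("ab", "c"): 0.5,      # a and b match, c is separate
    partition("a", "b", "c"): 0.3,  # no matches
    partition("abc"): 0.2,          # all three match
}

question = select_question(prior, "abc")
# Stand-in for the LLM's reply to "do a and b refer to the same entity?"
posterior = refine(prior, ("a", "b"), answer=True)
```

In this toy setting a "yes" for the pair (a, b) shifts probability toward the resolutions that merge a and b and away from the one that keeps them apart, which is the sense in which responses refine the distribution. A budgeted system would presumably choose a set of questions jointly rather than one at a time; the greedy rule here is only the simplest stand-in.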

Publication: arXiv e-prints

Pub Date: March 2024

DOI: 10.48550/arXiv.2403.06434

arXiv: arXiv:2403.06434

Bibcode: 2024arXiv240306434L

Keywords: Computer Science - Databases

E-Print: 4 pages, 3 figures, The Web Conf 2024 - WWW'24
