(Almost) all of entity resolution
Abstract
Whether the goal is to estimate the number of people who live in a congressional district, to estimate the number of individuals who have died in an armed conflict, or to disambiguate individual authors using bibliographic data, these applications share a common theme: integrating information from multiple sources. Before such questions can be answered, databases must be cleaned and integrated in a systematic and accurate way, a task commonly known as structured entity resolution (also called record linkage or deduplication). Here, we review motivating applications and seminal papers that have led to the growth of this area. We review modern probabilistic and Bayesian methods from statistics, computer science, machine learning, database management, economics, political science, and other disciplines that are used throughout industry and academia in applications such as human rights, official statistics, medicine, and citation networks. Last, we discuss current research topics of practical importance. This article reviews entity resolution or record linkage in both statistics and computer science.
- Publication: Science Advances
- Pub Date: March 2022
- DOI: 10.1126/sciadv.abi8021
- Bibcode: 2022SciA....8I8021B