A hierarchical Bayesian approach to record linkage and population size problems
Abstract
We propose and illustrate a hierarchical Bayesian approach for matching statistical records observed on different occasions. We show how this model can be profitably adopted both in record linkage problems and in capture-recapture setups, where the size of a finite population is the real object of interest. There are at least two important differences between the proposed model-based approach and current practice in record linkage. First, the statistical model is built directly on the observed categorical variables, so the available information is not reduced to 0-1 comparisons. Second, the hierarchical structure of the model allows a two-way propagation of uncertainty between the parameter estimation step and the matching procedure, so that no plug-in estimates are used and uncertainty is correctly accounted for both in estimating the population size and in performing the record linkage. We illustrate and motivate our proposal through a real data example and simulations.
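To make the first difference concrete, the following minimal sketch shows what a reduction to 0-1 comparisons discards. The records, field names, and the `comparison_vector` helper are hypothetical and chosen only for illustration; they are not taken from the paper and do not represent its model.

```python
# Hypothetical records: categorical key variables observed on two occasions.
record_a = {"sex": "F", "birth_year": 1970, "region": "North"}
record_b = {"sex": "F", "birth_year": 1971, "region": "North"}
record_c = {"sex": "M", "birth_year": 1985, "region": "South"}
record_d = {"sex": "M", "birth_year": 1990, "region": "South"}


def comparison_vector(r1, r2):
    """Reduce a record pair to 0-1 agreement indicators, one per field.

    Fields are visited in sorted key order: birth_year, region, sex.
    """
    return tuple(int(r1[key] == r2[key]) for key in sorted(r1))


pair_ab = comparison_vector(record_a, record_b)
pair_cd = comparison_vector(record_c, record_d)

# Both pairs collapse to the same indicators, although the underlying
# categories (and how far apart the birth years are) differ.
print(pair_ab, pair_cd)
```

After the reduction the two pairs are indistinguishable, which is the information loss that a model built on the observed categorical values avoids.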
- Publication: arXiv e-prints
- Pub Date: November 2010
- DOI: 10.48550/arXiv.1011.2649
- arXiv: arXiv:1011.2649
- Bibcode: 2010arXiv1011.2649T
- Keywords: Statistics - Applications; Statistics - Methodology
- E-Print: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/10-AOAS447