Integrating representative and non-representative survey data for efficient inference
Abstract
Non-representative surveys are commonly used and widely available but suffer from selection bias that generally cannot be entirely eliminated using weighting techniques. Instead, we propose a Bayesian method to synthesize longitudinal representative unbiased surveys with non-representative biased surveys by estimating the degree of selection bias over time. We show using a simulation study that synthesizing biased and unbiased surveys together out-performs using the unbiased surveys alone, even if the selection bias may evolve in a complex manner over time. Using COVID-19 vaccination data, we are able to synthesize two large sample biased surveys with an unbiased survey to reduce uncertainty in now-casting and inference estimates while simultaneously retaining the empirical credible interval coverage. Ultimately, we are able to conceptually obtain the properties of a large sample unbiased survey if the assumed unbiased survey, used to anchor the estimates, is unbiased for all time-points.
- Publication:
-
arXiv e-prints
- Pub Date:
- April 2024
- DOI:
- 10.48550/arXiv.2404.02283
- arXiv:
- arXiv:2404.02283
- Bibcode:
- 2024arXiv240402283D
- Keywords:
-
- Statistics - Methodology;
- Statistics - Applications
- E-Print:
- New version includes fixed typos, Monte Carlo Standard error for the simulation (added in V2), and some clarifications