Representative Pure Risk Estimation by Using Data from Epidemiologic Studies, Surveys, and Registries: Estimating Risks for Minority Subgroups
Abstract
Representative risk estimation is fundamental to clinical decision-making. However, risks are often estimated from non-representative epidemiologic studies, which usually underrepresent minorities. "Model-based" methods use population registries to improve the external validity of risk estimation but assume that hazard ratios (HRs) are generalizable from samples to the target finite population. "Pseudoweighting" methods improve the representativeness of studies by using an external probability-based survey as the reference, but the resulting estimators can be biased due to propensity model misspecification, or inefficient due to highly variable pseudoweights or small sample sizes of minorities in the cohort and/or survey. We propose a two-step pseudoweighting procedure that poststratifies the event rates among age/race/sex strata in the pseudoweighted cohort to the population rates to produce efficient and robust pure risk estimation (i.e., a cause-specific absolute risk in the absence of competing events). For developing an all-cause mortality risk model representative of the US population, our findings suggest that HRs for minorities are not generalizable, and that surveys can have inadequate numbers of events for minorities. Poststratification on event rates is crucial for obtaining reliable risk estimation for minority subgroups.
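As a rough illustration of the poststratification idea described in the abstract, the sketch below rescales pseudoweights within each stratum so that the weighted event proportion matches a known population rate. It is a minimal sketch under stated assumptions, not the authors' estimator: the function name `poststratify_event_rates`, the record layout (`stratum`, `event`, `weight`), and the choice to rescale event and non-event weights separately while preserving each stratum's weighted total are all hypothetical, and the abstract does not specify how the adjustment is carried out.

```python
from collections import defaultdict


def poststratify_event_rates(records, pop_rates):
    """Rescale weights so each stratum's weighted event proportion
    equals the supplied population rate.

    records   : list of dicts with keys 'stratum', 'event' (0 or 1),
                and 'weight' (a positive pseudoweight). Hypothetical layout.
    pop_rates : dict mapping stratum -> population event proportion,
                assumed to lie strictly between 0 and 1.
    Returns a list of adjusted weights in the same order as `records`.
    """
    # Weighted totals per stratum, indexed by event status:
    # totals[stratum][0] = non-event weight, totals[stratum][1] = event weight.
    totals = defaultdict(lambda: [0.0, 0.0])
    for rec in records:
        totals[rec["stratum"]][rec["event"]] += rec["weight"]

    adjusted = []
    for rec in records:
        non_event_w, event_w = totals[rec["stratum"]]
        stratum_total = non_event_w + event_w
        rate = pop_rates[rec["stratum"]]
        if rec["event"]:
            # Events should carry a share `rate` of the stratum total.
            factor = rate * stratum_total / event_w
        else:
            # Non-events should carry the remaining share.
            factor = (1.0 - rate) * stratum_total / non_event_w
        adjusted.append(rec["weight"] * factor)
    return adjusted


# Made-up example data: one stratum whose weighted event rate is 0.1,
# adjusted toward an assumed population rate of 0.2.
example = [
    {"stratum": "A", "event": 1, "weight": 10.0},
    {"stratum": "A", "event": 0, "weight": 30.0},
    {"stratum": "A", "event": 0, "weight": 60.0},
]
new_weights = poststratify_event_rates(example, {"A": 0.2})
```

In this sketch the weighted size of each stratum is left unchanged, so only the split between events and non-events moves. A stratum with very few events would place large adjusted weights on a handful of records, which reflects the small-sample concern for minority subgroups that the abstract raises.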
- Publication: arXiv e-prints
- Pub Date: March 2022
- arXiv: arXiv:2203.05409
- Bibcode: 2022arXiv220305409W
- Keywords: Statistics - Methodology; Statistics - Applications