COMBO and COMMA: R packages for regression modeling and inference in the presence of misclassified binary mediator or outcome variables
Abstract
Misclassified binary outcome or mediator variables can cause unpredictable bias in resulting parameter estimates. As more datasets that were not originally collected for research purposes are being used for studies in the social and health sciences, the need for methods that address data quality concerns is growing. In this paper, we describe two R packages, COMBO and COMMA, that implement bias-correction methods for misclassified binary outcome and mediator variables, respectively. These likelihood-based approaches do not require gold standard measures and allow for estimation of sensitivity and specificity rates for the misclassified variable(s). In addition, these R packages automatically apply crucial label switching corrections, allowing researchers to circumvent the inherent permutation invariance of the misclassification model likelihood. We demonstrate COMBO for single-outcome cases using a study of bar exam passage. We develop and evaluate a risk prediction model based on noisy indicators in a pretrial risk assessment study to demonstrate COMBO for multi-outcome cases. In addition, we use COMMA to evaluate the mediating effect of potentially misdiagnosed gestational hypertension on the maternal ethnicity-birthweight relationship.
- Publication:
-
arXiv e-prints
- Pub Date:
- January 2025
- arXiv:
- arXiv:2501.08320
- Bibcode:
- 2025arXiv250108320H
- Keywords:
-
- Statistics - Computation;
- Statistics - Other Statistics
- E-Print:
- 99 pages, 7 figures