Variable Selection for Multiply-imputed Data: A Bayesian Framework
Abstract
Multiple imputation is a widely used technique for handling missing data in large observational studies. For variable selection on multiply-imputed datasets, however, conducting selection on each imputed dataset separately may yield different sets of important variables. MI-LASSO, one of the most popular solutions to this problem, treats the same variable across all imputed datasets as a group of variables and exploits Group-LASSO to yield a consistent variable selection across all the multiply-imputed datasets. In this paper, we extend the MI-LASSO model into a Bayesian framework and utilize five different Bayesian MI-LASSO models to perform variable selection on multiply-imputed data. These five models consist of three approaches based on shrinkage priors and two based on discrete mixture priors. We conduct a simulation study investigating the practical characteristics of each model across various settings. We further demonstrate these methods via a case study using the multiply-imputed data from the University of Michigan Dioxin Exposure Study. The Python package BMIselect is hosted on GitHub under an Apache-2.0 license: https://github.com/zjg540066169/Bmiselect.
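The grouping idea behind MI-LASSO can be illustrated with a minimal sketch. The code below is not the BMIselect API and not one of the paper's Bayesian models; it is a plain frequentist group lasso fitted by proximal gradient descent, a solver chosen here for brevity. The function name `mi_group_lasso`, the penalty value `lam`, and the simulated "imputed" datasets (true values plus noise in the missing cells, a crude stand-in for real imputation draws) are all assumptions made for illustration.

```python
import numpy as np

def mi_group_lasso(X_list, y_list, lam, n_iter=500):
    """Hypothetical helper: group lasso across D imputed datasets.

    Minimizes sum_d ||y_d - X_d b_d||^2 / (2n) + lam * sum_j ||(b_1j, ..., b_Dj)||_2.
    Returns B with shape (D, p); column j holds variable j's coefficients
    in every imputed dataset, and that column is penalized as one group.
    """
    D = len(X_list)
    n, p = X_list[0].shape
    B = np.zeros((D, p))
    # Step size from the largest Lipschitz constant of the per-dataset losses.
    lipschitz = max(np.linalg.norm(X, 2) ** 2 / n for X in X_list)
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        # Gradient step on each dataset's least-squares loss.
        G = np.stack([X.T @ (X @ b - y) / n
                      for X, y, b in zip(X_list, y_list, B)])
        Z = B - step * G
        # Group soft-thresholding: each column is shrunk jointly, so a
        # variable is dropped from all datasets or kept in all of them.
        norms = np.linalg.norm(Z, axis=0)
        scale = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
        B = Z * scale
    return B

# Simulated example (assumed setup, not data from the paper).
rng = np.random.default_rng(0)
n, p, D = 300, 6, 5
X_true = rng.normal(size=(n, p))
beta = np.array([2.0, -1.5, 0.0, 0.0, 0.0, 0.0])
y = X_true @ beta + rng.normal(scale=0.5, size=n)

# Column 1 is "missing" in about 20% of rows; each imputed copy fills those
# cells with the true value plus fresh noise (stand-in for imputation draws).
missing = rng.random(n) < 0.2
X_list = []
for _ in range(D):
    X_imp = X_true.copy()
    X_imp[missing, 1] += rng.normal(scale=0.3, size=missing.sum())
    X_list.append(X_imp)
y_list = [y] * D

B = mi_group_lasso(X_list, y_list, lam=0.5)
selected = np.any(B != 0, axis=0)  # one selection decision per variable
print("selected variables:", np.flatnonzero(selected))
```

Because the penalty acts on whole columns of `B`, the selected set is shared by all imputed datasets by construction, which is the consistency property the abstract attributes to MI-LASSO. The Bayesian models in the paper pursue the same grouping through priors rather than through this penalty.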
- Publication: arXiv e-prints
- Pub Date: October 2022
- DOI:
- arXiv: arXiv:2211.00114
- Bibcode: 2022arXiv221100114Z
- Keywords: Statistics - Methodology