A comparative review of variable selection techniques for covariate dependent Dirichlet process mixture models
Abstract
Dirichlet Process Mixture (DPM) models have been increasingly employed to specify random partition models that take into account possible patterns within the covariates. Furthermore, to deal with large numbers of covariates, methods for selecting the most important covariates have been proposed. Commonly, the covariates are chosen either for their importance in determining the clustering of the observations or for their effect on the level of a response variable (when a regression model is specified). Typically both strategies involve the specification of latent indicators that regulate the inclusion of the covariates in the model. Common examples involve the use of spike and slab prior distributions. In this work we review the most relevant DPM models that include covariate information in the induced partition of the observations and we focus on available variable selection techniques for these models. We highlight the main features of each model and demonstrate them in simulations and in a real data application.
- Publication: arXiv e-prints
- Pub Date: August 2015
- DOI: 10.48550/arXiv.1508.00129
- arXiv: arXiv:1508.00129
- Bibcode: 2015arXiv150800129B
- Keywords: Statistics - Applications
- E-Print: 26 pages, 5 figures