Finite mixture regression: A sparse variable selection by model selection for clustering
Abstract
We consider a finite mixture of Gaussian regressions for high-dimensional data, where the number of covariates may be much larger than the sample size. We propose to estimate the unknown conditional mixture density by a maximum likelihood estimator restricted to the relevant variables selected by an ℓ1-penalized maximum likelihood estimator. We establish an oracle inequality satisfied by this estimator with a Jensen-Kullback-Leibler type loss. This oracle inequality is deduced from a general model selection theorem for maximum likelihood estimators with a random model collection. From it we derive the shape of the penalty in the selection criterion, which depends on the complexity of the random model collection.
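The two-step procedure in the abstract (ℓ1-penalized maximum likelihood to pick variables, then an unpenalized maximum likelihood refit restricted to them, then a penalized criterion to choose among the resulting models) can be sketched in Python/NumPy as below. This is a minimal illustration under simplifying assumptions that are ours, not the paper's: no intercept, a fixed number of components K, an EM algorithm whose M-step runs a weighted Lasso on the raw coefficients, selected variables taken as the union of the component supports, and a placeholder penalty kappa * D / n, where D counts free parameters and kappa is a hypothetical constant. The paper derives the penalty shape from the complexity of the random model collection; that derivation is not reproduced here. All function names are hypothetical.

```python
import numpy as np


def soft(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def weighted_lasso(X, y, w, lam, beta, n_sweeps=20):
    """Coordinate descent for 0.5 * sum_i w_i (y_i - x_i b)^2 + lam * ||b||_1.
    With lam = 0 this is plain weighted least squares."""
    beta = beta.copy()
    r = y - X @ beta                       # current residual
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            r += X[:, j] * beta[j]         # partial residual without feature j
            a = np.sum(w * X[:, j] ** 2)
            beta[j] = soft(np.sum(w * X[:, j] * r), lam) / a if a > 0 else 0.0
            r -= X[:, j] * beta[j]
    return beta


def em_mixreg(X, y, K, lam=0.0, n_iter=50, seed=0):
    """EM for y | x ~ sum_k pi_k N(x beta_k, sig2_k), no intercept (assumption).
    lam > 0 adds an l1 penalty to each component's M-step (simplified Lasso-MLE)."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    tau = rng.random((n, K))
    tau /= tau.sum(axis=1, keepdims=True)  # random initial responsibilities
    betas, sig2 = np.zeros((K, p)), np.ones(K)
    for _ in range(n_iter):
        # M-step: mixing proportions, (sparse) coefficients, variances
        nk = tau.sum(axis=0) + 1e-12
        pis = nk / n
        for k in range(K):
            w = tau[:, k] / nk[k]
            betas[k] = weighted_lasso(X, y, w, lam, betas[k])
            sig2[k] = max(np.sum(w * (y - X @ betas[k]) ** 2), 1e-6)
        # E-step: log pi_k + log N(y_i; x_i beta_k, sig2_k), normalized by log-sum-exp
        logd = (np.log(pis) - 0.5 * np.log(2 * np.pi * sig2)
                - 0.5 * (y[:, None] - X @ betas.T) ** 2 / sig2)
        m = logd.max(axis=1, keepdims=True)
        lse = m[:, 0] + np.log(np.exp(logd - m).sum(axis=1))
        tau = np.exp(logd - lse[:, None])
    return betas, sig2, pis, lse.sum()     # log-likelihood of the returned parameters


def lasso_mle_select(X, y, K, lambdas, kappa=1.0):
    """For each lambda: keep the variables selected by the l1-penalized EM
    (union over components, an assumption), refit the unpenalized MLE on them,
    and score the refit with -loglik/n + kappa * D/n (placeholder penalty,
    not the paper's derived one). Returns (crit, lambda, selected columns)."""
    n = len(y)
    best = None
    for lam in lambdas:
        betas = em_mixreg(X, y, K, lam=lam)[0]
        J = np.flatnonzero(np.any(betas != 0.0, axis=0))
        ll = em_mixreg(X[:, J], y, K, lam=0.0)[3]
        D = (K - 1) + K * len(J) + K       # proportions + coefficients + variances
        crit = -ll / n + kappa * D / n
        if best is None or crit < best[0]:
            best = (crit, lam, J)
    return best
```

Calling `lasso_mle_select(X, y, K, lambdas)` on a grid of regularization values returns the criterion value, the chosen lambda, and the selected columns. The ℓ1 estimator is used only to select variables; the final estimate comes from the unpenalized refit, which avoids carrying the Lasso's shrinkage bias into the retained coefficients.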
- Publication: arXiv e-prints
- Pub Date: September 2014
- DOI: 10.48550/arXiv.1409.1331
- arXiv: arXiv:1409.1331
- Bibcode: 2014arXiv1409.1331D
- Keywords: Mathematics - Statistics Theory
- E-Print: 20 pages. arXiv admin note: text overlap with arXiv:1103.2021 by other authors