Debiased Lasso After Sample Splitting for Estimation and Inference in High Dimensional Generalized Linear Models
Abstract
We consider random sample splitting for estimation and inference in high dimensional generalized linear models, where we first apply the lasso to select a submodel using one subsample and then apply the debiased lasso to fit the selected model using the remaining subsample. We show that, no matter including a prespecified subset of regression coefficients or not, the debiased lasso estimation of the selected submodel after a single splitting follows a normal distribution asymptotically. Furthermore, for a set of prespecified regression coefficients, we show that a multiple splitting procedure based on the debiased lasso can address the loss of efficiency associated with sample splitting and produce asymptotically normal estimates under mild conditions. Our simulation results indicate that using the debiased lasso instead of the standard maximum likelihood estimator in the estimation stage can vastly reduce the bias and variance of the resulting estimates. We illustrate the proposed multiple splitting debiased lasso method with an analysis of the smoking data of the Mid-South Tobacco Case-Control Study.
- Publication:
-
arXiv e-prints
- Pub Date:
- February 2023
- DOI:
- arXiv:
- arXiv:2302.14218
- Bibcode:
- 2023arXiv230214218V
- Keywords:
-
- Statistics - Methodology
- E-Print:
- 28 pages, 0 figures