SIGLE: a valid procedure for Selective Inference with the Generalized Linear Lasso
Abstract
This article investigates uncertainty quantification of the generalized linear lasso~(GLL), a popular variable selection method in high-dimensional regression settings. In many fields of study, researchers use data-driven methods to select a subset of variables that are most likely to be associated with a response variable. However, such variable selection methods can introduce bias and increase the likelihood of false positives, leading to incorrect conclusions. In this paper, we propose a post-selection inference framework that addresses these issues and allows for valid statistical inference after variable selection using GLL. We show that our method provides accurate $p$-values and confidence intervals, while maintaining high statistical power. In a second stage, we focus on the sparse logistic regression, a popular classifier in high-dimensional statistics. We show with extensive numerical simulations that SIGLE is more powerful than state-of-the-art PSI methods. SIGLE relies on a new method to sample states from the distribution of observations conditional on the selection event. This method is based on a simulated annealing strategy whose energy is given by the first order conditions of the logistic lasso.
- Publication:
-
arXiv e-prints
- Pub Date:
- March 2022
- DOI:
- 10.48550/arXiv.2203.15348
- arXiv:
- arXiv:2203.15348
- Bibcode:
- 2022arXiv220315348D
- Keywords:
-
- Mathematics - Statistics Theory;
- Statistics - Methodology
- E-Print:
- New version of our work with additional numerical experiments