SIGLE: a valid procedure for Selective Inference with the Generalized Linear Lasso

doi:10.48550/arXiv.2203.15348

This article investigates uncertainty quantification for the generalized linear lasso (GLL), a popular variable selection method in high-dimensional regression settings. In many fields of study, researchers use data-driven methods to select a subset of variables that are most likely to be associated with a response variable. However, such variable selection methods can introduce bias and increase the likelihood of false positives, leading to incorrect conclusions. In this paper, we propose a post-selection inference (PSI) framework that addresses these issues and allows for valid statistical inference after variable selection using the GLL. We show that our method provides accurate $p$-values and confidence intervals while maintaining high statistical power. In a second stage, we focus on sparse logistic regression, a popular classifier in high-dimensional statistics. We show with extensive numerical simulations that SIGLE is more powerful than state-of-the-art PSI methods. SIGLE relies on a new method to sample states from the distribution of observations conditional on the selection event. This method is based on a simulated annealing strategy whose energy is given by the first-order conditions of the logistic lasso.
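
For orientation, the first-order conditions mentioned in the abstract can be written out in their standard textbook form. The following is only a sketch under assumed notation that does not appear in the abstract: a design matrix $X \in \mathbb{R}^{n \times d}$ with rows $x_i$, a binary response $y \in \{0,1\}^n$, a regularization parameter $\lambda > 0$, the sigmoid $\sigma$ applied entrywise, and an unnormalized log-likelihood. The paper itself may use a different scaling or parametrization, and its exact annealing energy is not reproduced here.

```latex
% Logistic lasso estimator (assumed notation, unnormalized loss)
\[
  \hat\theta \in \arg\min_{\theta \in \mathbb{R}^d}
  \; \sum_{i=1}^{n} \Big[ \log\big(1 + e^{x_i^\top \theta}\big) - y_i\, x_i^\top \theta \Big]
  + \lambda \|\theta\|_1 .
\]

% First-order (KKT) conditions: stationarity with a subgradient of the l1 norm
\[
  X^\top \big( y - \sigma(X\hat\theta) \big) = \lambda\, \hat s,
  \qquad
  \hat s_j = \operatorname{sign}(\hat\theta_j) \ \text{if } \hat\theta_j \neq 0,
  \qquad
  \hat s_j \in [-1, 1] \ \text{otherwise}.
\]

% Selection event: responses whose lasso solution has the observed support M
\[
  \big\{\, y \in \{0,1\}^n \;:\; \{\, j : \hat\theta_j(y) \neq 0 \,\} = M \,\big\}.
\]
```

Under this reading, a response vector $y$ lies in the selection event exactly when some pair $(\hat\theta, \hat s)$ with support $M$ satisfies the second display, which is why a violation of those conditions is a natural candidate for an energy to minimize.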

Publication: arXiv e-prints
Pub Date: March 2022
DOI: 10.48550/arXiv.2203.15348
arXiv: arXiv:2203.15348
Bibcode: 2022arXiv220315348D
Keywords: Mathematics - Statistics Theory; Statistics - Methodology
E-Print: New version of our work with additional numerical experiments
