Post-selection inference with HSIC-Lasso
Abstract
Detecting influential features in non-linear and/or high-dimensional data is a challenging and increasingly important task in machine learning. Variable selection methods, and post-selection inference along with them, have therefore been gaining much attention. Indeed, inference on the selected features can be significantly flawed when the selection procedure is not accounted for. We propose a selective inference procedure using the so-called model-free "HSIC-Lasso", based on the framework of truncated Gaussians combined with the polyhedral lemma. We then develop an algorithm that keeps computational costs low and selects the regularisation parameter. The performance of our method is illustrated by experiments on both artificial and real-world data, which emphasise tight control of the type-I error, even for small sample sizes.
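To make the "truncated Gaussians combined with the polyhedral lemma" ingredient concrete, here is a minimal sketch of the generic polyhedral-lemma computation for a Gaussian vector. It is not the authors' HSIC-Lasso procedure. It assumes an isotropic covariance `sigma^2 * I` and a selection event already written as `A y <= b`. The toy event and all names (`truncation_interval`, `selective_p_value`, `A`, `b`, `eta`) are hypothetical and chosen for illustration only.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def truncation_interval(A, b, eta, y):
    """Return (t, lo, hi), where t = eta^T y and [lo, hi] is the interval
    that the selection event {A y <= b} confines t to.
    Assumption: Cov(y) = sigma^2 * I, so that c = eta / ||eta||^2."""
    t = dot(eta, y)
    eta_sq = dot(eta, eta)
    c = [e / eta_sq for e in eta]
    # z is the part of y that is independent of t under the assumption above
    z = [yi - ci * t for yi, ci in zip(y, c)]
    lo, hi = -math.inf, math.inf
    for row, bj in zip(A, b):
        ac, az = dot(row, c), dot(row, z)
        if ac < 0:
            lo = max(lo, (bj - az) / ac)
        elif ac > 0:
            hi = min(hi, (bj - az) / ac)
        # rows with ac == 0 constrain z only, not t
    return t, lo, hi

def selective_p_value(t, lo, hi, sd):
    """One-sided p-value for H0: eta^T mu = 0, using N(0, sd^2)
    truncated to [lo, hi] as the null distribution of t."""
    upper = normal_cdf(hi / sd)
    lower = normal_cdf(lo / sd)
    obs = normal_cdf(t / sd)
    return (upper - obs) / (upper - lower)

# Hypothetical toy event: coordinate 1 was "selected" because
# y1 >= y2 and y1 >= 1, written as A y <= b.
A = [[-1.0, 1.0], [-1.0, 0.0]]
b = [0.0, -1.0]
y = [2.0, 0.5]
eta = [1.0, 0.0]  # test the mean of the selected coordinate

t, lo, hi = truncation_interval(A, b, eta, y)
p_selective = selective_p_value(t, lo, hi, sd=1.0)  # sd = sigma * ||eta|| with sigma = 1
p_naive = 1.0 - normal_cdf(t)
print(t, lo, hi, p_selective, p_naive)
```

In this toy case the selection event truncates the test statistic from below, so the selective p-value is larger than the naive one that ignores selection. This is the correction that keeps the type-I error under control after selection.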
- Publication: arXiv e-prints
- Pub Date: October 2020
- DOI: 10.48550/arXiv.2010.15659
- arXiv: arXiv:2010.15659
- Bibcode: 2020arXiv201015659F
- Keywords: Mathematics - Statistics Theory; Statistics - Machine Learning
- E-Print: Changes to previous version: incorporating comments and remarks from reviewers; evaluation of power of the proposed method; summarising behaviour for different hyper-parameters in one paragraph, instead of several figures; pseudocode of the algorithm; additional, in-depth experiment on real-world data