Model-free Feature Screening and FDR Control with Knockoff Features
Abstract
This paper proposes a model-free and data-adaptive feature screening method for ultra-high dimensional datasets. The proposed method is based on the projection correlation which measures the dependence between two random vectors. This projection correlation based method does not require specifying a regression model and applies to the data in the presence of heavy-tailed errors and multivariate response. It enjoys both sure screening and rank consistency properties under weak assumptions. Further, a two-step approach is proposed to control the false discovery rate (FDR) in feature screening with the help of knockoff features. It can be shown that the proposed two-step approach enjoys both sure screening and FDR control if the pre-specified FDR level $\alpha$ is greater or equal to $1/s$, where $s$ is the number of active features. The superior empirical performance of the proposed methods is justified by various numerical experiments and real data applications.
- Publication:
-
arXiv e-prints
- Pub Date:
- August 2019
- DOI:
- 10.48550/arXiv.1908.06597
- arXiv:
- arXiv:1908.06597
- Bibcode:
- 2019arXiv190806597L
- Keywords:
-
- Statistics - Methodology;
- Statistics - Machine Learning
- E-Print:
- doi:10.1080/01621459.2020.1783274