The leave-one-covariate-out conditional randomization test
Abstract
Conditional independence testing is an important problem, yet provably hard without assumptions. One of the assumptions that has become popular of late is called "model-X", where we assume we know the joint distribution of the covariates, but assume nothing about the conditional distribution of the outcome given the covariates. Knockoffs is a popular methodology associated with this framework, but it suffers from two main drawbacks: only one-bit $p$-values are available for inference on each variable, and the method is randomized with significant variability across runs in practice. The conditional randomization test (CRT) is thought to be the "right" solution under model-X, but usually viewed as computationally inefficient. This paper proposes a computationally efficient leave-one-covariate-out (LOCO) CRT that addresses both drawbacks of knockoffs. LOCO CRT produces valid $p$-values that can be used to control the familywise error rate, and has nearly zero algorithmic variability. For L1 regularized M-estimators, we develop an even faster variant called L1ME CRT, which reuses computation by leveraging a novel observation about the stability of the cross-validated lasso to removing inactive variables. Last, for multivariate Gaussian covariates, we present a closed-form expression for the LOCO CRT $p$-value, thus completely eliminating resampling in this important special case.
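For orientation, the resampling idea behind a generic CRT can be sketched as follows. This is a minimal illustration only, not the LOCO CRT or L1ME CRT proposed in the paper: it assumes mean-zero multivariate Gaussian covariates with a known covariance matrix, and it uses a simple residual cross-product as the test statistic. The function name `crt_pvalue` and all of its parameters are hypothetical choices made for this sketch.

```python
import numpy as np

def crt_pvalue(X, y, j, Sigma, n_resamples=500, seed=0):
    """Generic CRT p-value for H0: X_j is independent of y given X_{-j}.

    Assumption (model-X): rows of X are i.i.d. N(0, Sigma) with Sigma known.
    The test statistic is an illustrative choice, not the paper's.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    rest = np.delete(np.arange(p), j)

    # Conditional law of X_j given X_{-j} is N(X_{-j} @ beta, cond_var).
    beta = np.linalg.solve(Sigma[np.ix_(rest, rest)], Sigma[rest, j])
    cond_mean = X[:, rest] @ beta
    cond_var = Sigma[j, j] - Sigma[rest, j] @ beta

    # Residualize y on the remaining covariates (with an intercept).
    Z = np.column_stack([np.ones(n), X[:, rest]])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]

    # Observed statistic: |<X_j - E[X_j | X_{-j}], residual of y>|.
    t_obs = abs((X[:, j] - cond_mean) @ resid)

    # Null statistics: redraw X_j from its conditional law, holding X_{-j}, y fixed.
    noise = np.sqrt(cond_var) * rng.standard_normal((n, n_resamples))
    t_null = np.abs(noise.T @ resid)

    # Finite-sample valid p-value: the observed statistic counts as one draw.
    return (1 + np.sum(t_null >= t_obs)) / (n_resamples + 1)
```

Each call resamples only the tested covariate, so testing every variable this way costs `n_resamples` statistic evaluations per variable; with an expensive statistic such as a cross-validated lasso fit, this is the computational burden the abstract refers to.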
 Publication:

arXiv e-prints
 Pub Date:
 June 2020
 arXiv:
 arXiv:2006.08482
 Bibcode:
 2020arXiv200608482K
 Keywords:

 Statistics - Methodology;
 Statistics - Machine Learning
 E-Print:
 This paper has been withdrawn by the authors, because it has now been merged with (and superseded by) a parallel work arXiv:2006.03980 by Molei Liu and Lucas Janson