Selective machine learning of doubly robust functionals
Abstract
While model selection is a well-studied topic in parametric and nonparametric regression and in density estimation, selection of possibly high-dimensional nuisance parameters in semiparametric problems is far less developed. In this paper, we propose a selective machine learning framework for making inferences about a finite-dimensional functional defined on a semiparametric model, when the latter admits a doubly robust estimating function and several candidate machine learning algorithms are available for estimating the nuisance parameters. We introduce a new selection criterion aimed at bias reduction in estimating the functional of interest, based on a novel definition of pseudo-risk inspired by the double robustness property. Intuitively, the proposed criterion selects the pair of learners with the smallest pseudo-risk, so that the estimated functional is least sensitive to perturbations of a nuisance parameter. We establish an oracle property for a multi-fold cross-validation version of the new selection criterion, which states that our empirical criterion performs nearly as well as an oracle with a priori knowledge of the pseudo-risk for each pair of candidate learners. Finally, we apply the approach to model selection for a semiparametric estimator of the average treatment effect, given an ensemble of candidate machine learners to account for confounding in an observational study, and illustrate it in simulations and a data application.
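The selection idea can be illustrated with a minimal sketch; it is not the authors' exact criterion. It assumes the standard augmented inverse probability weighted (AIPW) estimating function for the average treatment effect, two simple parametric candidates per nuisance parameter in place of machine learners, and a simplified in-sample pseudo-risk (the largest squared change in the estimate when one nuisance model is swapped for another candidate). Cross-validation is omitted, and all names (`aipw`, `fit_logistic`, `fit_linear`, `pseudo_risk`) and the simulated data are hypothetical.

```python
import numpy as np

# Simulated observational data (assumed design): true ATE = 1.0,
# with both covariates confounding treatment and outcome.
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 2))
true_e = 1 / (1 + np.exp(-(0.5 * X[:, 0] + 0.5 * X[:, 1])))
A = rng.binomial(1, true_e)
Y = 1.0 * A + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
D = np.column_stack([np.ones(n), X])  # design matrix with intercept


def fit_logistic(D, A, steps=25):
    """Logistic regression by Newton's method; returns fitted propensities."""
    beta = np.zeros(D.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-D @ beta))
        W = p * (1 - p)
        beta += np.linalg.solve(D.T @ (D * W[:, None]), D.T @ (A - p))
    return 1 / (1 + np.exp(-D @ beta))


def fit_linear(D, Y, mask):
    """Least squares fit on the rows in `mask`; predicts for all rows."""
    coef, *_ = np.linalg.lstsq(D[mask], Y[mask], rcond=None)
    return D @ coef


def aipw(e, m0, m1):
    """Doubly robust (AIPW) estimate of the average treatment effect."""
    return np.mean(m1 - m0 + A * (Y - m1) / e - (1 - A) * (Y - m0) / (1 - e))


# Candidate propensity models: constant (misspecified) and logistic.
propensities = [np.full(n, A.mean()), fit_logistic(D, A)]
# Candidate outcome models (m0, m1): arm means (misspecified) and linear.
outcomes = [
    (np.full(n, Y[A == 0].mean()), np.full(n, Y[A == 1].mean())),
    (fit_linear(D, Y, A == 0), fit_linear(D, Y, A == 1)),
]

# Estimate for every (propensity, outcome) pair.
psi = np.array([[aipw(e, m0, m1) for (m0, m1) in outcomes] for e in propensities])

# Simplified pseudo-risk: worst squared change when either nuisance
# model of the pair is replaced by another candidate.
K, L = psi.shape
pseudo_risk = np.empty((K, L))
for k in range(K):
    for l in range(L):
        swap_propensity = np.max((psi[k, l] - psi[:, l]) ** 2)
        swap_outcome = np.max((psi[k, l] - psi[k, :]) ** 2)
        pseudo_risk[k, l] = max(swap_propensity, swap_outcome)

best = tuple(int(i) for i in np.unravel_index(np.argmin(pseudo_risk), pseudo_risk.shape))
print("selected pair:", best, "estimate:", psi[best])
```

In this toy setup the pair with both nuisance models misspecified gives an estimate far from every other pair, so any pair that shares a model with it inherits a large pseudo-risk; the pair whose estimate moves least under swaps is the one selected.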
- Publication: arXiv e-prints
- Pub Date: November 2019
- DOI: 10.48550/arXiv.1911.02029
- arXiv: arXiv:1911.02029
- Bibcode: 2019arXiv191102029C
- Keywords: Statistics - Methodology; Mathematics - Statistics Theory; Statistics - Machine Learning
- E-Print: To appear in Biometrika