High-dimensional sliced inverse regression with endogeneity
Abstract
Sliced inverse regression (SIR) is a popular sufficient dimension reduction method that identifies a few linear transformations of the covariates without losing regression information with the response. In high-dimensional settings, SIR can be combined with sparsity penalties to achieve sufficient dimension reduction and variable selection simultaneously. Nevertheless, both classical and sparse estimators assume the covariates are exogenous. However, endogeneity can arise in a variety of situations, such as when variables are omitted or are measured with error. In this article, we show such endogeneity invalidates SIR estimators, leading to inconsistent estimation of the true central subspace. To address this challenge, we propose a two-stage Lasso SIR estimator, which first constructs a sparse high-dimensional instrumental variables model to obtain fitted values of the covariates spanned by the instruments, and then applies SIR augmented with a Lasso penalty on these fitted values. We establish theoretical bounds for the estimation and selection consistency of the true central subspace for the proposed estimators, allowing the number of covariates and instruments to grow exponentially with the sample size. Simulation studies and applications to two real-world datasets in nutrition and genetics illustrate the superior empirical performance of the two-stage Lasso SIR estimator compared with existing methods that disregard endogeneity and/or nonlinearity in the outcome model.
- Publication:
-
arXiv e-prints
- Pub Date:
- December 2024
- arXiv:
- arXiv:2412.15530
- Bibcode:
- 2024arXiv241215530N
- Keywords:
-
- Statistics - Methodology
- E-Print:
- 63 pages