Statistical inference with F-statistics when fitting simple models to high-dimensional data
Abstract
We study linear subset regression in the context of the high-dimensional overall model $y = \vartheta+\theta' z + \epsilon$ with univariate response $y$ and a $d$-vector of random regressors $z$, independent of $\epsilon$. Here, "high-dimensional" means that the number $d$ of available explanatory variables is much larger than the number $n$ of observations. We consider simple linear sub-models where $y$ is regressed on a set of $p$ regressors given by $x = M'z$, for some $d \times p$ matrix $M$ of full rank $p < n$. The corresponding simple model, i.e., $y=\alpha+\beta' x + e$, can be justified by imposing appropriate restrictions on the unknown parameter $\theta$ in the overall model; otherwise, this simple model can be grossly misspecified. In this paper, we establish asymptotic validity of the standard $F$-test on the surrogate parameter $\beta$, in an appropriate sense, even when the simple model is misspecified.
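The setup above can be sketched numerically. The following minimal simulation draws data from a high-dimensional overall model with $d \gg n$, forms the sub-model regressors $x = M'z$, and computes the textbook $F$-statistic for $H_0: \beta = 0$ in the simple model. The Gaussian design, the dimensions, the random choices of $\theta$ and $M$, and the function name `overall_f_statistic` are assumptions made for illustration and are not taken from the paper. The paper's result is asymptotic, so a single draw like this only shows how the statistic is formed, not that the test is valid.

```python
import numpy as np


def overall_f_statistic(y, X):
    """Textbook F-statistic for H0: beta = 0 in y = alpha + beta'x + e.

    Hypothetical helper for illustration; X holds the n x p sub-model regressors.
    """
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])          # add intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)  # OLS fit of the simple model
    resid = y - design @ coef
    rss = float(resid @ resid)                         # residual sum of squares
    tss = float(((y - y.mean()) ** 2).sum())           # total sum of squares
    return ((tss - rss) / p) / (rss / (n - p - 1))


# Assumed illustrative dimensions: d much larger than n, and p < n.
rng = np.random.default_rng(0)
n, d, p = 50, 500, 3

Z = rng.standard_normal((n, d))              # rows are the d-vectors z_i
theta = rng.standard_normal(d) / np.sqrt(d)  # dense theta: sub-model is misspecified
y = 1.0 + Z @ theta + rng.standard_normal(n) # overall model with intercept 1.0

M = rng.standard_normal((d, p))              # d x p matrix, full rank p almost surely
X = Z @ M                                    # rows are x_i = M' z_i

F = overall_f_statistic(y, X)
print(f"F-statistic on {p} and {n - p - 1} degrees of freedom: {F:.3f}")
```

Under the classical theory the statistic would be compared with an $F_{p,\,n-p-1}$ quantile. Here $\theta$ places weight on all $d$ coordinates, so the simple model is misspecified, which is the situation the abstract addresses.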
- Publication: arXiv e-prints
- Pub Date: February 2019
- DOI: 10.48550/arXiv.1902.04304
- arXiv: arXiv:1902.04304
- Bibcode: 2019arXiv190204304L
- Keywords: Mathematics - Statistics Theory