The conditionality principle in high-dimensional regression
Abstract
Consider a high-dimensional linear regression problem, where the number of covariates is larger than the number of observations and the goal is to estimate the conditional variance of the response variable given the covariates. Both a conditional and an unconditional framework are considered, where conditioning is on the covariates, which are ancillary to the parameter of interest. In recent papers, a consistent estimator was developed in the unconditional framework when the marginal distribution of the covariates is normal with known mean and variance. In the present work, a certain Bayesian hypothesis test is formulated under the conditional framework, and it is shown that the Bayes risk is constant. This implies that no consistent estimator exists in the conditional framework. However, when the marginal distribution of the covariates is normal, the conditional error of the above consistent estimator converges to zero, with probability converging to one. It follows that even in the conditional setting, information about the marginal distribution of an ancillary statistic may have a significant impact on statistical inference. The practical implication in the context of high-dimensional regression models is that additional observations for which only the covariates are given are potentially very useful and should not be ignored. This finding is most relevant to semi-supervised learning problems, where covariate information is easy to obtain.
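To make the unconditional setup concrete, the following is a minimal simulation sketch of a method-of-moments estimator of the noise variance that exploits a known N(0, I) covariate distribution. This is an illustrative construction of the kind the abstract alludes to, not necessarily the estimator of the papers it cites; the function name and all simulation parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: n observations, d > n covariates whose marginal
# distribution is known to be N(0, 1), as in the unconditional framework.
n, d, sigma2 = 200, 400, 1.5
beta = rng.normal(0, np.sqrt(0.5 / d), size=d)      # signal with E||beta||^2 = 0.5
X = rng.normal(size=(n, d))
y = X @ beta + rng.normal(0, np.sqrt(sigma2), size=n)

def sigma2_moment_estimator(X, y):
    """Method-of-moments estimator of the noise variance sigma^2,
    valid when the rows of X are i.i.d. N(0, I_d). With tau^2 = ||beta||^2,
    the two moment equations are
        E||y||^2     = n (tau^2 + sigma^2),
        E||X' y||^2  = n ((n + d + 1) tau^2 + d sigma^2),
    and solving them for sigma^2 gives the expression returned below.
    """
    n, d = X.shape
    m1 = y @ y / n                     # estimates tau^2 + sigma^2
    m2 = np.sum((X.T @ y) ** 2) / n    # estimates (n + d + 1) tau^2 + d sigma^2
    return ((n + d + 1) * m1 - m2) / (n + 1)

print(sigma2_moment_estimator(X, y))   # typically close to sigma2 = 1.5
```

The point of the sketch is that the moment equations average over the known distribution of X rather than conditioning on the observed design, which is exactly why such an estimator can succeed in the unconditional framework even though d > n.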
- Publication: arXiv e-prints
- Pub Date: June 2018
- DOI: 10.48550/arXiv.1806.10008
- arXiv: arXiv:1806.10008
- Bibcode: 2018arXiv180610008A
- Keywords: Mathematics - Statistics Theory