Covariate Selection Based on an Assumption-free Approach to Linear Regression with Exact Probabilities
Abstract
In this paper we give a completely new approach to the problem of covariate selection in linear regression. A covariate or a set of covariates is included only if it is better, in the sense of least squares, than the same number of Gaussian covariates consisting of i.i.d. $N(0,1)$ random variables. The Gaussian P-value is defined as the probability that the Gaussian covariates are better. It is given in terms of the Beta distribution, it is exact, and it holds for all data. The covariate selection procedures based on it require only a cut-off value $\alpha$ for the Gaussian P-value; the default value in this paper is $\alpha=0.01$. The resulting procedures are very simple and very fast, do not overfit, and require only least squares. In particular there is no regularization parameter, no data splitting, no simulation, no shrinkage, and no post-selection inference. The paper includes the results of simulations, applications to real data sets, and theorems on the asymptotic behaviour under the standard linear model. Under this model the stepwise procedure performs overwhelmingly better than any other procedure we are aware of. An R package {\it gausscov} is available.
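The idea behind the Gaussian P-value can be illustrated by a Monte Carlo version: compare the residual sum of squares obtained by adding a candidate covariate with the residual sum of squares obtained by adding an i.i.d. $N(0,1)$ covariate instead. The sketch below is not the paper's exact Beta-distribution formula or the {\it gausscov} implementation; it is a hypothetical Python illustration of the underlying comparison, with the function names `rss` and `empirical_gaussian_pvalue` invented for this example.

```python
import numpy as np

def rss(y, X):
    # Residual sum of squares of the least-squares fit of y on the columns of X.
    if X.shape[1] == 0:
        return float(y @ y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def empirical_gaussian_pvalue(y, X, x_cand, n_sim=500, seed=0):
    """Monte Carlo estimate of the Gaussian P-value: the probability that an
    i.i.d. N(0,1) covariate, when added to the current design X, attains a
    residual sum of squares at least as small as the candidate x_cand does.
    (Illustrative only; the paper gives this probability exactly via the
    Beta distribution, with no simulation required.)"""
    rss_cand = rss(y, np.column_stack([X, x_cand]))
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sim):
        g = rng.standard_normal(len(y))  # one random Gaussian covariate
        if rss(y, np.column_stack([X, g])) <= rss_cand:
            hits += 1
    return hits / n_sim
```

A relevant covariate should beat essentially every random Gaussian covariate, giving a P-value near zero, while a pure-noise covariate does no better than the Gaussian competitors and yields a large P-value; the selection rule then keeps a covariate only if this probability falls below the cut-off $\alpha$.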
Publication: arXiv e-prints
Pub Date: June 2019
arXiv: arXiv:1906.01990
Bibcode: 2019arXiv190601990D
Keywords: Statistics - Methodology; Statistics - Applications; 62J07, 62J05
E-Print: 32 pages, 3 figures