Information Criteria for Deciding between Normal Regression Models
Abstract
Regression models fitted to data can be assessed on their goodness of fit, though models with many parameters should be disfavored to prevent over-fitting. Statisticians' tools for this are little known to physical scientists. These include the Akaike Information Criterion (AIC), a penalized goodness-of-fit statistic, and the AICc, a variant including a small-sample correction. They entered the physical sciences through their use by astrophysicists to compare cosmological models, e.g., models' predictions of the distance-redshift relation. The AICc is shown to have been misapplied: it is applicable only if the error variances are unknown. If error bars accompany the data, the AIC should be used instead. Erroneous applications of the AICc are listed in an appendix. It is also shown how the variability of the AIC difference between models with a known error variance can be estimated. This yields a significance test that can potentially replace the use of 'Akaike weights' for deciding between such models. Additionally, the effects of model misspecification are examined. For regression models fitted to data sets without (rather than with) error bars, these effects are major: the AICc may be shifted by an unknown amount. The extent of this in the fitting of physical models remains to be studied.
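The distinction between the two criteria can be made concrete with a short sketch. The Python code below is a minimal illustration, not the paper's own procedure, assuming independent normal errors and models that are linear in their parameters; the function names and the five-point data set are hypothetical. Model-independent additive constants are dropped, since only AIC differences between models fitted to the same data matter. With known error bars sigma_i the criterion reduces to chi^2 + 2k; when a common variance is instead estimated from the residuals, sigma^2 counts as one extra parameter and the AICc small-sample correction applies. Akaike weights are computed in their usual form, exp(-Delta_i/2) normalized over the candidate models. The significance test for the AIC difference proposed in the paper is not reproduced here.

```python
import math


def weighted_residuals_constant(y, sigmas):
    """Hypothetical helper: fit y = c by weighted least squares (k = 1)."""
    w = [1.0 / s ** 2 for s in sigmas]
    c = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return [yi - c for yi in y]


def weighted_residuals_line(x, y, sigmas):
    """Hypothetical helper: fit y = a + b*x by weighted least squares (k = 2)."""
    w = [1.0 / s ** 2 for s in sigmas]
    s0 = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = s0 * sxx - sx ** 2  # determinant of the 2x2 normal equations
    a = (sxx * sy - sx * sxy) / det
    b = (s0 * sxy - sx * sy) / det
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def aic_known_sigma(residuals, sigmas, k):
    """AIC with known error bars: chi^2 + 2k (sum of log(2*pi*sigma_i^2) dropped)."""
    chi2 = sum((r / s) ** 2 for r, s in zip(residuals, sigmas))
    return chi2 + 2 * k


def aicc_unknown_sigma(residuals, k):
    """AICc when a common variance is estimated as RSS/n.

    The variance is one extra parameter, so p = k + 1. The constant
    n*log(2*pi) + n is dropped because it is the same for every model.
    """
    n = len(residuals)
    p = k + 1
    if n - p - 1 <= 0:
        raise ValueError("AICc needs n > k + 2 data points")
    rss = sum(r * r for r in residuals)
    aic = n * math.log(rss / n) + 2 * p
    return aic + 2 * p * (p + 1) / (n - p - 1)


def akaike_weights(aics):
    """Normalize exp(-Delta_i / 2), with Delta_i measured from the minimum AIC."""
    best = min(aics)
    rel = [math.exp(-(a - best) / 2.0) for a in aics]
    total = sum(rel)
    return [v / total for v in rel]


if __name__ == "__main__":
    # Illustrative data only (not from the paper): five points with error bars.
    x = [0.0, 1.0, 2.0, 3.0, 4.0]
    y = [0.1, 1.1, 1.9, 3.2, 3.9]
    sig = [0.2] * 5
    aics = [
        aic_known_sigma(weighted_residuals_constant(y, sig), sig, k=1),
        aic_known_sigma(weighted_residuals_line(x, y, sig), sig, k=2),
    ]
    print("AIC (constant, line):", aics)
    print("Akaike weights:", akaike_weights(aics))
```

Because these data carry error bars, the sketch compares the two models with `aic_known_sigma`, in line with the abstract's recommendation; `aicc_unknown_sigma` would be the relevant criterion only if no error bars were available and the variance had to be estimated from the fit.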
- Publication: arXiv e-prints
- Pub Date: May 2013
- arXiv: arXiv:1305.5493
- Bibcode: 2013arXiv1305.5493M
- Keywords: Statistics - Methodology; Astrophysics - Instrumentation and Methods for Astrophysics; Physics - Data Analysis; Statistics and Probability; 62F07 (Primary) 62B10; 62J05; 83F05
- E-Print: 27 pages, margins fixed