On Correlation and Prediction Interval Reduction
Abstract
Pearson's correlation coefficient is a popular statistical measure to summarize the strength of association between two continuous variables. It is usually interpreted via its square as percentage of variance of one variable predicted by the other in a linear regression model. It can be generalized for multiple regression via the coefficient of determination, which is not straightforward to interpret in terms of prediction accuracy. In this paper, we propose to assess the prediction accuracy of a linear model via the prediction interval reduction (PIR) by comparing the width of the prediction interval derived from this model with the width of the prediction interval obtained without this model. At the population level, PIR is one-to-one related to the correlation and the coefficient of determination. In particular, a correlation of 0.5 corresponds to a PIR of only 13%. It is also the one's complement of the coefficient of alienation introduced at the beginning of last century. We argue that PIR is easily interpretable and useful to keep in mind how difficult it is to make accurate individual predictions, an important message in the era of precision medicine and artificial intelligence. Different estimates of PIR are compared in the context of a linear model and an extension of the PIR concept to non-linear models is outlined.
- Publication:
-
arXiv e-prints
- Pub Date:
- May 2024
- DOI:
- 10.48550/arXiv.2405.04895
- arXiv:
- arXiv:2405.04895
- Bibcode:
- 2024arXiv240504895P
- Keywords:
-
- Statistics - Methodology
- E-Print:
- 19 pages, 2 figures