Efficient algorithms for the sensitivities of the Pearson correlation coefficient and its statistical significance to online data
Abstract
Reliably measuring the collinearity of bivariate data is crucial in statistics, particularly for time-series analysis or ongoing studies in which incoming observations can significantly impact current collinearity estimates. Leveraging identities from Welford's online algorithm for sample variance, we develop a rigorous theoretical framework for analyzing the maximal change to the Pearson correlation coefficient and its p-value that can be induced by additional data. Further, we show that the resulting optimization problems yield elegant closed-form solutions that can be accurately computed by linear- and constant-time algorithms. Our work not only creates new theoretical avenues for robust correlation measures, but also has broad practical implications for disciplines that span econometrics, operations research, clinical trials, climatology, differential privacy, and bioinformatics. Software implementations of our algorithms in Cython-wrapped C are made available at https://github.com/marc-harary/sensitivity for reproducibility, practical deployment, and future theoretical development.
- Publication:
-
arXiv e-prints
- Pub Date:
- May 2024
- DOI:
- 10.48550/arXiv.2405.14686
- arXiv:
- arXiv:2405.14686
- Bibcode:
- 2024arXiv240514686H
- Keywords:
-
- Statistics - Methodology;
- Mathematics - Statistics Theory
- E-Print:
- Fixed minor typos and added new citations