Covariance Matrix Estimation for High-Throughput Biomedical Data with Interconnected Communities
Abstract
Estimating a covariance matrix is central to high-dimensional data analysis. Empirical analyses of high-dimensional biomedical data, including genomics, proteomics, microbiome, and neuroimaging, among others, consistently reveal strong modularity in the dependence patterns. In these analyses, intercorrelated high-dimensional biomedical features often form communities or modules that can be interconnected with others. While the interconnected community structure has been extensively studied in biomedical research (e.g., gene co-expression networks), its potential to assist in the estimation of covariance matrices remains largely unexplored. To address this gap, we propose a procedure that leverages the commonly observed interconnected community structure in high-dimensional biomedical data to estimate large covariance and precision matrices. We derive the uniformly minimum-variance unbiased estimators for covariance and precision matrices in closed forms and provide theoretical results on their asymptotic properties. Our proposed method enhances the accuracy of covariance- and precision-matrix estimation and demonstrates superior performance compared to the competing methods in both simulations and real data analyses.
- Publication:
-
arXiv e-prints
- Pub Date:
- February 2023
- DOI:
- 10.48550/arXiv.2302.01861
- arXiv:
- arXiv:2302.01861
- Bibcode:
- 2023arXiv230201861Y
- Keywords:
-
- Statistics - Methodology