Parallel Pairwise Correlation Computation On Intel Xeon Phi Clusters
Abstract
Co-expression network analysis is a critical technique for identifying inter-gene interactions. It usually relies on computing all-pairs correlation (or a similar measure) between gene expression profiles across multiple samples. Pearson's correlation coefficient (PCC) is one widely used measure for gene co-expression network construction. However, all-pairs PCC computation is computationally demanding for large numbers of gene expression profiles, which motivates accelerating it with high-performance computing. In this paper, we present LightPCC, the first parallel and distributed all-pairs PCC computation on Intel Xeon Phi (Phi) clusters. It achieves high speed by exploiting SIMD-instruction-level and thread-level parallelism within each Phi, as well as accelerator-level parallelism among multiple Phis. To facilitate balanced workload distribution, we propose a general framework for symmetric all-pairs computation that, for the first time, builds bijective functions between the job identifier and the coordinate space. Using a set of gene expression datasets, we have evaluated LightPCC and compared it to two CPU-based counterparts, all in double precision: a sequential C++ implementation in ALGLIB and an implementation based on a parallel general matrix-matrix multiplication routine in the Intel Math Kernel Library (MKL). Performance evaluation revealed that with one 5110P Phi and with 16 Phis, LightPCC runs up to $20.6\times$ and $218.2\times$ faster than ALGLIB, and up to $6.8\times$ and $71.4\times$ faster than single-threaded MKL, respectively. In addition, LightPCC demonstrated good parallel scalability with respect to the number of Phis. The source code of LightPCC is publicly available at http://lightpcc.sourceforge.net.
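The idea of mapping job identifiers to coordinates can be illustrated with a small sketch. The abstract does not give the mapping LightPCC actually uses, so everything below is an assumption made for illustration: the row-major ordering of the upper triangle (diagonal included), the function names `job_to_pair`, `pair_to_job`, `job_range` and `all_pairs_pcc`, and the plain-Python PCC, which stands in for the SIMD and multi-threaded kernels described in the paper. Because each job identifier maps to exactly one pair (i, j) with i <= j, splitting the identifier range into near-equal contiguous chunks gives every worker nearly the same number of pairs.

```python
import math

def num_jobs(n):
    """Number of pairs (i, j) with 0 <= i <= j < n."""
    return n * (n + 1) // 2

def pair_to_job(i, j, n):
    """Row-major index of (i, j) in the upper triangle (assumed ordering)."""
    return i * n - i * (i - 1) // 2 + (j - i)

def job_to_pair(k, n):
    """Inverse of pair_to_job, computed with exact integer arithmetic."""
    m = num_jobs(n) - 1 - k                 # index counted from the end
    r = (math.isqrt(8 * m + 1) - 1) // 2    # row counted from the bottom
    c = m - r * (r + 1) // 2                # column counted from the right
    return n - 1 - r, n - 1 - c

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length profiles.
    Raises ZeroDivisionError if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    return sxy / math.sqrt(sxx * syy)

def job_range(worker, num_workers, total):
    """Contiguous [start, end) slice of job ids; sizes differ by at most one."""
    base, extra = divmod(total, num_workers)
    start = worker * base + min(worker, extra)
    return start, start + base + (1 if worker < extra else 0)

def all_pairs_pcc(profiles, num_workers=1):
    """Simulate the workers one after another; each handles its own id slice."""
    n = len(profiles)
    result = {}
    for w in range(num_workers):
        lo, hi = job_range(w, num_workers, num_jobs(n))
        for k in range(lo, hi):
            i, j = job_to_pair(k, n)
            result[(i, j)] = pearson(profiles[i], profiles[j])
    return result
```

For example, `all_pairs_pcc([[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]], num_workers=4)` covers all six pairs of the three profiles. In a real distributed run each worker would execute only its own slice, for instance one slice per accelerator.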
- Publication:
- arXiv e-prints
- Pub Date:
- May 2016
- DOI:
- 10.48550/arXiv.1605.01584
- arXiv:
- arXiv:1605.01584
- Bibcode:
- 2016arXiv160501584L
- Keywords:
- Computer Science - Distributed, Parallel, and Cluster Computing;
- Quantitative Biology - Genomics
- E-Print:
- 9 pages, 2 figures, 2 tables, accepted by the SBAC-PAD 2016 conference