Graph-based data integration predicts long-range regulatory interactions across the human genome
Abstract
Transcriptional regulation of gene expression is one of the main processes that affect cell diversification from a single set of genes. Regulatory proteins often interact with DNA regions located distally from the transcription start sites (TSS) of the genes. We developed a computational method that combines open chromatin and gene expression information from a large number of cell types to identify these distal regulatory elements. Our method builds correlation graphs for publicly available DNase-seq and exon array datasets with matching samples and uses graph-based methods to retain findings supported by multiple datasets and to remove indirect interactions. The resulting set of interactions was validated with both anecdotal evidence from known long-range interactions and unbiased experimental data derived from Hi-C and CAGE experiments. Our results provide a novel set of high-confidence candidate open chromatin regions involved in gene regulation, often located several Mb away from the TSS of their target gene.
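The idea of building per-dataset correlation graphs and keeping only edges supported by multiple datasets can be illustrated with a minimal sketch. This is not the authors' implementation: the Pearson correlation, the 0.9 threshold, the `min_support` parameter, the function names, and the toy values are all assumptions made for illustration, and the removal of indirect interactions mentioned in the abstract is not covered here.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant signal carries no correlation information
    return cov / sqrt(var_x * var_y)


def correlation_edges(openness, expression, threshold=0.9):
    """Build bipartite (region, gene) edges for one dataset.

    `openness` maps region -> signal per sample, `expression` maps
    gene -> signal per sample; samples must be in matching order.
    The threshold is an assumed value, not taken from the paper.
    """
    edges = set()
    for region, open_signal in openness.items():
        for gene, expr_signal in expression.items():
            if pearson(open_signal, expr_signal) >= threshold:
                edges.add((region, gene))
    return edges


def supported_edges(edge_sets, min_support=2):
    """Keep edges that appear in at least `min_support` datasets."""
    counts = Counter(edge for edges in edge_sets for edge in edges)
    return {edge for edge, count in counts.items() if count >= min_support}


# Toy data (hypothetical): two datasets with different sample counts.
edges_a = correlation_edges(
    {"region1": [1, 2, 3, 4], "region2": [4, 1, 3, 2]},
    {"geneX": [2, 4, 6, 8], "geneY": [5, 5, 1, 0]},
)
edges_b = correlation_edges(
    {"region1": [0, 1, 2], "region2": [3, 3, 1]},
    {"geneX": [1, 3, 5], "geneY": [2, 2, 0]},
)
consensus = supported_edges([edges_a, edges_b])
print(consensus)
```

In this toy example only the region1 to geneX edge passes the threshold in both datasets, so it is the only edge kept in `consensus`; the region2 to geneY edge appears in the second dataset alone and is dropped.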
- Publication: arXiv e-prints
- Pub Date: April 2014
- DOI: 10.48550/arXiv.1404.7281
- arXiv: arXiv:1404.7281
- Bibcode: 2014arXiv1404.7281D
- Keywords: Quantitative Biology - Genomics
- E-Print: 19 pages, 7 figures, 2 tables + 12 pages supplementary material (4 figures, 7 tables)