Nonparametric Bayes Differential Analysis of Multigroup DNA Methylation Data
Abstract
DNA methylation datasets in cancer studies comprise measurements on a large number of genomic locations, called cytosine-phosphate-guanine (CpG) sites, with complex correlation structures. A fundamental goal of these studies is the development of statistical techniques that can identify disease genomic signatures across multiple patient groups defined by different experimental or biological conditions. We propose BayesDiff, a nonparametric Bayesian approach for differential analysis relying on a novel class of first-order mixture models called the Sticky Pitman-Yor process or two-restaurant two-cuisine franchise (2R2CF). The BayesDiff methodology flexibly utilizes information from all CpG sites or probes, adaptively accommodates any serial dependence due to the widely varying inter-probe distances, and performs simultaneous inferences about the differential genomic signature of the patient groups. Using simulation studies, we demonstrate the effectiveness of the BayesDiff procedure relative to existing statistical techniques for differential DNA methylation. The methodology is applied to analyze a gastrointestinal (GI) cancer dataset that displays both serial correlations and interaction patterns. The results support and complement known aspects of DNA methylation and gene association in upper GI cancers.
- Publication: arXiv e-prints
- Pub Date: April 2022
- DOI: 10.48550/arXiv.2204.04840
- arXiv: arXiv:2204.04840
- Bibcode: 2022arXiv220404840G
- Keywords: Statistics - Methodology; Statistics - Applications