Bayesian Inference for Tumor Subclones Accounting for Sequencing and Structural Variants
Abstract
Tumor samples are heterogeneous: they consist of subclones that differ in DNA nucleotide sequence and in copy number at multiple loci. Heterogeneity can be measured by identifying the subclonal copy number and sequence at a selected set of loci. Because accurate identification of variant allele fractions depends strongly on precise determination of copy numbers, we develop a Bayesian feature allocation model that jointly calls subclonal copy numbers and the corresponding allele sequences at the same loci. The proposed method uses three random matrices, L, Z and w, to represent subclonal copy numbers (L), numbers of subclonal variant alleles (Z) and cellular fractions of subclones in samples (w), respectively. Because the number of subclones is unknown, the number of columns of these matrices is random. We estimate the subclonal structure from next-generation sequencing data through inference on these three matrices. Using simulation studies and a real data analysis, we demonstrate that posterior inference on the subclonal structure is improved by jointly modeling both structural and sequencing variants on subclonal genomes. Software is available at http://compgenome.org/BayClone2.
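The claim that variant allele fractions cannot be interpreted without copy numbers can be illustrated with a small sketch. It assumes (our simplification, not BayClone2's stated likelihood) that reads at a locus are proportional to the DNA copies present, so the expected variant read fraction at locus s in sample t is the w-weighted ratio of variant copies to total copies: p[t][s] = sum_c w[t][c]·Z[s][c] / sum_c w[t][c]·L[s][c]. The function name `expected_vaf` and the index layout (loci × subclones for L and Z, samples × subclones for w) are hypothetical. The sketch ignores read-sampling noise, any normal-cell or background component, and the prior over the number of subclones.

```python
from typing import List

Matrix = List[List[float]]


def expected_vaf(L: Matrix, Z: Matrix, w: Matrix) -> Matrix:
    """Expected variant allele fraction for each sample t and locus s.

    Hypothetical convention for this sketch:
      L[s][c]  copy number at locus s in subclone c
      Z[s][c]  number of variant alleles at locus s in subclone c (0 <= Z <= L)
      w[t][c]  cellular fraction of subclone c in sample t
    """
    n_loci, n_sub = len(L), len(L[0])
    for s in range(n_loci):
        for c in range(n_sub):
            if not 0 <= Z[s][c] <= L[s][c]:
                raise ValueError(f"Z[{s}][{c}] must lie in [0, L[{s}][{c}]]")

    vaf = []
    for w_t in w:
        row = []
        for s in range(n_loci):
            # Average copies per cell at locus s, weighted by subclone fractions
            total = sum(w_t[c] * L[s][c] for c in range(n_sub))
            # Average variant copies per cell at locus s
            variant = sum(w_t[c] * Z[s][c] for c in range(n_sub))
            row.append(variant / total if total > 0 else 0.0)
        vaf.append(row)
    return vaf


# One locus, two subclones: subclone 1 is diploid with no variant,
# subclone 2 has three copies, one of which carries the variant.
Z = [[0, 1]]
w = [[0.4, 0.6]]
print(round(expected_vaf([[2, 3]], Z, w)[0][0], 3))  # 0.231 with the true copy numbers
print(round(expected_vaf([[2, 2]], Z, w)[0][0], 3))  # 0.3 if every subclone is assumed diploid
```

The two calls differ only in L, yet the implied variant fraction shifts from about 0.231 to 0.3, which is the kind of confounding that motivates modeling copy numbers and variant alleles jointly rather than separately.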
- Publication: arXiv e-prints
- Pub Date: September 2014
- arXiv: arXiv:1409.7158
- Bibcode: 2014arXiv1409.7158L
- Keywords: Statistics - Methodology; Quantitative Biology - Genomics
- E-Print: 26 pages, 11 figures