Signal extraction and breakpoint identification for array CGH data using robust state space model
Abstract
Array comparative genomic hybridization (CGH) is a high-resolution technique for assessing DNA copy number variation. Identifying breakpoints, where the copy number changes, will enhance the understanding of the pathogenesis of human diseases such as cancers. However, the biological variation and experimental errors contained in array CGH data may lead to false positive identification of breakpoints. We propose a robust state space model for array CGH data analysis. The model consists of two equations, an observation equation and a state equation, in which both the measurement error and the evolution error are specified to follow t-distributions with small degrees of freedom. The completely unspecified CGH profiles are estimated by a Markov chain Monte Carlo (MCMC) algorithm. Breakpoints and outliers are identified by a novel backward selection procedure based on posterior draws of the CGH profiles. Compared with three other popular methods, our method demonstrates several desirable features, including false positive rate control, robustness against outliers, and superior power of breakpoint detection. All these properties are illustrated using simulated and real datasets.
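To make the two-equation structure concrete, the following is a minimal sketch that only simulates data from a robust state space model; it does not implement the MCMC estimation or the backward selection procedure. It assumes a random-walk (local-level) state equation, mu_t = mu_{t-1} + eta_t, and an observation equation y_t = mu_t + eps_t, with both errors drawn from scaled t-distributions. The functions `draw_t` and `simulate_profile` and all parameter values are hypothetical choices for illustration. Each t draw uses the standard normal scale-mixture representation, a common device in Bayesian t-error models that is not necessarily the authors' exact formulation.

```python
import math
import random


def draw_t(nu, scale, rng):
    """Draw scale * t_nu via the normal scale mixture:
    lam ~ Gamma(shape=nu/2, rate=nu/2), then x | lam ~ N(0, scale^2 / lam)."""
    # random.gammavariate takes (shape, scale), so rate nu/2 becomes scale 2/nu
    lam = rng.gammavariate(nu / 2.0, 2.0 / nu)
    return rng.gauss(0.0, scale / math.sqrt(lam))


def simulate_profile(n, nu_obs=3.0, nu_state=1.0,
                     sigma_obs=0.2, sigma_state=0.01, seed=0):
    """Simulate a hypothetical local-level model with t errors:
    state:       mu_t = mu_{t-1} + eta_t,  eta_t ~ sigma_state * t_{nu_state}
    observation: y_t  = mu_t + eps_t,      eps_t ~ sigma_obs * t_{nu_obs}
    Returns the latent profile mus and the noisy observations ys."""
    rng = random.Random(seed)
    mu = 0.0
    mus, ys = [], []
    for _ in range(n):
        # Heavy-tailed evolution error: mostly tiny moves, occasional large jumps
        mu += draw_t(nu_state, sigma_state, rng)
        mus.append(mu)
        # Heavy-tailed measurement error: occasional outlying probes
        ys.append(mu + draw_t(nu_obs, sigma_obs, rng))
    return mus, ys


if __name__ == "__main__":
    mus, ys = simulate_profile(200)
    print(f"first five observations: {[round(y, 3) for y in ys[:5]]}")
```

With small degrees of freedom, most state increments are close to zero while a few are large, which is how heavy tails in the evolution error let the latent profile stay flat between abrupt shifts that behave like breakpoints. Heavy tails in the measurement error similarly absorb isolated outlying probes. Conditional on the mixing variables `lam`, the model is linear and Gaussian, which is what typically makes Gibbs-style MCMC for such models tractable.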
- Publication: arXiv e-prints
- Pub Date: January 2012
- arXiv: arXiv:1201.5169
- Bibcode: 2012arXiv1201.5169Z
- Keywords: Statistics - Applications