Haplotype-resolved de novo assembly with phased assembly graphs
Abstract
Haplotype-resolved de novo assembly is the ultimate solution to the study of sequence variations in a genome. However, existing algorithms either collapse heterozygous alleles into one consensus copy or fail to cleanly separate the haplotypes to produce high-quality phased assemblies. Here we describe hifiasm, a new de novo assembler that takes advantage of long high-fidelity sequence reads to faithfully represent the haplotype information in a phased assembly graph. Unlike other graph-based assemblers that only aim to maintain the contiguity of one haplotype, hifiasm strives to preserve the contiguity of all haplotypes. This feature enables the development of a graph trio binning algorithm that greatly advances over standard trio binning. On three human and five non-human datasets, including California redwood with a $\sim$30-gigabase hexaploid genome, we show that hifiasm frequently delivers better assemblies than existing tools and consistently outperforms others on haplotype-resolved assembly.
- Publication:
-
arXiv e-prints
- Pub Date:
- August 2020
- DOI:
- 10.48550/arXiv.2008.01237
- arXiv:
- arXiv:2008.01237
- Bibcode:
- 2020arXiv200801237C
- Keywords:
-
- Quantitative Biology - Genomics;
- Quantitative Biology - Quantitative Methods
- E-Print:
- 11 pages, 3 figures, 3 tables