Meraculous-2D: Haplotype-sensitive Assembly of Highly Heterozygous Genomes
Abstract
While many short-read assemblers attempt to simplify the de Bruijn graph by identifying and resolving variant-induced bubbles to produce a haploid mosaic result, this approach is only viable when variants are relatively rare and the bubbles are well defined in a graph context. We observed that diploid genomes with very high levels of heterozygosity fail to display well-resolved bubble structures in a typical assembly graph and thus result in highly fragmented and incomplete assemblies. Here we present an enhancement of the Meraculous2 algorithm, called Meraculous-2D, which preserves haplotypes across variant sites and generates accurate assemblies of highly heterozygous diploid genomes. Preserving and taking advantage of the allelic variation throughout the assembly process allows both haplomes to be reconstructed at once, without the need to pick arbitrary paths through bubble structures. We also enhanced the original diploidy resolution method of Meraculous2 to maintain and report phased haplotype variant information.
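To make the notion of a variant-induced bubble concrete, the following is a minimal toy sketch in Python, not the Meraculous-2D implementation. It assumes a simplified graph in which each node is a k-mer and an edge joins k-mers that are adjacent in an input sequence. The sequences `hap_a` and `hap_b` and the helpers `build_graph` and `find_bubbles` are hypothetical names introduced only for this illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_graph(seqs, k):
    """Toy graph: node = k-mer, edge u -> v when v directly follows u in a sequence."""
    graph = defaultdict(set)
    for seq in seqs:
        ks = kmers(seq, k)
        for u, v in zip(ks, ks[1:]):
            graph[u].add(v)
    return graph


def find_bubbles(graph):
    """Find simple bubbles: a fork with two unbranched paths that rejoin at one node."""
    indeg = defaultdict(int)
    for succs in graph.values():
        for v in succs:
            indeg[v] += 1

    bubbles = []
    for fork, succs in graph.items():
        if len(succs) != 2:
            continue
        branches = []
        for node in sorted(succs):
            path = []
            # Follow the branch while it has exactly one way in and one way out.
            while indeg[node] == 1 and len(graph.get(node, ())) == 1:
                path.append(node)
                node = next(iter(graph[node]))
            branches.append((path, node))
        (path_a, end_a), (path_b, end_b) = branches
        if path_a and path_b and end_a == end_b:
            bubbles.append((fork, end_a, path_a, path_b))
    return bubbles


# Hypothetical haplotypes that differ by a single substitution (C vs. G).
hap_a = "ATGGCGTACCTGAAGTC"
hap_b = "ATGGCGTAGCTGAAGTC"
k = 5

for fork, join, path_a, path_b in find_bubbles(build_graph([hap_a, hap_b], k)):
    print("fork:", fork, "join:", join)
    print("allele A branch:", path_a)
    print("allele B branch:", path_b)
```

In this toy input the single substitution changes the k k-mers that overlap it, so each branch of the bubble holds k = 5 nodes, one branch per allele. A haploid-mosaic assembler would keep one branch and discard the other; the abstract describes retaining both so that each haplotype can be reconstructed.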
- Publication: arXiv e-prints
- Pub Date: March 2017
- DOI: 10.48550/arXiv.1703.09852
- arXiv: arXiv:1703.09852
- Bibcode: 2017arXiv170309852G
- Keywords: Quantitative Biology - Genomics
- E-Print: Availability: Meraculous-2D is available under the GNU General Public License from https://sourceforge.net/projects/meraculous20/