MetaPar: Metagenomic Sequence Assembly via Iterative Reclassification
Abstract
We introduce MetaPar, a parallel algorithmic architecture for metagenomic sequence assembly that significantly reduces assembly time and consequently enables the processing of large genomic datasets on computers with limited memory. The approach iteratively performs read (re)classification based on phylogenetic marker genes and on assembler outputs generated from random subsets of metagenomic reads. Once a sufficiently accurate classification within genera is obtained, de novo metagenomic assemblers (such as Velvet or IDBA-UD) or reference-based assemblers may be used for contig construction. We analyze the performance of MetaPar on synthetic data consisting of 15 randomly chosen species from the NCBI database, using the effective gap and effective coverage metrics.
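The iterative (re)classification loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the k-mer overlap classifier, the function names (`kmers`, `classify`, `iterative_reclassify`), and the parameters (`k`, `subset_frac`, `max_iter`) are all assumptions made for illustration. In particular, the "assembly" of a random subset is replaced by simply pooling the k-mers of the sampled reads, where a real pipeline would run an assembler such as Velvet or IDBA-UD.

```python
import random

def kmers(seq, k=4):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def classify(reads, profiles, k=4):
    """Assign each read to the bin sharing the most k-mers (None if no overlap)."""
    labels = []
    for read in reads:
        read_kmers = kmers(read, k)
        scores = {name: len(read_kmers & prof) for name, prof in profiles.items()}
        best = max(sorted(scores), key=scores.get)  # ties broken alphabetically
        labels.append(best if scores[best] > 0 else None)
    return labels

def iterative_reclassify(reads, markers, k=4, subset_frac=0.5, max_iter=10, seed=0):
    """Toy loop: classify by marker k-mers, then refine bins from random read subsets."""
    rng = random.Random(seed)
    # Hypothetical starting point: one k-mer profile per bin, built from a marker sequence.
    profiles = {name: kmers(seq, k) for name, seq in markers.items()}
    labels = classify(reads, profiles, k)
    for _ in range(max_iter):
        for name in profiles:
            members = [r for r, lab in zip(reads, labels) if lab == name]
            if not members:
                continue
            n = max(1, int(subset_frac * len(members)))
            # Stand-in for assembling a random subset: pool the sampled reads' k-mers.
            for read in rng.sample(members, n):
                profiles[name] |= kmers(read, k)
        new_labels = classify(reads, profiles, k)
        if new_labels == labels:  # stop once the classification is stable
            break
        labels = new_labels
    return labels

# Made-up example data: two bins with disjoint k-mer content.
reads = ["AAAACCCC", "CCCCAAAA", "ACCCCAAA", "GGGGTTTT", "TTTTGGGG"]
markers = {"A": "AAAACC", "B": "GGGGTT"}
print(iterative_reclassify(reads, markers))
```

In this made-up example the third read shares no k-mer with either marker, so it starts unclassified and is only assigned to a bin after the bin profile has been extended from a sampled subset of already-classified reads. Once the labels stop changing, each bin's reads could be handed to a separate assembler run.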
- Publication: arXiv e-prints
- Pub Date: November 2013
- DOI: 10.48550/arXiv.1311.3932
- arXiv: arXiv:1311.3932
- Bibcode: 2013arXiv1311.3932K
- Keywords: Quantitative Biology - Quantitative Methods; Quantitative Biology - Genomics