An External-Memory Algorithm for String Graph Construction
Abstract
Some recent results have introduced external-memory algorithms to compute self-indexes of a set of strings, mainly via computing the Burrows-Wheeler Transform (BWT) of the input strings. The motivations for those results stem from bioinformatics, where a large number of short strings (called reads) are routinely produced and analyzed. In that field, a fundamental problem is to assemble a genome from a large set of much shorter samples extracted from the unknown genome. The approaches currently used to tackle this problem are memory-intensive, which does not bode well given the ongoing increase in the availability of genomic data. A data structure that is used in genome assembly is the string graph, where vertices correspond to samples and each arc represents a pair of overlapping samples. In this paper we address an open problem: to design an external-memory algorithm to compute the string graph.
Publication: arXiv e-prints
Pub Date: May 2014
arXiv: arXiv:1405.7520
Bibcode: 2014arXiv1405.7520B
Keywords: Computer Science - Data Structures and Algorithms; Quantitative Biology - Genomics