copMEM: Finding maximal exact matches via sampling both genomes
Abstract
Genome-to-genome comparisons require designating anchor points, which are given by Maximal Exact Matches (MEMs) between their sequences. For large genomes this is a challenging problem, and the performance of existing solutions, even in parallel regimes, is not quite satisfactory. We present a new algorithm, copMEM, that allows sparse sampling of both input genomes, with the two sampling steps being coprime. Despite being a single-threaded implementation, copMEM computes all MEMs of minimum length 100 between the human and mouse genomes in less than 2 minutes, using less than 10 GB of RAM.
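The coprime sampling idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `find_mems`, its parameters, the dictionary-based k-mer index, and the condition `k1 * k2 <= min_len - kmer + 1` are assumptions made for illustration. The reasoning behind the sketch is that when the two steps are coprime, the Chinese remainder theorem guarantees that every sufficiently long match contains at least one k-mer that starts at a sampled position in both sequences, so extending such seed hits recovers every MEM.

```python
from math import gcd


def find_mems(ref, qry, min_len, k1, k2, kmer):
    """Return the set of (ref_pos, qry_pos, length) MEMs with length >= min_len.

    Hypothetical sketch: ref is sampled every k1 positions, qry every k2.
    """
    if gcd(k1, k2) != 1:
        raise ValueError("sampling steps must be coprime")
    # Assumed condition: some offset d < k1 * k2 inside every long match is
    # sampled in both sequences, and the k-mer starting there must still fit.
    if k1 * k2 > min_len - kmer + 1:
        raise ValueError("sampling too sparse for this min_len and kmer")

    # Index the k-mers of the reference at every k1-th position only.
    index = {}
    for i in range(0, len(ref) - kmer + 1, k1):
        index.setdefault(ref[i:i + kmer], []).append(i)

    mems = set()
    # Probe the query at every k2-th position only.
    for j in range(0, len(qry) - kmer + 1, k2):
        for i in index.get(qry[j:j + kmer], ()):
            # Extend the seed to the left as far as the characters agree.
            a, b = i, j
            while a > 0 and b > 0 and ref[a - 1] == qry[b - 1]:
                a -= 1
                b -= 1
            # Extend the seed to the right as far as the characters agree.
            e, f = i + kmer, j + kmer
            while e < len(ref) and f < len(qry) and ref[e] == qry[f]:
                e += 1
                f += 1
            if e - a >= min_len:
                mems.add((a, b, e - a))  # the set removes repeated discoveries
    return mems
```

With these assumed parameters, a call such as `find_mems(ref, qry, 100, 8, 7, 44)` would index only every 8th k-mer of the reference and look up only every 7th k-mer of the query, which is where the savings in time and memory would come from.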
- Publication:
- arXiv e-prints
- Pub Date:
- May 2018
- DOI:
- 10.48550/arXiv.1805.08816
- arXiv:
- arXiv:1805.08816
- Bibcode:
- 2018arXiv180508816G
- Keywords:
- Computer Science - Data Structures and Algorithms;
- Quantitative Biology - Genomics;
- 68W32;
- F.2.2
- E-Print:
- The source code of copMEM is freely available at https://github.com/wbieniec/copmem. Contact: wbieniec@kis.p.lodz.pl