Indexing Variation Graphs
Abstract
Variation graphs, which represent genetic variation within a population, are replacing sequences as reference genomes. Path indexes are one of the most important tools for working with variation graphs. They generalize text indexes to graphs, making it possible to find the paths that match a query string. We propose using de Bruijn graphs as path indexes, compressing them by merging redundant subgraphs, and encoding them with the Burrows-Wheeler transform. The resulting fast, space-efficient, and versatile index is used in the variation graph toolkit vg.
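To make the idea of a k-mer path index concrete, the sketch below enumerates every path of up to k nodes in a toy variation graph, sorts the resulting path labels, and answers a query by locating the contiguous range of labels that begin with the pattern. This is only an illustration under simplifying assumptions: each node carries a single base, the graph is a small hand-built DAG, and a plain sorted list stands in for the de Bruijn graph. It does not perform the subgraph merging or Burrows-Wheeler encoding described above, and it is not the GCSA2 implementation. The names `kmer_paths`, `build_index`, and `find` are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

# Hypothetical toy representation: node id -> single base, node id -> successors.
Labels = Dict[int, str]
Edges = Dict[int, List[int]]


def kmer_paths(labels: Labels, edges: Edges, k: int) -> List[Tuple[str, int]]:
    """Enumerate every path of up to k nodes as (path label, start node) pairs.

    A path that reaches a sink before k nodes ends with the terminator '$',
    so short paths near the end of the graph are still indexed.
    """
    result: List[Tuple[str, int]] = []

    def extend(node: int, prefix: str, start: int) -> None:
        prefix += labels[node]
        if len(prefix) == k:
            result.append((prefix, start))
            return
        successors = edges.get(node, [])
        if not successors:
            result.append((prefix + "$", start))
        for nxt in successors:
            extend(nxt, prefix, start)

    for node in labels:
        extend(node, "", node)
    return result


def build_index(labels: Labels, edges: Edges, k: int) -> List[Tuple[str, int]]:
    # Sorting lexicographically makes all labels sharing a prefix contiguous.
    return sorted(set(kmer_paths(labels, edges, k)))


def find(index: List[Tuple[str, int]], pattern: str) -> List[int]:
    """Return start nodes of paths whose label begins with pattern (len <= k)."""
    lo = bisect_left(index, (pattern,))  # first entry >= pattern
    hits = set()
    for label, start in index[lo:]:
        if not label.startswith(pattern):
            break  # left the contiguous range of matches
        hits.add(start)
    return sorted(hits)


# Toy graph for GAT[T/C]ACA: a single SNP creates a bubble at nodes 3 and 4.
labels = {0: "G", 1: "A", 2: "T", 3: "T", 4: "C", 5: "A", 6: "C", 7: "A"}
edges = {0: [1], 1: [2], 2: [3, 4], 3: [5], 4: [5], 5: [6], 6: [7]}
index = build_index(labels, edges, k=4)

print(find(index, "GATC"))  # [0]  (the path through the C allele)
print(find(index, "CA"))    # [4, 6]
```

The sorted list stores each k-length path label explicitly, which is exactly the redundancy that merging subgraphs and BWT encoding are meant to remove in a practical index.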
- Publication: arXiv e-prints
- Pub Date: April 2016
- DOI: 10.48550/arXiv.1604.06605
- arXiv: arXiv:1604.06605
- Bibcode: 2016arXiv160406605S
- Keywords: Computer Science - Data Structures and Algorithms
- E-Print: Proc. ALENEX 2017. The implementation is available at https://github.com/jltsiren/gcsa2