Lightning-fast adaptive immune receptor similarity search by symmetric deletion lookup
Abstract
An individual's adaptive immune receptor (AIR) repertoire records immune history due to the exquisite antigen specificity of AIRs. Reading this record requires computational approaches for inferring receptor function from sequence, as the diversity of possible receptor-antigen pairs vastly outstrips experimental knowledge. Identification of AIRs with similar sequence and thus putatively similar function is a common performance bottleneck in these approaches. Here, we benchmark the time complexity of five different algorithmic approaches to radius-based search for Levenshtein neighbors. We show that a symmetric deletion lookup approach, originally proposed for spell-checking, is particularly scalable. We then introduce XTNeighbor, a variant of this algorithm that can be massively parallelized on GPUs. For one million input sequences, XTNeighbor identifies all sequence neighbors that differ by up to two edits in seconds on commodity hardware, orders of magnitude faster than existing approaches. We also demonstrate how symmetric deletion lookup can speed up search with more complex sequence-similarity metrics such as TCRdist. Our contribution is poised to greatly speed up existing analysis pipelines and enable processing of large-scale immunosequencing data without downsampling.
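The symmetric deletion lookup idea named in the abstract can be illustrated with a minimal CPU sketch. This is not the XTNeighbor implementation. It assumes the standard spell-checking formulation: two sequences within Levenshtein distance d always share at least one string reachable by deleting at most d characters from each, so hashing all deletion variants yields candidate pairs that are then verified exactly. The function names (`deletion_variants`, `levenshtein`, `find_neighbors`) and the example sequences are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def deletion_variants(seq, max_edits):
    """Return seq plus every string obtained by deleting up to max_edits characters."""
    variants = {seq}
    frontier = {seq}
    for _ in range(max_edits):
        # Delete one more character from every string produced in the previous round.
        frontier = {s[:i] + s[i + 1:] for s in frontier for i in range(len(s))}
        variants |= frontier
    return variants


def levenshtein(a, b):
    """Exact Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def find_neighbors(seqs, max_edits=1):
    """Return index pairs (i, j), i < j, with Levenshtein distance <= max_edits."""
    # Index every sequence under each of its deletion variants.
    index = defaultdict(set)
    for idx, seq in enumerate(seqs):
        for variant in deletion_variants(seq, max_edits):
            index[variant].add(idx)

    # Sequences sharing a variant are candidates; verify each pair exactly,
    # because a shared variant is necessary but not sufficient.
    pairs = set()
    for bucket in index.values():
        for i, j in combinations(sorted(bucket), 2):
            if (i, j) not in pairs and levenshtein(seqs[i], seqs[j]) <= max_edits:
                pairs.add((i, j))
    return pairs


# Made-up CDR3-like strings, for illustration only.
example = ["CASSLGQAYEQYF", "CASSLGQAYEQFF", "CASSLGAYEQYF", "CASRPDRGNTIYF"]
print(sorted(find_neighbors(example, max_edits=1)))  # [(0, 1), (0, 2)]
```

In this sketch only sequences that collide in the hash table are ever compared, which is what avoids the all-against-all comparison; the exact verification step filters out pairs that share a deletion variant but exceed the edit threshold.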
- Publication: arXiv e-prints
- Pub Date: March 2024
- DOI: 10.48550/arXiv.2403.09010
- arXiv: arXiv:2403.09010
- Bibcode: 2024arXiv240309010C
- Keywords: Quantitative Biology - Quantitative Methods; Quantitative Biology - Genomics
- E-Print: 13 pages, 8 figures