Multiple sequence alignment based on set covers
Abstract
We introduce a new heuristic for the multiple alignment of a set of sequences. The heuristic is based on a set cover of the residue alphabet of the sequences, and also on the determination of a significant set of blocks comprising subsequences of the sequences to be aligned. These blocks are obtained with the aid of a new data structure, called a suffix-set tree, which is constructed from the input sequences with the guidance of the residue-alphabet set cover and generalizes the well-known suffix tree of the sequence set. We provide performance results on selected BAliBASE amino-acid sequences and compare them with those yielded by some prominent approaches.
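The notion of a set cover of the residue alphabet can be illustrated with a small Python sketch. It is not the authors' algorithm or their suffix-set tree. The toy alphabet, the groups in `COVER`, and the helpers `is_set_cover`, `labels`, `share_label`, and `common_blocks` are all hypothetical names chosen for illustration. The sketch assumes that two equal-length subsequences are treated as matching when, at every position, some set in the cover contains both residues, and it finds candidate blocks by brute-force enumeration rather than with a tree.

```python
from itertools import product

# Hypothetical toy alphabet and cover (illustrative only, not from the paper).
# Unlike a partition, the sets in a cover may overlap.
ALPHABET = set("AVLIFWYKRDE")
COVER = [
    frozenset("AVLI"),  # assumed group 0
    frozenset("FWY"),   # assumed group 1
    frozenset("KR"),    # assumed group 2
    frozenset("DE"),    # assumed group 3
    frozenset("ILF"),   # assumed group 4, overlaps groups 0 and 1
]

def is_set_cover(cover, alphabet):
    """True if the union of the sets equals the alphabet."""
    return set().union(*cover) == set(alphabet)

def labels(word, cover):
    """All tuples of cover-set indices that can spell `word` position by position."""
    per_position = [[i for i, s in enumerate(cover) if r in s] for r in word]
    return set(product(*per_position))

def share_label(u, v, cover):
    """True if u and v have equal length and at least one common label."""
    return len(u) == len(v) and bool(labels(u, cover) & labels(v, cover))

def common_blocks(seqs, k, cover):
    """Map each label of length k to its (sequence, offset) occurrences,
    keeping only labels that occur in every input sequence."""
    index = {}
    for sid, seq in enumerate(seqs):
        for pos in range(len(seq) - k + 1):
            for lab in labels(seq[pos:pos + k], cover):
                index.setdefault(lab, set()).add((sid, pos))
    everyone = set(range(len(seqs)))
    return {lab: occ for lab, occ in index.items()
            if {sid for sid, _ in occ} == everyone}

if __name__ == "__main__":
    print(is_set_cover(COVER, ALPHABET))
    print(share_label("AVK", "LIR", COVER))
    print(common_blocks(["AVKD", "LIRE"], 3, COVER))
```

Enumerating every label of every window grows exponentially with the window length when residues belong to several sets, so this brute-force version is only practical for toy inputs. An index structure over the sequences, such as the suffix-set tree named in the abstract, is the kind of device that would avoid this enumeration.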
- Publication: arXiv e-prints
- Pub Date: December 2004
- DOI: 10.48550/arXiv.q-bio/0412021
- arXiv: arXiv:q-bio/0412021
- Bibcode: 2004q.bio....12021P
- Keywords: Quantitative Biology - Quantitative Methods
- E-Print: Lecture Notes in Computer Science 3907 (2006), 127-137