A Note on Probabilistic Models over Strings: the Linear Algebra Approach
Abstract
Probabilistic models over strings have played a key role in developing methods that allow indels to be treated as phylogenetically informative events. An extensive literature uses automata and transducers on phylogenies to perform inference in these models, and an important theoretical question in that literature is the complexity of computing the normalization of a class of string-valued graphical models. This question has been investigated using tools from combinatorics, dynamic programming, and graph theory, and has practical applications in Bayesian phylogenetics. In this work, we revisit this theoretical question from a different point of view, based on linear algebra. The main contribution is a new proof of a known result on the complexity of inference on TKF91, a well-known probabilistic model over strings. Our proof relies on classical linear algebra results and is in some cases easier to extend to other models. The proof technique also has consequences for the implementation and complexity of inference algorithms.
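As a generic illustration of how linear algebra enters normalization problems of this kind (a textbook identity for weighted automata, not the proof given in the paper), the total weight of all strings accepted by a weighted automaton can be written as Z = start^T (I - M)^{-1} stop, provided the spectral radius of M is below 1. Here M is the transition weight matrix summed over emitted symbols. The two-state automaton, its weights, and the function names below are hypothetical and chosen only for this sketch.

```python
from fractions import Fraction


def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination in exact rational arithmetic.

    Assumes A is square and non-singular (a singular A raises StopIteration).
    """
    n = len(A)
    # Build the augmented matrix [A | b].
    aug = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and swap it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def normalization(start, M, stop):
    """Closed form Z = start^T (I - M)^{-1} stop, via one linear solve."""
    n = len(M)
    i_minus_m = [[(1 if i == j else 0) - M[i][j] for j in range(n)] for i in range(n)]
    x = solve_linear(i_minus_m, stop)
    return sum(Fraction(s) * xi for s, xi in zip(start, x))


def truncated_series(start, M, stop, terms):
    """Partial sum of start^T M^k stop for k = 0 .. terms-1 (sums over path lengths)."""
    n = len(M)
    total = Fraction(0)
    v = [Fraction(s) for s in stop]  # v holds M^k stop
    for _ in range(terms):
        total += sum(Fraction(s) * vi for s, vi in zip(start, v))
        v = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    return total


# Hypothetical two-state automaton (weights are illustrative, not from TKF91).
start = [1, 0]
M = [[Fraction(1, 2), Fraction(1, 4)],
     [Fraction(0), Fraction(1, 3)]]
stop = [Fraction(1, 2), Fraction(2, 3)]

Z = normalization(start, M, stop)
print(Z)  # 3/2
```

The closed form replaces an infinite sum over path lengths with a single linear solve, which is why the cost of normalization is governed by the size of the state space rather than by string length. The truncated series is included only as a check that the two computations agree.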
- Publication: arXiv e-prints
- Pub Date: January 2013
- DOI: 10.48550/arXiv.1301.5054
- arXiv: arXiv:1301.5054
- Bibcode: 2013arXiv1301.5054B
- Keywords: Quantitative Biology - Populations and Evolution; Computer Science - Formal Languages and Automata Theory; Statistics - Computation
- E-Print: 17 pages, 7 figures