Inferring interactions from combinatorial protein libraries
Abstract
Proteins created by combinatorial methods in vitro are an important source of information for understanding sequence-structure-function relationships. Alignments of folded proteins from combinatorial libraries can be analyzed using methods developed for naturally occurring proteins, but this neglects the information contained in the unfolded sequences of the library. We introduce two algorithms, logistic regression and excess information analysis, that use both the folded and unfolded sequences and compare them against contingency table and statistical coupling analysis, which only use the former. The test set for this benchmark study is a library of fictitious proteins that fold according to a hypothetical energy model. Of the four methods studied, only logistic regression is able to correctly recapitulate the energy model from the sequence alignment. The other algorithms predict spurious interactions between alignment positions with strong but individual influences on protein stability. When present in the same protein, stabilizing amino acids tend to lower the energy below the threshold needed for folding. As a result, their frequencies in the alignment can be correlated even if the positions do not interact. We believe any algorithm that neglects the nonlinear relationship between folding and energy is susceptible to this error.
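The spurious-correlation argument above can be illustrated with a toy simulation. The sketch below is not the paper's energy model or test library. It assumes two binary positions whose energy contributions are purely additive (no interaction), a soft folding threshold modeled as a logistic function of energy, and hypothetical names and parameter values (`simulate_library`, `fit_logistic`, `h`, `e_fold`). It compares a folded-only statistic (the log odds ratio of the two positions within the folded alignment) with a logistic regression fitted to both folded and unfolded sequences.

```python
import numpy as np

def simulate_library(n=40000, h=(-2.0, -2.0), e_fold=-1.0, seed=0):
    """Toy library: two binary positions, additive energy, soft threshold.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    s = rng.integers(0, 2, size=(n, 2))      # 1 = stabilizing residue present
    energy = s @ np.array(h)                 # additive: no coupling term
    p_fold = 1.0 / (1.0 + np.exp(energy - e_fold))  # lower energy, likelier to fold
    folded = rng.random(n) < p_fold
    return s, folded

def folded_log_odds_ratio(s, folded):
    """Association between the two positions using folded sequences only."""
    f = s[folded]
    n11 = np.sum((f[:, 0] == 1) & (f[:, 1] == 1))
    n00 = np.sum((f[:, 0] == 0) & (f[:, 1] == 0))
    n10 = np.sum((f[:, 0] == 1) & (f[:, 1] == 0))
    n01 = np.sum((f[:, 0] == 0) & (f[:, 1] == 1))
    return np.log(n11 * n00 / (n10 * n01))

def fit_logistic(X, y, n_iter=25):
    """Plain Newton-Raphson fit of a logistic regression (no regularization)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, grad)
    return beta

s, folded = simulate_library()
log_or = folded_log_odds_ratio(s, folded)

# Design matrix: intercept, position 1, position 2, and their product
# (the product column is the candidate interaction term).
X = np.column_stack([np.ones(len(s)), s[:, 0], s[:, 1], s[:, 0] * s[:, 1]]).astype(float)
beta = fit_logistic(X, folded.astype(float))

print("folded-only log odds ratio:", log_or)
print("logistic coefficients (intercept, pos1, pos2, pos1*pos2):", beta)
```

Under these assumptions the folded-only log odds ratio is clearly nonzero even though the simulated energy has no coupling term, while the fitted interaction coefficient stays close to zero and the two single-position coefficients come out near the negated energy contributions. The nonlinearity of the fold probability in energy is what creates the apparent association in the folded alignment.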
- Publication: arXiv e-prints
- Pub Date: May 2005
- DOI: 10.48550/arXiv.q-bio/0505018
- arXiv: arXiv:q-bio/0505018
- Bibcode: 2005q.bio.....5018E
- Keywords: Quantitative Biology - Biomolecules
- E-Print: 21 pages, 2 figures