Combining predictors of natively unfolded proteins to detect a twilight zone between order and disorder in generic datasets
Abstract
Natively unfolded proteins lack a well-defined three-dimensional structure but have important biological functions, suggesting a reassessment of the structure-function paradigm. Many proteins have amino acid compositions compatible with both the folded and the unfolded state, and belong to a twilight zone between order and disorder. This makes a dichotomic classification of protein sequences into folded and natively unfolded ones difficult. In this methodological paper, dichotomic folding indexes are considered: hydrophobicity-charge, mean packing, mean pairwise energy, Poodle-W and a new global index, here called gVSL2, based on the local disorder predictor VSL2. The performance of these indexes is evaluated on different datasets. Poodle-W, gVSL2 and mean pairwise energy perform well and stably on all the datasets considered and are combined into a strictly unanimous combination score SSU, which leaves a protein unclassified when the combined indexes do not all agree. The unclassified proteins: i) belong to an overlap region in the vector space of amino acid compositions occupied by both folded and unfolded proteins; ii) are composed of approximately the same number of order-promoting and disorder-promoting amino acids; iii) have a mean flexibility intermediate between that of folded and that of unfolded proteins. These proteins plausibly have physical properties intermediate between those of folded and those of natively unfolded proteins, and their structural properties and evolutionary history are worth investigating.
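The strictly unanimous rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formula for SSU, so the encoding of each index's call as +1 (folded) or -1 (natively unfolded), the function name `strictly_unanimous_call`, and the use of `None` for unclassified proteins are all assumptions made here for illustration.

```python
from typing import Mapping, Optional

# Assumed encoding of a dichotomic call (not taken from the paper).
FOLDED = 1
UNFOLDED = -1


def strictly_unanimous_call(calls: Mapping[str, int]) -> Optional[int]:
    """Combine per-index dichotomic calls under a strict unanimity rule.

    Returns FOLDED or UNFOLDED when every index makes the same call,
    and None (unclassified) as soon as at least one index disagrees.
    """
    if not calls:
        raise ValueError("at least one index call is required")
    distinct = set(calls.values())
    if not distinct <= {FOLDED, UNFOLDED}:
        raise ValueError("each call must be FOLDED (+1) or UNFOLDED (-1)")
    # A single distinct value means all indexes agree.
    return distinct.pop() if len(distinct) == 1 else None


# Hypothetical calls for one protein from the three combined indexes.
example = {
    "Poodle-W": FOLDED,
    "gVSL2": FOLDED,
    "mean pairwise energy": UNFOLDED,
}
print(strictly_unanimous_call(example))  # prints None: the protein stays unclassified
```

Under this rule a single dissenting index is enough to leave a protein unclassified, which is what confines the unclassified set to sequences on which the predictors genuinely disagree.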
- Publication: arXiv e-prints
- Pub Date: October 2009
- DOI: 10.48550/arXiv.0910.2903
- arXiv: arXiv:0910.2903
- Bibcode: 2009arXiv0910.2903D
- Keywords: Quantitative Biology - Biomolecules; Quantitative Biology - Genomics
- E-Print: The title has been changed to make the content of the paper clearer, and some previously misprinted formulas have been fixed. A slightly different version of this manuscript has been submitted to BMC Bioinformatics.