The RNA Newton Polytope and Learnability of Energy Parameters
Abstract
Despite nearly four decades of research on RNA secondary structure and RNA-RNA interaction prediction, the accuracy of state-of-the-art algorithms is still far from satisfactory. Researchers have proposed increasingly complex energy models and improved parameter estimation methods, hoping to give their methods enough power to solve the problem. The result, disappointingly, has been only modest improvement that falls short of expectations. Even recent massively featured machine learning approaches have not been able to break the barrier. In this paper, we introduce the notion of learnability of the parameters of an energy model as a measure of its inherent capability. We say that the parameters of an energy model are learnable iff there exists at least one set of such parameters that renders every RNA structure known to date the minimum free energy structure. We derive a necessary condition for learnability and give a dynamic programming algorithm to assess it. Our algorithm computes the convex hull of the feature vectors of all feasible structures in the ensemble of a given input sequence. Interestingly, that convex hull coincides with the Newton polytope of the partition function as a polynomial in the energy parameters. We demonstrate the application of our theory to a simple energy model consisting of a weighted count of A-U and C-G base pairs. Our results show that this simple energy model satisfies the necessary condition for fewer than one third of the input unpseudoknotted sequence-structure pairs chosen from the RNA STRAND v2.0 database. For another third, the necessary condition is barely violated, which suggests that augmenting this simple energy model with more features, such as the Turner loops, may solve the problem. The necessary condition is severely violated for 8% of the pairs, which provides a small set of hard cases that require further investigation.
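The convex hull computation described in the abstract can be illustrated with a small Python sketch for the two-feature model (counts of A-U and C-G base pairs). This is not the authors' algorithm: the Nussinov-style recurrence, the function names, the `min_loop` parameter, and the reading of the necessary condition as "the feature vector of the known structure lies on the boundary of the hull" are all assumptions made for illustration. The boundary test rests on the fact that a linear function with nonzero weights attains its minimum over a convex polygon only at boundary points.

```python
from itertools import product

# Feature vector (A-U count, C-G count) contributed by one base pair.
# Only A-U and C-G pairs are scored, as in the simple model of the abstract.
PAIR_FEATURE = {("A", "U"): (1, 0), ("U", "A"): (1, 0),
                ("C", "G"): (0, 1), ("G", "C"): (0, 1)}

def cross(o, a, b):
    """Z-component of (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def minkowski_sum(a, b):
    """Hull of all pairwise sums of two vertex lists (brute force)."""
    return convex_hull([(p[0] + q[0], p[1] + q[1]) for p, q in product(a, b)])

def newton_polytope(seq, min_loop=3):
    """Hull of feature vectors of all pseudoknot-free structures of seq.

    hull[i][j] covers the half-open interval seq[i:j]. min_loop is the
    assumed minimum number of unpaired bases enclosed by a pair.
    """
    n = len(seq)
    hull = [[[(0, 0)] for _ in range(n + 1)] for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            points = list(hull[i + 1][j])          # case: base i is unpaired
            for k in range(i + min_loop + 1, j):   # case: base i pairs with k
                f = PAIR_FEATURE.get((seq[i], seq[k]))
                if f is None:
                    continue
                inner = minkowski_sum(hull[i + 1][k], hull[k + 1][j])
                points.extend((f[0] + x, f[1] + y) for x, y in inner)
            hull[i][j] = convex_hull(points)
    return hull[0][n]

def feature_vector(seq, pairs):
    """(A-U count, C-G count) of a structure given as 0-based index pairs."""
    au = sum(1 for i, j in pairs if {seq[i], seq[j]} == {"A", "U"})
    cg = sum(1 for i, j in pairs if {seq[i], seq[j]} == {"C", "G"})
    return (au, cg)

def on_boundary(point, hull):
    """True if point lies on an edge (or is the single vertex) of the hull."""
    if len(hull) == 1:
        return point == hull[0]
    for a, b in zip(hull, hull[1:] + hull[:1]):
        if (cross(a, b, point) == 0
                and min(a[0], b[0]) <= point[0] <= max(a[0], b[0])
                and min(a[1], b[1]) <= point[1] <= max(a[1], b[1])):
            return True
    return False
```

For a sequence-structure pair, one would call `on_boundary(feature_vector(seq, pairs), newton_polytope(seq))`; a result of `False` would mean that no nonzero weighting of the two counts can make that structure the minimum free energy structure under this toy model. The brute-force Minkowski sum keeps the sketch short and is only practical for short sequences.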
 Publication:

arXiv e-prints
 Pub Date:
 January 2013
 arXiv:
 arXiv:1301.1608
 Bibcode:
 2013arXiv1301.1608F
 Keywords:

 Quantitative Biology - Biomolecules;
 Computer Science - Computational Engineering, Finance, and Science;
 Computer Science - Machine Learning