Energy-Based Models Capture Pairwise and Higher-Order Interactions in Protein Sequence Data
Abstract
Understanding protein structure, evolution and function requires reliable inference of interacting units in folded proteins. Here we present a unifying approach for inferring two of the most important structural units of proteins: pairwise contacts and higher-order strongly correlated units, known as sectors. Our method is a hybrid energy-based model that combines a pairwise-energy term, as used in state-of-the-art Direct Coupling Analysis, with a Restricted Boltzmann Machine (RBM) term meant to capture higher-order interactions. We show that, when trained on data from a biologically informed ground truth model, our algorithms can learn both the pairwise and higher-order structure and are robust to varying levels of undersampling and to varying strength of interactions in the ground truth distribution. We carry out the analysis for 2-spin and 10-spin systems with Minimum Probability Flow and Ratio Matching algorithms, respectively. We comment on why the RBM is successful at modeling the higher-order interactions and why certain choices of hyperparameters (number of hidden units in the RBM, regularization strength) support the model's feature detection capabilities.
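As an illustration of what a hybrid pairwise-plus-RBM energy can look like, the sketch below evaluates one configuration of binary (+1/-1) spins. The abstract does not give the functional form, so everything here is an assumption: the Ising-style pairwise term, hidden units taking values in {-1, +1} and summed out analytically, and all names (`hybrid_energy`, `h`, `J`, `W`, `b`). It is not the authors' implementation.

```python
import numpy as np

def hybrid_energy(s, h, J, W, b):
    """Energy of one spin configuration under an assumed pairwise + RBM model.

    s: (N,) spins with entries +1 or -1
    h: (N,) local fields
    J: (N, N) symmetric couplings with zero diagonal
    W: (M, N) RBM weights, b: (M,) hidden-unit biases
    All parameter names are hypothetical, chosen for illustration.
    """
    s = np.asarray(s, dtype=float)
    # Pairwise term; the factor 0.5 corrects for counting each pair (i, j) twice.
    pairwise = -h @ s - 0.5 * s @ J @ s
    # RBM term with hidden units in {-1, +1} summed out:
    # each hidden unit contributes -log(2 cosh(x)) = -log(e^x + e^-x).
    hidden_input = b + W @ s
    rbm = -np.sum(np.logaddexp(hidden_input, -hidden_input))
    return pairwise + rbm

# Example call with 4 spins and 2 hidden units (arbitrary illustrative values).
rng = np.random.default_rng(0)
N, M = 4, 2
J_example = rng.normal(size=(N, N))
J_example = 0.5 * (J_example + J_example.T)   # symmetrize
np.fill_diagonal(J_example, 0.0)              # no self-couplings
energy = hybrid_energy(
    s=np.array([1, -1, 1, 1]),
    h=rng.normal(size=N),
    J=J_example,
    W=rng.normal(size=(M, N)),
    b=np.zeros(M),
)
```

In this form the pairwise couplings `J` play the role of contact-like interactions, while each hidden unit couples to many sites at once through a row of `W`, which is one way a model of this kind could represent collective, sector-like correlations.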
Supported in part by the National Science Foundation, through the Center for the Physics of Biological Function (PHY-1734030), by the National Institutes of Health BRAIN initiative (R01EB026943-01), by the Simons Foundation, and by the Sloan Foundation.
- Publication: APS March Meeting Abstracts
- Pub Date: March 2022
- Bibcode: 2022APS..MARA03010F