Inferring the shape of global epistasis
Abstract
How does an organism's genetic sequence govern its measurable characteristics? New technologies provide libraries of randomized sequences to study this relationship in unprecedented detail for proteins and other molecules. Deriving insight from these data is difficult, though, because the space of possible sequences is enormous, so even the largest experiments sample a tiny minority of sequences. Moreover, the effects of mutations may combine in unexpected ways. We present a statistical framework to analyze such mutagenesis data. The key assumption is that mutations contribute in a simple way to some unobserved trait, which is related to the observed trait by a nonlinear mapping. Analyzing three proteins, we show that this model is easily interpretable and yet fits the data remarkably well.
- Publication:
-
Proceedings of the National Academy of Science
- Pub Date:
- August 2018
- DOI:
- Bibcode:
- 2018PNAS..115E7550O