Protein property prediction with uncertainties
Abstract
Reliable prediction of variant effects in proteins has seen considerable progress in recent years. The increasing availability of data in this regime has improved both prediction performance and our ability to track progress in the field, measured in terms of prediction accuracy averaged over many datasets. For practical use in protein engineering, it is important that predictions also come with reliable uncertainty estimates, but metrics of uncertainty quality are rarely reported. Here we present Kermut, a Gaussian process regression model that obtains state-of-the-art performance for protein property prediction while also offering uncertainty estimates through its posterior. We then assess the quality of these uncertainty estimates. Our results show that the model provides meaningful overall calibration, but that accurate instance-specific uncertainty quantification remains challenging. We hope that this will encourage future work in this promising direction.
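As a rough illustration of what "uncertainty through the posterior" and "overall calibration" mean in the abstract above, the following is a minimal sketch of generic Gaussian process regression in NumPy. It is not the Kermut model: the squared-exponential kernel, the toy one-dimensional data, the hyperparameter values, and the helper names (`rbf_kernel`, `gp_posterior`, `interval_coverage`) are all assumptions made for this example.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel on 1-D inputs; a stand-in, NOT the Kermut kernel.
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, noise_var=0.01):
    # Standard GP regression equations, solved via a Cholesky factorisation.
    k_tt = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_ts.T @ alpha
    v = np.linalg.solve(chol, k_ts)
    prior_var = np.full(len(x_test), 1.0)  # kernel value at zero distance
    # Predictive variance of a noisy observation at each test point.
    var = np.maximum(prior_var - np.sum(v**2, axis=0), 0.0) + noise_var
    return mean, var

def interval_coverage(y_true, mean, var, z=1.96):
    # Fraction of held-out targets inside the nominal 95% predictive interval.
    inside = np.abs(y_true - mean) <= z * np.sqrt(var)
    return float(np.mean(inside))

# Toy data (hypothetical): a noisy sine curve, not protein measurements.
rng = np.random.default_rng(0)
x_train = rng.uniform(-3, 3, size=40)
y_train = np.sin(x_train) + rng.normal(0, 0.1, size=40)
x_test = rng.uniform(-3, 3, size=200)
y_test = np.sin(x_test) + rng.normal(0, 0.1, size=200)

mean, var = gp_posterior(x_train, y_train, x_test)
coverage = interval_coverage(y_test, mean, var)
print(f"empirical coverage of nominal 95% intervals: {coverage:.2f}")
```

A coverage close to the nominal level indicates reasonable overall calibration, but it says nothing about whether the variance of any single prediction is accurate, which is the instance-specific difficulty the abstract points to.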
- Publication: arXiv e-prints
- Pub Date: April 2024
- DOI: 10.48550/arXiv.2407.00002
- arXiv: arXiv:2407.00002
- Bibcode: 2024arXiv240700002M
- Keywords: Quantitative Biology - Biomolecules; Computer Science - Machine Learning
- E-Print: 10 pages (33 in total with appendix), 3 figures (19 figures in total with appendix)