Statistical challenges in the analysis of sequence and structure data for the COVID-19 spike protein
Abstract
As the major target of many vaccines and neutralizing antibodies against SARS-CoV-2, the spike (S) protein is observed to mutate over time. In this paper, we present statistical approaches to tackle some challenges associated with the analysis of S-protein data. We build a Bayesian hierarchical model to study the temporal and spatial evolution of S-protein sequences, after grouping the sequences into representative clusters. We then apply sampling methods to investigate possible changes to the S-protein's 3-D structure as a result of commonly observed mutations. While the increasing spread of D614G variants has been noted in other research, our results also show that the co-occurring mutations of D614G together with S477N or A222V may spread even more rapidly, as quantified by our model estimates.
- Publication: arXiv e-prints
- Pub Date: January 2021
- DOI: 10.48550/arXiv.2101.02304
- arXiv: arXiv:2101.02304
- Bibcode: 2021arXiv210102304H
- Keywords: Statistics - Applications; Quantitative Biology - Biomolecules
- E-Print: 21 pages, 5 figures