Automated design of collective variables using supervised machine learning
Abstract
Selection of appropriate collective variables (CVs) for enhanced sampling of molecular simulations remains an unsolved problem in computational modeling. Picking initial CVs is especially challenging in higher dimensions. Which atomic coordinates, or transforms thereof, from a list of thousands should one pick for enhanced sampling runs? How does a modeler even begin to pick starting coordinates for investigation? This remains true even for simple two-state systems, and the difficulty only increases for multi-state systems. In this work, we solve the "initial" CV problem using a data-driven approach inspired by the field of supervised machine learning (SML). In particular, we show how the decision functions in SML algorithms can be used as initial CVs (SMLcv) for accelerated sampling. Using solvated alanine dipeptide and the Chignolin mini-protein as our test cases, we illustrate how the distance to a support vector machine's decision hyperplane, the output probability estimates from logistic regression, the outputs from shallow or deep neural network classifiers, and other classifiers may be used to reversibly sample slow structural transitions. We also discuss other SML algorithms that might be useful for identifying CVs for accelerating molecular simulations.
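To make the "decision function as CV" idea concrete, here is a minimal sketch, not the authors' code. It assumes synthetic two-feature data standing in for per-frame features of two labeled metastable states. A logistic-regression classifier is fitted with plain batch gradient descent, and two of its decision functions are then evaluated as candidate CVs. The first is the signed distance to the decision hyperplane (the same functional form as the SVM-distance CV, here reusing the logistic-regression weights). The second is the predicted probability of state B. The helper names (`fit_logistic_regression`, `hyperplane_distance_cv`, `probability_cv`, `hyperplane_distance_grad`) and the toy data are hypothetical illustrations.

```python
import numpy as np

def fit_logistic_regression(X, y, lr=0.1, n_iter=2000):
    """Fit weights w and bias b by batch gradient descent on the mean cross-entropy loss."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(state B | x)
        w -= lr * (X.T @ (p - y) / n)           # gradient of mean loss w.r.t. w
        b -= lr * np.mean(p - y)                # gradient of mean loss w.r.t. b
    return w, b

def hyperplane_distance_cv(x, w, b):
    """Signed distance from x to the hyperplane w.x + b = 0 (negative on the state-A side)."""
    return (x @ w + b) / np.linalg.norm(w)

def hyperplane_distance_grad(w):
    """Gradient of the signed distance w.r.t. the features; constant for a linear model."""
    return w / np.linalg.norm(w)

def probability_cv(x, w, b):
    """Logistic-regression probability of state B, bounded in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

# Hypothetical training data: state A frames near (-1, -1), state B frames near (+1, +1).
rng = np.random.default_rng(0)
X_a = rng.normal(loc=-1.0, scale=0.3, size=(200, 2))
X_b = rng.normal(loc=1.0, scale=0.3, size=(200, 2))
X = np.vstack([X_a, X_b])
y = np.concatenate([np.zeros(200), np.ones(200)])

w, b = fit_logistic_regression(X, y)

# Evaluate both candidate CVs on a hypothetical new frame.
frame = np.array([0.2, -0.1])
print("distance CV:", hyperplane_distance_cv(frame, w, b))
print("probability CV:", probability_cv(frame, w, b))
```

A biasing method such as metadynamics also needs the CV's gradient. For a linear decision function, this gradient with respect to the features is the constant unit normal `w / ||w||`. Propagating it to atomic coordinates would additionally require the chain rule through the feature definitions, which this sketch does not show. The probability CV saturates near 0 and 1 deep inside each state, which is one reason one might prefer the unbounded hyperplane distance.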
- Publication: Journal of Chemical Physics
- Pub Date: September 2018
- DOI: 10.1063/1.5029972
- arXiv: arXiv:1802.10510
- Bibcode: 2018JChPh.149i4106S
- Keywords: Statistics - Machine Learning; Computer Science - Computational Engineering, Finance, and Science; Quantitative Biology - Biomolecules
- E-Print: 26 pages, 11 figures