Surrogate- and invariance-boosted contrastive learning for data-scarce applications in science
Abstract
Deep learning techniques have been increasingly applied to the natural sciences, e.g., for property prediction and optimization or material discovery. A fundamental ingredient of such approaches is the vast quantity of labeled data needed to train the model. This poses severe challenges in data-scarce settings where obtaining labels requires substantial computational or labor resources. Noting that problems in the natural sciences often benefit from easily obtainable auxiliary information sources, we introduce surrogate- and invariance-boosted contrastive learning (SIB-CL), a deep learning framework that incorporates three such inexpensive sources to overcome data scarcity. Specifically, these are: abundant unlabeled data, prior knowledge of symmetries or invariances, and surrogate data obtained at near-zero cost. We demonstrate SIB-CL's effectiveness and generality on various scientific problems, e.g., predicting the density-of-states of 2D photonic crystals and solving the 3D time-independent Schrödinger equation. SIB-CL consistently yields orders-of-magnitude reductions in the number of labels needed to achieve the same network accuracy.
- Publication: Nature Communications
- Pub Date: July 2022
- DOI: 10.1038/s41467-022-31915-y
- arXiv: arXiv:2110.08406
- Bibcode: 2022NatCo..13.4223L
- Keywords: Computer Science - Machine Learning; Condensed Matter - Materials Science; Physics - Applied Physics; Physics - Optics
- E-Print: 21 pages, 10 figures