Self-supervised Representation Learning for Astronomical Images
Abstract
Sky surveys are the largest data generators in astronomy, making automated tools for extracting meaningful scientific information an absolute necessity. We show that, without the need for labels, self-supervised learning recovers representations of sky survey images that are semantically useful for a variety of scientific tasks. These representations can be used directly as features, or fine-tuned, to outperform supervised methods trained only on labeled data. We apply a contrastive learning framework to multiband galaxy photometry from the Sloan Digital Sky Survey (SDSS) to learn image representations. We then use them for galaxy morphology classification and fine-tune them for photometric redshift estimation, using labels from the Galaxy Zoo 2 data set and SDSS spectroscopy. In both downstream tasks, using the same learned representations, we outperform supervised state-of-the-art results, and we show that our approach can match the accuracy of supervised models while using two to four times fewer labels for training. The code, trained models, and data can be found at https://portal.nersc.gov/project/dasrepo/self-supervised-learning-sdss.
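The abstract names a contrastive learning framework but not its loss. As an illustration only, the sketch below implements the generic InfoNCE objective that contrastive frameworks commonly build on: two augmented views of the same galaxy image form a positive pair, and the other images in the batch act as negatives. The function name `info_nce_loss`, the `temperature` default, and the use of random vectors in place of encoder outputs are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def info_nce_loss(z_a, z_b, temperature=0.1):
    """Generic InfoNCE contrastive loss (illustrative, one direction).

    z_a, z_b: arrays of shape (N, D). Row i of z_a and row i of z_b are
    assumed to be embeddings of two augmented views of the same image
    (the positive pair); every other row of z_b is a negative for row i.
    """
    # L2-normalise so that dot products are cosine similarities
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    # (N, N) similarity matrix, sharpened by the temperature
    logits = z_a @ z_b.T / temperature
    # numerically stable log-softmax over each row
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives sit on the diagonal; minimise their negative log-probability
    return -np.mean(np.diag(log_prob))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # hypothetical stand-ins for encoder outputs of 8 images, 16-dim each
    z = rng.normal(size=(8, 16))
    views_agree = z + 0.01 * rng.normal(size=z.shape)
    views_random = rng.normal(size=(8, 16))
    print(info_nce_loss(z, views_agree))   # small: positives dominate
    print(info_nce_loss(z, views_random))  # larger: roughly log(N)
```

The loss is low when the two views of each image map to nearby embeddings and high when they do not, which is the pressure that pushes an encoder toward augmentation-invariant, semantically useful features without any labels.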
- Publication:
- The Astrophysical Journal Letters
- Pub Date:
- April 2021
- DOI:
- 10.3847/2041-8213/abf2c7
- arXiv:
- arXiv:2012.13083
- Bibcode:
- 2021ApJ...911L..33H
- Keywords:
- Sky surveys (1464);
- Observational cosmology (1146);
- Astronomical methods (1043);
- Observational astronomy (1145);
- Computational methods (1965);
- Astrophysics - Instrumentation and Methods for Astrophysics;
- Computer Science - Artificial Intelligence