The Multimodal Universe: 100 TB of Machine Learning Ready Astronomical Data
Abstract
We present the Multimodal Universe, a new framework that collates over 100 TB of multimodal astronomical data in its first release, spanning images, spectra, time series, tabular, and hyper-spectral data. This unified collection enables a wide variety of machine learning (ML) applications and research across astronomical domains. The dataset brings together observations from multiple surveys, facilities, and wavelength regimes, and provides standardized access to these diverse data types. Through this uniform access, the Multimodal Universe aims to accelerate the development of ML methods for observational astronomy that can work across the large differences between astronomical datasets. The framework is actively supported and is designed to be extended whilst enforcing minimal, self-consistent conventions, making contributing data as simple and practical as possible.
- Publication: Research Notes of the American Astronomical Society
- Pub Date: December 2024
- DOI:
- Bibcode: 2024RNAAS...8..301A
- Keywords: Astronomy data acquisition (1860); Astronomy databases (83)