Towards cosmological inference on unlabeled out-of-distribution HI observational data
Abstract
We present an approach that accounts for the covariate shift between two datasets of the same observable with different distributions, so as to improve the generalizability of a neural network model trained on in-distribution samples (IDs) when inferring cosmology at the field level on out-of-distribution samples (OODs) of {\it unknown labels}. We use HI maps from the two simulation suites in CAMELS, IllustrisTNG and SIMBA. We consider two techniques, an adversarial approach and optimal transport, to adapt a target network whose initial weights are those of a source network pre-trained on a labeled dataset. Results show that, after adaptation, the salient features extracted by the source and target encoders are well aligned in the embedding space, indicating that the target encoder has learned the representations of the target domain via adversarial training and optimal transport. Furthermore, in all scenarios considered in our analyses, the target encoder, which has no access to any labels ($\Omega_{\rm m}$) during the adaptation phase, retrieves the underlying $\Omega_{\rm m}$ from out-of-distribution maps with an $R^{2}$ score $\ge$ 0.9, comparable to the performance of the source encoder trained in a supervised learning setup. We further test the viability of the techniques when only a few out-of-distribution instances are available and find that the target encoder still recovers the matter density reasonably well. Our approach is critical for extracting information from upcoming large-scale surveys.
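The two quantities the abstract relies on, an optimal-transport measure of how well source and target embeddings align and the $R^{2}$ score used to judge the recovered $\Omega_{\rm m}$, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names `sinkhorn_plan` and `r2_score`, the entropic (Sinkhorn) regularization, the squared Euclidean cost, the uniform sample weights, and the values of `reg` and `n_iters` are all assumptions made for illustration.

```python
import numpy as np


def sinkhorn_plan(source_emb, target_emb, reg=0.1, n_iters=200):
    """Entropic-regularized optimal transport between two embedding sets.

    Hypothetical helper: assumes uniform weights on every sample and a
    squared Euclidean ground cost. Returns the transport plan and the
    transport cost under that plan.
    """
    n, m = len(source_emb), len(target_emb)
    # Pairwise squared Euclidean distances, shape (n, m).
    diff = source_emb[:, None, :] - target_emb[None, :, :]
    cost = np.sum(diff ** 2, axis=-1)
    # Scale the regularization by the largest cost for numerical stability.
    scale = max(cost.max(), 1e-12)
    kernel = np.exp(-cost / (reg * scale))
    a = np.full(n, 1.0 / n)  # uniform source weights (assumption)
    b = np.full(m, 1.0 / m)  # uniform target weights (assumption)
    u = np.ones(n)
    for _ in range(n_iters):
        # Alternate scaling so the plan matches both marginals.
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    plan = u[:, None] * kernel * v[None, :]
    return plan, float(np.sum(plan * cost))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

In a setup like the one described, a transport cost of this kind could serve as an alignment signal between source and target embeddings during adaptation, while `r2_score` would be evaluated only afterwards, on held-out target maps whose $\Omega_{\rm m}$ is known.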
- Publication: arXiv e-prints
- Pub Date: November 2024
- DOI: 10.48550/arXiv.2411.10515
- arXiv: arXiv:2411.10515
- Bibcode: 2024arXiv241110515A
- Keywords: Astrophysics - Instrumentation and Methods for Astrophysics; Astrophysics - Cosmology and Nongalactic Astrophysics
- E-Print: 10 pages, 5 figures, 2 tables