Predicting the neutral hydrogen content of galaxies from optical data using machine learning
Abstract
We develop a machine learning-based framework to predict the H I content of galaxies from optical photometry and environmental parameters. We train the algorithm on z = 0-2 outputs from the MUFASA cosmological hydrodynamic simulation, which includes star formation, feedback, and a heuristic model to quench massive galaxies, and which yields a reasonable match to a range of survey data including H I. We employ a variety of machine learning methods (regressors) and quantify their performance using the slope of the predicted versus true relation, its root mean square error (RMSE), and its Pearson correlation coefficient (r). Training on only Sloan Digital Sky Survey photometry, all regressors give r > 0.8 and RMSE ∼ 0.3 at z = 0, led by random forests with r = 0.91 and a deep neural network (DNN) with comparable accuracy (r = 0.9). Adding near-IR photometry improves all regressors. All regressors perform worse with increasing redshift, particularly at z ≳ 1. Slope values are generally sub-linear, so that we overpredict H I in H I-poor galaxies and underpredict it in H I-rich galaxies, because the regressors do not fully capture the scatter in the data. We test our framework on REsolved Spectroscopy Of a Local VolumE (RESOLVE) and Arecibo Legacy Fast ALFA (ALFALFA) survey data. Training on a subset of the observations, we find that our machine learning method can reasonably predict H I richnesses in the remaining data (RMSE ∼ 0.28 for RESOLVE and ∼ 0.25 for ALFALFA). Training on mock data from MUFASA to predict observed data performs worse (RMSE ∼ 0.45 for RESOLVE and ∼ 0.31 for ALFALFA), with the DNN substantially outperforming the other regressors. Our method will be useful for making galaxy-by-galaxy survey predictions and incompleteness corrections for upcoming H I 21 cm surveys on Square Kilometre Array precursors such as MeerKAT, over regions where photometry is already available.
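The three performance measures named in the abstract (slope of the predicted versus true relation, RMSE, and Pearson r) can be illustrated with a minimal sketch. This is not the authors' code. The function name `evaluate_regressor`, the use of an ordinary least-squares fit for the slope, and the synthetic data are all assumptions made for illustration. The synthetic predictions are deliberately compressed toward the mean to mimic a regressor that does not capture the full scatter, which produces a sub-linear slope.

```python
import numpy as np


def evaluate_regressor(y_true, y_pred):
    """Slope, RMSE, and Pearson r of predicted versus true values.

    Hypothetical helper: the slope is taken from an ordinary
    least-squares line fit of y_pred against y_true, which is an
    assumption, not a definition given in the abstract.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Degree-1 polynomial fit returns (slope, intercept).
    slope, _intercept = np.polyfit(y_true, y_pred, 1)
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
    r = np.corrcoef(y_true, y_pred)[0, 1]
    return {"slope": slope, "rmse": rmse, "r": r}


# Synthetic stand-in for "true" log H I richness (illustrative only).
rng = np.random.default_rng(0)
y_true = rng.normal(0.0, 1.0, size=2000)

# Predictions pulled toward the mean (factor 0.8) plus noise: low values
# are overpredicted and high values underpredicted on average.
mean = y_true.mean()
y_pred = mean + 0.8 * (y_true - mean) + rng.normal(0.0, 0.2, size=2000)

metrics = evaluate_regressor(y_true, y_pred)
print(metrics)
```

In this toy setup the fitted slope comes out below unity even though the correlation is high, which is the same qualitative behaviour the abstract describes for H I-poor and H I-rich galaxies.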
- Publication:
- Monthly Notices of the Royal Astronomical Society
- Pub Date:
- October 2018
- DOI:
- 10.1093/mnras/sty1777
- arXiv:
- arXiv:1803.08334
- Bibcode:
- 2018MNRAS.479.4509R
- Keywords:
- methods: numerical;
- galaxies: evolution;
- galaxies: statistics;
- Astrophysics - Astrophysics of Galaxies
- E-Print:
- 16 pages, 11 figures, 1 table