A sparse regression approach to modelling the relation between galaxy stellar masses and their host haloes
Sparse regression algorithms have been proposed as the appropriate framework to model the governing equations of a system from data, without needing prior knowledge of the underlying physics. In this work, we use sparse regression to build an accurate and explainable model of the stellar mass of central galaxies given properties of their host dark matter (DM) halo. Our data set comprises 9521 central galaxies from the EAGLE hydrodynamic simulation. By matching the host haloes to a DM-only simulation, we collect the halo mass and specific angular momentum at present time and for their main progenitors in 10 redshift bins from z = 0 to z = 4. The principal component of our governing equation is a third-order polynomial of the host halo mass, which models the stellar-mass-halo-mass relation. The scatter about this relation is driven by the halo mass evolution and is captured by second- and third-order correlations of the halo mass evolution with the present halo mass. An advantage of sparse regression approaches is that unnecessary terms are removed. Although we include information on halo specific angular momentum, these parameters are discarded by our methodology. This suggests that halo angular momentum has little connection to galaxy formation efficiency. Our model has a root mean square error (RMSE) of 0.167log10(M*/M⊙), and accurately reproduces both the stellar mass function and central galaxy correlation function of EAGLE. The methodology appears to be an encouraging approach for populating the haloes of DM-only simulations with galaxies, and we discuss the next steps that are required.