Estimating the galaxy two-point correlation function using a split random catalog
Abstract
The two-point correlation function of the galaxy distribution is a key cosmological observable that allows us to constrain the dynamical and geometrical state of our Universe. To measure the correlation function we need to know both the galaxy positions and the expected galaxy density field. The expected field is commonly specified using a Monte Carlo sampling of the volume covered by the survey and, to minimize additional sampling errors, this random catalog has to be much larger than the data catalog. Correlation function estimators compare data-data pair counts to data-random and random-random pair counts, where random-random pairs usually dominate the computational cost. Future redshift surveys will deliver spectroscopic catalogs of tens of millions of galaxies. Given the large number of random objects required to guarantee sub-percent accuracy, it is of paramount importance to improve the efficiency of the algorithm without degrading its precision. We show both analytically and numerically that splitting the random catalog into a number of subcatalogs of the same size as the data catalog when calculating random-random pairs and excluding pairs across different subcatalogs provides the optimal error at fixed computational cost. For a random catalog fifty times larger than the data catalog, this reduces the computation time by a factor of more than ten without affecting estimator variance or bias.
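The split scheme described in the abstract can be sketched in a few lines: the random catalog is divided into subcatalogs, random-random pairs are counted only within each subcatalog, and the count is rescaled to the full catalog. The function names and the brute-force pair counter below are illustrative assumptions, not the authors' implementation, which would use an optimized pair-counting code.

```python
import numpy as np

def pair_counts(a, b=None, r_max=0.1):
    # Brute-force pair counter for illustration only; real analyses use
    # tree- or grid-based codes. With one catalog, each unordered pair
    # is counted once; with two catalogs, all cross pairs are counted.
    if b is None:
        d = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
        return int(np.count_nonzero(np.triu(d < r_max, k=1)))
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return int(np.count_nonzero(d < r_max))

def split_rr(randoms, n_split, r_max=0.1):
    # RR estimate from a random catalog split into n_split subcatalogs:
    # count pairs only inside each subcatalog, excluding cross pairs,
    # then rescale to the pair count of the full catalog, Nr*(Nr-1)/2.
    subs = np.array_split(randoms, n_split)
    total = sum(pair_counts(s, r_max=r_max) for s in subs)
    counted = sum(len(s) * (len(s) - 1) // 2 for s in subs)
    n_r = len(randoms)
    return total * (n_r * (n_r - 1) / 2) / counted
```

The cost saving comes from the quadratic scaling of pair counting: with the catalog split into M parts, the within-subcatalog work is roughly 1/M of the full RR computation. The rescaled counts can then be fed into a standard estimator such as Landy-Szalay, with each term normalized by its total number of pairs.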
 Publication:

Astronomy and Astrophysics
 Pub Date:
 November 2019
 DOI:
 10.1051/0004-6361/201935828
 arXiv:
 arXiv:1905.01133
 Bibcode:
 2019A&A...631A..73K
 Keywords:

 large-scale structure of Universe;
 cosmology: observations;
 methods: statistical;
 methods: data analysis;
 Astrophysics - Cosmology and Nongalactic Astrophysics
 E-Print:
 11 pages, 6 figures