TATTER: A hypothesis testing tool for multi-dimensional data
Abstract
The two-sample hypothesis test quantifies whether distributions p and q are different, given finite samples drawn from each. This problem appears in many applications in astronomy, ranging from data mining to data analysis and inference. For decades, the Kolmogorov-Smirnov test has been astronomers' first choice to answer this question, but it has a major drawback: its generalization to multi-dimensional data sets is not straightforward. To fill this gap, we present a nonparametric estimator for comparing multi-dimensional distributions, given samples drawn from them. This method employs a kernel function to construct an unbiased estimator of the Maximum Mean Discrepancy (MMD) distance between the two distributions that generated the observed data. We perform controlled numerical experiments in Gaussian, non-Gaussian, and multi-dimensional finite-sample settings and test the performance of the MMD estimator in each experiment. We then discuss some applications of this method in astronomical data analysis.
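As an illustration of the kind of estimator the abstract describes, the following is a minimal sketch of an unbiased estimate of the squared MMD between two multi-dimensional samples. It is not the TATTER package's API: the function names, the Gaussian kernel, the bandwidth `sigma`, and the permutation procedure used to turn the statistic into a p-value are all assumptions made for this example.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel matrix between rows of a and rows of b.

    The kernel choice and the bandwidth sigma are assumptions of this sketch.
    """
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


def mmd2_unbiased(x, y, sigma=1.0):
    """Unbiased estimate of the squared MMD between samples x (m, d) and y (n, d)."""
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, sigma)
    kyy = gaussian_kernel(y, y, sigma)
    kxy = gaussian_kernel(x, y, sigma)
    # Within-sample terms exclude the diagonal (i == j) to remove the bias.
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * kxy.mean()


def permutation_pvalue(x, y, sigma=1.0, n_perm=200, seed=0):
    """Hypothetical helper: p-value from shuffling the pooled sample labels."""
    rng = np.random.default_rng(seed)
    observed = mmd2_unbiased(x, y, sigma)
    pooled = np.vstack([x, y])
    m = len(x)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        stat = mmd2_unbiased(pooled[idx[:m]], pooled[idx[m:]], sigma)
        count += stat >= observed
    # Add-one correction keeps the p-value strictly positive.
    return observed, (count + 1) / (n_perm + 1)
```

Calling `permutation_pvalue(x, y)` on two arrays of shape `(n_samples, n_dimensions)` returns the statistic and an approximate p-value; a small p-value suggests the two samples were drawn from different distributions. Because the estimator is unbiased, the statistic can be slightly negative when the two samples come from the same distribution.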
- Publication: Astronomy and Computing
- Pub Date: January 2021
- DOI:
- Bibcode: 2021A&C....3400445F
- Keywords: Methods: data analysis; methods: statistical