Two-sample test based on Self-Organizing Maps
Abstract
Machine-learning classifiers can be leveraged as a two-sample statistical test. Suppose each sample is assigned a different label and that a classifier can obtain a better-than-chance result discriminating them. In this case, we can infer that both samples originate from different populations. However, many types of models, such as neural networks, behave as a black-box for the user: they can reject that both samples originate from the same population, but they do not offer insight into how both samples differ. Self-Organizing Maps are a dimensionality reduction initially devised as a data visualization tool that displays emergent properties, being also useful for classification tasks. Since they can be used as classifiers, they can be used also as a two-sample statistical test. But since their original purpose is visualization, they can also offer insights.
- Publication:
-
arXiv e-prints
- Pub Date:
- December 2022
- DOI:
- 10.48550/arXiv.2212.08960
- arXiv:
- arXiv:2212.08960
- Bibcode:
- 2022arXiv221208960A
- Keywords:
-
- Computer Science - Machine Learning;
- Computer Science - Neural and Evolutionary Computing;
- 62-08;
- G.3
- E-Print:
- 27 pages