On a Generalization of the Average Distance Classifier
Abstract
In high dimension, low sample size (HDLSS) settings, the simple average distance classifier based on the Euclidean distance performs poorly if differences between the locations get masked by scale differences. To rectify this issue, modifications to the average distance classifier were proposed by Chan and Hall (2009). However, these existing classifiers cannot discriminate when the populations differ in aspects other than location and scale. In this article, we propose some simple transformations of the average distance classifier to tackle this issue. The resulting classifiers perform quite well even when the underlying populations have the same location and scale. The high-dimensional behaviour of the proposed classifiers is studied theoretically. Numerical experiments with a variety of simulated as well as real data sets demonstrate the usefulness of the proposed methodology.
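The plain average distance classifier assigns a test point to the class whose training points are, on average, closest to it in Euclidean distance. The sketch below (standard-library Python; every function name is hypothetical) implements that rule alongside one scale-adjusted variant, which subtracts half the mean within-class pairwise distance from each class score. That adjustment is an energy-distance-style correction assumed here for illustration. It is not claimed to be Chan and Hall's (2009) exact rule, nor the transformation proposed in this article. The toy data (two zero-mean Gaussian classes in d = 1000 with standard deviations 1 and 2) are synthetic and were chosen only to show scale masking.

```python
import math
import random
from statistics import fmean


def average_distance(z, sample):
    """Mean Euclidean distance from point z to every training point in sample."""
    return fmean(math.dist(z, x) for x in sample)


def within_class_spread(sample):
    """Mean Euclidean distance over distinct pairs in sample (needs >= 2 points)."""
    n = len(sample)
    return fmean(math.dist(sample[i], sample[j])
                 for i in range(n) for j in range(i + 1, n))


def classify(z, samples, scale_adjusted=False):
    """Return the index of the class with the smallest (optionally adjusted) score.

    Assumed adjustment (illustrative only): score_k = mean distance from z to
    class k minus half the mean within-class distance of class k.
    """
    scores = []
    for sample in samples:
        score = average_distance(z, sample)
        if scale_adjusted:
            score -= 0.5 * within_class_spread(sample)
        scores.append(score)
    return min(range(len(scores)), key=scores.__getitem__)


def gaussian_sample(n, d, sigma, rng):
    """n points in R^d with i.i.d. N(0, sigma^2) coordinates (synthetic data)."""
    return [[rng.gauss(0.0, sigma) for _ in range(d)] for _ in range(n)]


rng = random.Random(0)
d, n = 1000, 10
sigmas = (1.0, 2.0)  # same location (zero mean), different scales
train = [gaussian_sample(n, d, s, rng) for s in sigmas]
test = [(z, label) for label, s in enumerate(sigmas)
        for z in gaussian_sample(20, d, s, rng)]

results = {}
for adjusted in (False, True):
    results[adjusted] = sum(classify(z, train, adjusted) == y for z, y in test)
    print(f"scale_adjusted={adjusted}: {results[adjusted]}/{len(test)} correct")
```

With equal locations, the plain rule tends to send every test point to the lower-variance class. For these isotropic Gaussians, the cross-class distance is roughly sqrt(σ1² + σ2²)·√d, which is smaller than the within-class distance σ2·√(2d) of the higher-variance class. The assumed adjustment removes that bias. Neither rule helps when the classes also share location and scale, which is the gap the article's proposed transformations address.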
- Publication: arXiv e-prints
- Pub Date: January 2020
- DOI: 10.48550/arXiv.2001.02430
- arXiv: arXiv:2001.02430
- Bibcode: 2020arXiv200102430R
- Keywords: Statistics - Methodology; Statistics - Machine Learning
- E-Print: Short version