A new classification framework for high-dimensional data
Abstract
Classification, a fundamental problem in many fields, faces significant challenges when the number of features is large, a scenario commonly encountered in modern applications such as identifying tumor subtypes from genomic data or categorizing customer attitudes from online reviews. We propose a novel framework that uses the ranks of pairwise distances among observations to identify consistent patterns in moderate- to high-dimensional data that previous methods have overlooked. The proposed method shows superior performance across a variety of scenarios, from high-dimensional data to network data. We further examine a typical setting to study the key quantities that play essential roles in the framework. These quantities reveal its ability to distinguish differences in the first and/or second moments, as well as differences in higher moments.
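To make "ranks of pairwise distances" concrete, here is a minimal sketch in plain Python. It is our own toy illustration, not the classifier proposed in the paper. The helper names (`pairwise_distances`, `distance_ranks`, `mean_rank_to_class`) and the simulated data (two Gaussian classes with equal means but variances 1 and 4 in d = 500 dimensions) are hypothetical assumptions. For each observation it ranks all other observations by Euclidean distance, then averages the ranks it assigns to each class. In this setting, points from *both* classes rank the low-variance class as closer, so a nearest-neighbor rule mislabels the high-variance class. The rank pattern, however, remains consistent and differs by class.

```python
import math
import random


def pairwise_distances(points):
    """Symmetric Euclidean distance matrix for a list of equal-length vectors."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
            dist[i][j] = dist[j][i] = d
    return dist


def distance_ranks(dist):
    """ranks[i][j] = rank of observation j among all others by distance to i
    (1 = closest); the diagonal is left at 0."""
    n = len(dist)
    ranks = [[0] * n for _ in range(n)]
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        for r, j in enumerate(order, start=1):
            ranks[i][j] = r
    return ranks


def mean_rank_to_class(ranks, labels, i, cls):
    """Average rank that observation i assigns to the members of class cls."""
    vals = [ranks[i][j] for j in range(len(labels)) if j != i and labels[j] == cls]
    return sum(vals) / len(vals)


# Hypothetical toy data: equal means, different variances (a second-moment difference).
rng = random.Random(0)
d, n = 500, 20
class0 = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]  # N(0, I_d)
class1 = [[rng.gauss(0.0, 2.0) for _ in range(d)] for _ in range(n)]  # N(0, 4 I_d)
points = class0 + class1
labels = [0] * n + [1] * n

ranks = distance_ranks(pairwise_distances(points))

for i in (0, n):  # one observation from each class
    print(f"obs {i} (class {labels[i]}): "
          f"mean rank of class 0 = {mean_rank_to_class(ranks, labels, i, 0):.1f}, "
          f"mean rank of class 1 = {mean_rank_to_class(ranks, labels, i, 1):.1f}")
```

The toy reflects distance concentration in high dimensions. Squared within-class distances are roughly 2dσ₀² and 2dσ₁², and squared between-class distances are roughly d(σ₀² + σ₁²). Every observation therefore orders the two classes the same way, even though the raw nearest neighbor of a class-1 point is usually a class-0 point.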
- Publication: arXiv e-prints
- Pub Date: June 2023
- arXiv: arXiv:2306.15199
- Bibcode: 2023arXiv230615199M
- Keywords: Statistics - Methodology