High Dimensional Cluster Analysis Using Path Lengths
Abstract
A hierarchical scheme for clustering data is presented which applies to spaces with a high number of dimensions ($N_D > 3$). The data set is first reduced to a smaller set of partitions (multi-dimensional bins). Multiple clustering techniques are used, including spectral clustering; however, new techniques are also introduced based on the path length between partitions that are connected to one another. A Line-Of-Sight algorithm is also developed for clustering. A test bank of 12 data sets with varying properties is used to expose the strengths and weaknesses of each technique. Finally, a robust clustering technique is discussed based on reaching a consensus among the multiple approaches, overcoming the weaknesses found individually.
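The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas it names: reducing the data to multi-dimensional bins, then measuring path lengths between connected bins. It assumes a uniform bin width, treats two occupied bins as connected when their indices differ by at most one along every axis, and uses breadth-first search hop counts as the path length. The function names (`bin_points`, `neighbors`, `path_lengths`, `cluster_bins`) and the sample data are hypothetical, and none of this should be read as the paper's actual algorithm.

```python
from collections import deque
from itertools import product


def bin_points(points, bin_width):
    """Reduce points to occupied bins: maps bin index tuple -> point count."""
    bins = {}
    for p in points:
        key = tuple(int(x // bin_width) for x in p)
        bins[key] = bins.get(key, 0) + 1
    return bins


def neighbors(key):
    """Yield every bin index within one step of `key` along each axis."""
    for offset in product((-1, 0, 1), repeat=len(key)):
        if any(offset):  # skip the all-zero offset (the bin itself)
            yield tuple(k + o for k, o in zip(key, offset))


def path_lengths(bins, start):
    """Hop counts from `start` to each occupied bin reachable from it,
    stepping only through adjacent occupied bins (breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for nb in neighbors(cur):
            if nb in bins and nb not in dist:
                dist[nb] = dist[cur] + 1
                queue.append(nb)
    return dist


def cluster_bins(bins):
    """Assumed rule: bins joined by any finite path share a cluster label."""
    labels = {}
    next_label = 0
    for key in sorted(bins):
        if key in labels:
            continue
        for reached in path_lengths(bins, key):
            labels[reached] = next_label
        next_label += 1
    return labels


# Hypothetical 4-dimensional sample: two well-separated groups of points.
points = [
    (0.1, 0.2, 0.1, 0.3),
    (1.2, 0.4, 0.2, 0.1),
    (2.3, 1.1, 0.5, 0.9),
    (10.2, 10.1, 10.4, 10.3),
    (11.1, 10.5, 10.2, 10.9),
]
bins = bin_points(points, bin_width=1.0)
labels = cluster_bins(bins)
```

Enumerating all neighbor offsets costs $3^{N_D} - 1$ lookups per bin, so for large $N_D$ a practical variant would more likely compare the occupied bins against each other directly.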
- Publication:
- arXiv e-prints
- Pub Date:
- October 2017
- DOI:
- 10.48550/arXiv.1710.04886
- arXiv:
- arXiv:1710.04886
- Bibcode:
- 2017arXiv171004886M
- Keywords:
- Physics - Data Analysis;
- Statistics and Probability;
- Computer Science - Data Structures and Algorithms
- E-Print:
- 52 pages, 94 figures