Modified Multidimensional Scaling and High Dimensional Clustering
Abstract
Multidimensional scaling is an important dimension reduction tool in statistics and machine learning. Yet few theoretical results characterizing its statistical performance exist, let alone any in high dimensions. By considering a unified framework that includes low, moderate and high dimensions, we study multidimensional scaling in the setting of clustering noisy data. Our results suggest that classical multidimensional scaling can be modified to further improve the quality of the embedded samples, especially as the noise level increases. To this end, we propose modified multidimensional scaling, which applies a nonlinear transformation to the sample eigenvalues. The nonlinear transformation depends on the dimensionality, the sample size and the moment of the noise. We show that modified multidimensional scaling followed by various clustering algorithms can achieve exact recovery, i.e., all the cluster labels can be recovered correctly with probability tending to one. Numerical simulations and two real data applications lend strong support to our proposed methodology.
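To make the pipeline described in the abstract concrete, the following is a minimal sketch of classical multidimensional scaling followed by a simple two-cluster assignment. It is an illustration under stated assumptions, not the authors' implementation: the function name `mds_embed` and the `transform` hook are hypothetical, the hook defaults to the identity (which gives classical multidimensional scaling), and the paper's specific nonlinear transformation of the eigenvalues is not reproduced here because the abstract does not state it. The clustering step is a plain sign threshold on the leading coordinate, standing in for the "various clustering algorithms" mentioned above.

```python
import numpy as np


def mds_embed(X, k, transform=None):
    """Embed the rows of X into k dimensions via classical MDS.

    `transform` is a hypothetical hook applied to the top-k eigenvalues
    before scaling. Leaving it as None gives classical MDS; the paper's
    own nonlinear transformation is NOT implemented here.
    """
    n = X.shape[0]
    # Squared Euclidean distances between all pairs of samples.
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    # Double centering turns squared distances into a Gram matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    # eigh returns eigenvalues in ascending order; keep the k largest.
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:k]
    vals, vecs = vals[top], vecs[:, top]
    if transform is not None:
        vals = transform(vals)
    # Guard against small negative eigenvalues before the square root.
    vals = np.clip(vals, 0.0, None)
    return vecs * np.sqrt(vals)


# Toy data (an assumption for illustration): two Gaussian clusters
# in 50 dimensions with means +mu and -mu and unit-variance noise.
rng = np.random.default_rng(0)
n_per, p = 40, 50
mu = np.zeros(p)
mu[:5] = 4.0
truth = np.repeat([0, 1], n_per)
X = np.where(truth[:, None] == 0, mu, -mu) + rng.standard_normal((2 * n_per, p))

# One-dimensional embedding, then split by the sign of the coordinate.
Y = mds_embed(X, k=1)
pred = (Y[:, 0] > 0).astype(int)

# Cluster labels are only identified up to swapping the two names.
accuracy = max(np.mean(pred == truth), np.mean(pred != truth))
```

A shrinkage-type function could be passed as `transform` to experiment with modified eigenvalues, but any such choice is the reader's own assumption rather than the rule derived in the paper.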
- Publication: arXiv e-prints
- Pub Date: October 2018
- arXiv: arXiv:1810.10172
- Bibcode: 2018arXiv181010172D
- Keywords: Statistics - Methodology; Computer Science - Machine Learning; Mathematics - Statistics Theory; Statistics - Machine Learning
- E-Print: This paper will be subsumed by another paper