Supervised Dimensionality Reduction and Visualization using Centroid-encoder
Abstract
Visualizing high-dimensional data is an essential task in Data Science and Machine Learning. The Centroid-Encoder (CE) method is similar to the autoencoder but incorporates label information to keep objects of a class close together in the reduced visualization space. CE exploits nonlinearity and labels to encode high variance in low dimensions while capturing the global structure of the data. We present a detailed analysis of the method using a wide variety of data sets and compare it with other supervised dimension reduction techniques, including NCA, nonlinear NCA, t-distributed NCA, t-distributed MCML, supervised UMAP, supervised PCA, Colored Maximum Variance Unfolding, supervised Isomap, Parametric Embedding, supervised Neighbor Retrieval Visualizer, and Multiple Relational Embedding. We empirically show that centroid-encoder outperforms most of these techniques. We also show that when the data variance is spread across multiple modalities, centroid-encoder extracts a significant amount of information from the data in a low-dimensional space. This key feature establishes its value as a tool for data visualization.
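The contrast with a plain autoencoder can be illustrated with a minimal sketch. It assumes a squared-error objective in which each sample's target is the mean of its class in input space, rather than the sample itself. The abstract does not spell out the loss, so this form, together with the helper names `class_centroids`, `centroid_targets`, and `centroid_encoder_loss`, is an illustrative assumption and not the authors' implementation.

```python
import numpy as np


def class_centroids(X, y):
    """Return the sorted class labels and one mean vector per class."""
    labels = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in labels])
    return labels, centroids


def centroid_targets(X, y):
    """Map every sample to the centroid of its own class."""
    labels, centroids = class_centroids(X, y)
    idx = np.searchsorted(labels, y)  # position of each label in `labels`
    return centroids[idx]


def centroid_encoder_loss(output, X, y):
    """Mean squared distance between network outputs and class centroids.

    Assumed objective: an autoencoder would compare `output` with X itself;
    here the target is the class centroid, which pulls samples of one class
    toward a common point.
    """
    targets = centroid_targets(X, y)
    return float(np.mean(np.sum((output - targets) ** 2, axis=1)))


# Toy data: two classes with two samples each (hypothetical values).
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]])
y = np.array([0, 0, 1, 1])

# An identity "network" (output == input) is penalized under this loss,
# while a network that outputs the class centroids is not.
identity_loss = centroid_encoder_loss(X, X, y)
perfect_loss = centroid_encoder_loss(centroid_targets(X, y), X, y)
```

In a full model, `output` would come from a bottleneck network whose low-dimensional hidden layer supplies the visualization coordinates; the sketch covers only the label-dependent target and loss.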
- Publication:
- arXiv e-prints
- Pub Date:
- February 2020
- DOI:
- 10.48550/arXiv.2002.11934
- arXiv:
- arXiv:2002.11934
- Bibcode:
- 2020arXiv200211934G
- Keywords:
- Computer Science - Machine Learning;
- Computer Science - Computer Vision and Pattern Recognition;
- Statistics - Machine Learning
- E-Print:
- 25 pages (including 3 reference pages), 12 figures. I am planning to submit the paper to JMLR very soon. Centroid-encoder was applied to biological pathway data (https://www.sciencedirect.com/science/article/pii/S1046202317300439). In this paper we thoroughly analyzed the algorithm and compared it with state-of-the-art techniques on 8 data sets including MNIST, USPS