Supervised Dimensionality Reduction and Visualization using Centroid-encoder

doi:10.48550/arXiv.2002.11934

Visualizing high-dimensional data is an essential task in data science and machine learning. The Centroid-Encoder (CE) method is similar to the autoencoder but incorporates label information to keep objects of a class close together in the reduced visualization space. CE exploits nonlinearity and labels to encode high variance in low dimensions while capturing the global structure of the data. We present a detailed analysis of the method using a wide variety of data sets and compare it with other supervised dimension reduction techniques, including NCA, nonlinear NCA, t-distributed NCA, t-distributed MCML, supervised UMAP, supervised PCA, Colored Maximum Variance Unfolding, supervised Isomap, Parametric Embedding, supervised Neighbor Retrieval Visualizer, and Multiple Relational Embedding. We empirically show that centroid-encoder outperforms most of these techniques. We also show that when the data variance is spread across multiple modalities, centroid-encoder extracts a significant amount of information from the data in a low-dimensional space. This key feature establishes its value as a tool for data visualization.
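The contrast with an ordinary autoencoder can be illustrated with a small sketch. It assumes, as the abstract suggests but does not spell out, that the centroid-encoder cost replaces each sample's reconstruction target with the centroid of that sample's class. The function names (`class_centroids`, `autoencoder_loss`, `centroid_encoder_loss`) and the toy data are hypothetical, and the mapping `f` stands in for a trained bottleneck network; here it is simply the identity.

```python
def class_centroids(X, y):
    """Mean vector of each class; X is a list of equal-length lists."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def squared_error(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def autoencoder_loss(f, X):
    """Standard autoencoder cost: each sample is its own target."""
    return sum(squared_error(f(x), x) for x in X) / len(X)


def centroid_encoder_loss(f, X, y, centroids):
    """Assumed CE cost: each sample's target is its class centroid."""
    total = sum(squared_error(f(x), centroids[label]) for x, label in zip(X, y))
    return total / len(X)


# Toy data: two classes in two dimensions (illustrative values only).
X = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]]
y = [0, 0, 1, 1]
centroids = class_centroids(X, y)


def identity(x):
    """Stand-in for a trained network."""
    return x


ae = autoencoder_loss(identity, X)
ce = centroid_encoder_loss(identity, X, y, centroids)
print(centroids, ae, ce)
```

Under this assumption the identity map is optimal for the autoencoder cost but not for the centroid-encoder cost, which is only minimized when every sample of a class is pulled toward a common point. That is one way to read the abstract's claim that labels keep objects of a class close together.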

Publication: arXiv e-prints

Pub Date: February 2020

DOI: 10.48550/arXiv.2002.11934

arXiv: arXiv:2002.11934

Bibcode: 2020arXiv200211934G

Keywords: Computer Science - Machine Learning; Computer Science - Computer Vision and Pattern Recognition; Statistics - Machine Learning

E-Print: 25 pages (including 3 reference pages), 12 figures. I am planning to submit the paper to JMLR very soon. Centroid-encoder was applied to biological pathway data (https://www.sciencedirect.com/science/article/pii/S1046202317300439). In this paper we thoroughly analyzed the algorithm and compared it with state-of-the-art techniques on 8 data sets including MNIST, USPS

NASA/ADS
