Parametric UMAP embeddings for representation and semi-supervised learning
Abstract
UMAP is a non-parametric graph-based dimensionality reduction algorithm using applied Riemannian geometry and algebraic topology to find low-dimensional embeddings of structured data. The UMAP algorithm consists of two steps: (1) Compute a graphical representation of a dataset (fuzzy simplicial complex), and (2) Through stochastic gradient descent, optimize a low-dimensional embedding of the graph. Here, we extend the second step of UMAP to a parametric optimization over neural network weights, learning a parametric relationship between data and embedding. We first demonstrate that Parametric UMAP performs comparably to its non-parametric counterpart while conferring the benefit of a learned parametric mapping (e.g. fast online embeddings for new data). We then explore UMAP as a regularization, constraining the latent distribution of autoencoders, parametrically varying global structure preservation, and improving classifier accuracy for semi-supervised learning by capturing structure in unlabeled data. Google Colab walkthrough: https://colab.research.google.com/drive/1WkXVZ5pnMrm17m0YgmtoNjM_XHdnE5Vp?usp=sharing
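The two-step structure described above can be illustrated with a toy sketch, which is not the authors' implementation. It assumes a hand-written graph of membership weights `edges` in place of the fuzzy simplicial complex, a hypothetical linear map `embed` in place of a neural network, and the simplification a = b = 1 in the low-dimensional similarity q = 1 / (1 + a·d^(2b)). Non-neighbor pairs are listed explicitly with weight 0, whereas practical implementations typically use negative sampling. The sketch only evaluates the cross-entropy objective for two candidate weight matrices; it does not run gradient descent.

```python
import math

def embed(x, W):
    """Hypothetical linear encoder standing in for the neural network."""
    return [sum(xi * W[i][k] for i, xi in enumerate(x)) for k in range(len(W[0]))]

def umap_cross_entropy(X, edges, W, a=1.0, b=1.0, eps=1e-6):
    """Cross-entropy between graph weights p_ij and embedding similarities q_ij.

    a and b are assumed values here; UMAP normally fits them from min_dist.
    """
    Z = [embed(x, W) for x in X]
    loss = 0.0
    for (i, j), p in edges.items():
        d2 = sum((zi - zj) ** 2 for zi, zj in zip(Z[i], Z[j]))
        q = 1.0 / (1.0 + a * d2 ** b)      # low-dimensional similarity
        q = min(max(q, eps), 1.0 - eps)    # clamp to keep the logs finite
        loss -= p * math.log(q) + (1.0 - p) * math.log(1.0 - q)
    return loss

# Toy data: two well-separated pairs of points in 3-D (illustrative only).
X = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [5.0, 5.0, 5.0], [5.1, 5.0, 5.0]]

# Step 1 stand-in: membership weights, 1.0 for neighbors and 0.0 for non-neighbors.
edges = {(0, 1): 1.0, (2, 3): 1.0, (0, 2): 0.0, (1, 3): 0.0}

# Step 2 stand-in: compare two candidate parameter settings of the map.
W_good = [[1, 0], [0, 1], [0, 0]]   # keeps the two groups apart
W_bad = [[0, 0], [0, 0], [0, 0]]    # collapses every point to the origin

loss_good = umap_cross_entropy(X, edges, W_good)
loss_bad = umap_cross_entropy(X, edges, W_bad)
print(loss_good, loss_bad)

# Because the map is parametric, an unseen point is embedded by one forward pass.
z_new = embed([5.05, 5.0, 5.0], W_good)
print(z_new)
```

In this sketch the map that preserves the neighbor structure scores a lower loss than the collapsed one, and embedding new data needs no re-optimization, which is the benefit the abstract attributes to the learned parametric mapping.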
Publication: arXiv e-prints
Pub Date: September 2020
DOI: 10.48550/arXiv.2009.12981
arXiv: arXiv:2009.12981
Bibcode: 2020arXiv200912981S
Keywords: Computer Science - Machine Learning; Computer Science - Computational Geometry; Quantitative Biology - Quantitative Methods; Statistics - Machine Learning