Emblaze: Illuminating Machine Learning Representations through Interactive Comparison of Embedding Spaces
Abstract
Modern machine learning techniques commonly rely on complex, high-dimensional embedding representations to capture underlying structure in the data and improve performance. To characterize model flaws and choose a desirable representation, model builders often need to compare multiple embedding spaces, a challenging analytical task supported by few existing tools. We first interviewed nine embedding experts in a variety of fields to characterize the diverse challenges they face and the techniques they use when analyzing embedding spaces. Informed by these perspectives, we developed a novel system called Emblaze that integrates embedding space comparison within a computational notebook environment. Emblaze uses an animated, interactive scatter plot with a novel Star Trail augmentation to enable visual comparison. It also employs novel neighborhood analysis and clustering procedures to dynamically suggest groups of points with interesting changes between spaces. Through a series of case studies with ML experts, we demonstrate how interactive comparison with Emblaze can help users gain new insights into embedding space structure.
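The abstract does not spell out how the neighborhood analysis works, so the following is only a generic sketch of the underlying idea, not Emblaze's actual procedure or API. It assumes two embedding spaces containing the same points in the same row order, and it scores each point by the Jaccard distance between its k-nearest-neighbor sets in the two spaces. The function names `knn_indices` and `neighborhood_change` are hypothetical and chosen for illustration.

```python
import numpy as np


def knn_indices(X, k):
    """Return the indices of the k nearest neighbors (Euclidean) of each row of X."""
    sq = np.sum(X ** 2, axis=1)
    # Squared pairwise distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]


def neighborhood_change(A, B, k=5):
    """Jaccard distance between each point's k-NN sets in spaces A and B.

    Assumption: row i of A and row i of B describe the same item.
    0.0 means the neighbor set is identical; 1.0 means it is disjoint.
    """
    na, nb = knn_indices(A, k), knn_indices(B, k)
    change = np.empty(len(A))
    for i in range(len(A)):
        sa, sb = set(na[i].tolist()), set(nb[i].tolist())
        change[i] = 1.0 - len(sa & sb) / len(sa | sb)
    return change


# Toy data: two groups on a line; in space B, point 2 moves to the other group.
A = np.array([[0.0], [1.0], [2.5], [10.0], [11.0], [12.5]])
B = np.array([[0.0], [1.0], [11.4], [10.0], [11.0], [12.5]])
scores = neighborhood_change(A, B, k=2)
most_changed = int(np.argmax(scores))  # index of the point whose neighbors changed most
```

Points with high scores are candidates for closer inspection. Grouping them, for example by clustering, would be a separate step that this sketch does not attempt.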
- Publication: arXiv e-prints
- Pub Date: February 2022
- DOI: 10.48550/arXiv.2202.02641
- arXiv: arXiv:2202.02641
- Bibcode: 2022arXiv220202641S
- Keywords: Computer Science - Human-Computer Interaction; Computer Science - Machine Learning
- E-Print: 23 pages, 5 figures, 2 tables. To be presented at IUI'22. arXiv version updated Feb 16 2022 with corrected publication year and copyright.