Unsupervised Disentanglement without Autoencoding: Pitfalls and Future Directions
Abstract
Disentangled visual representations have largely been studied with generative models such as Variational AutoEncoders (VAEs). While prior work has focused on such generative methods for disentangled representation learning, these approaches do not scale to large datasets due to current limitations of generative models. Instead, we explore regularization methods with contrastive learning, which could yield disentangled representations powerful enough for large-scale datasets and downstream applications. However, we find that unsupervised disentanglement is difficult to achieve due to optimization and initialization sensitivity, and that it comes with trade-offs in task performance. We evaluate disentanglement on downstream tasks, analyze the benefits and disadvantages of each regularizer used, and discuss future directions.
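The abstract names the general recipe (a contrastive objective combined with disentanglement-oriented regularizers) without specifying the regularizers themselves. As a minimal sketch only, assuming a SimCLR-style NT-Xent contrastive loss and an off-diagonal covariance penalty as a stand-in decorrelation regularizer, the combination might look like the following; the function names, the penalty choice, and the weight `lam` are illustrative assumptions, not the paper's method.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent contrastive loss for two batches of paired views."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                      # (2N, D)
    sim = (z @ z.T) / temperature                       # pairwise cosine similarities
    n = z1.size(0)
    self_mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))     # exclude self-similarity
    # The positive for sample i in view 1 is sample i in view 2, and vice versa.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

def decorrelation_penalty(z):
    """Off-diagonal covariance penalty: one simple proxy for disentanglement,
    pushing latent dimensions toward pairwise decorrelation (assumed regularizer)."""
    z = z - z.mean(dim=0)
    cov = (z.T @ z) / (z.size(0) - 1)                   # (D, D) sample covariance
    off_diag = cov - torch.diag(torch.diag(cov))
    return (off_diag ** 2).sum() / z.size(1)

def regularized_contrastive_loss(z1, z2, lam=1.0):
    """Contrastive term plus a weighted disentanglement-style regularizer."""
    reg = decorrelation_penalty(z1) + decorrelation_penalty(z2)
    return nt_xent_loss(z1, z2) + lam * reg
```

The weight `lam` trades off contrastive task performance against the regularizer, in line with the trade-offs the abstract reports; the right value, like the choice of regularizer itself, is left open by this sketch.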
- Publication: arXiv e-prints
- Pub Date: August 2021
- DOI: 10.48550/arXiv.2108.06613
- arXiv: arXiv:2108.06613
- Bibcode: 2021arXiv210806613B
- Keywords: Computer Science - Computer Vision and Pattern Recognition; Computer Science - Machine Learning
- E-Print: Accepted at the ICML 2021 Self-Supervised Learning for Reasoning and Perception Workshop