Leave-One-EquiVariant: Alleviating invariance-related information loss in contrastive music representations

doi:10.48550/arXiv.2412.18955

Contrastive learning has proven effective in self-supervised musical representation learning, particularly for Music Information Retrieval (MIR) tasks. However, reliance on augmentation chains for contrastive view generation and the resulting learned invariances pose challenges when different downstream tasks require sensitivity to certain musical attributes. To address this, we propose the Leave-One-EquiVariant (LOEV) framework, a more flexible, task-adaptive approach than previous work that selectively preserves information about specific augmentations, allowing the model to maintain task-relevant equivariances. We demonstrate that LOEV alleviates information loss related to learned invariances, improving performance on augmentation-related tasks and retrieval without sacrificing general representation quality. Furthermore, we introduce a variant of LOEV, LOEV++, which builds a disentangled latent space by design in a self-supervised manner and enables targeted retrieval based on augmentation-related attributes.

Publication:

arXiv e-prints

Pub Date:

December 2024

DOI:

10.48550/arXiv.2412.18955

arXiv:

arXiv:2412.18955

Bibcode:

2024arXiv241218955G

Keywords:

Computer Science - Sound;
Electrical Engineering and Systems Science - Audio and Speech Processing
