Integrating Contrastive Learning into a Multitask Transformer Model for Effective Domain Adaptation
Abstract
While speech emotion recognition (SER) research has made significant progress, generalization across corpora remains a challenge. We propose a novel domain adaptation technique built on a multitask framework, with SER as the primary task and contrastive learning and an information maximisation loss as auxiliary tasks, underpinned by fine-tuning of transformers pre-trained on large language models. Empirical results from experiments on well-established datasets such as IEMOCAP and MSP-IMPROV show that our proposed model achieves state-of-the-art performance in SER within cross-corpus scenarios.
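The abstract names three training objectives but does not give their formulas, so the following is only a minimal sketch of how such a multitask loss could be combined. Everything specific here is an assumption rather than the authors' method: the InfoNCE form of the contrastive term, the use of unlabelled target-corpus predictions for the information maximisation term, the function names, and the weights `w_con` and `w_im`.

```python
import math

EPS = 1e-12  # avoids log(0)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(q * math.log(q + EPS) for q in p)


def cross_entropy(probs, labels):
    """Primary SER task: mean negative log-likelihood of the true labels."""
    return -sum(math.log(p[y] + EPS) for p, y in zip(probs, labels)) / len(labels)


def information_maximisation(probs):
    """Mean per-sample entropy minus entropy of the mean prediction.

    Minimising this favours confident individual predictions that are
    still spread across classes over the whole batch.
    """
    n, k = len(probs), len(probs[0])
    mean_pred = [sum(p[j] for p in probs) / n for j in range(k)]
    return sum(entropy(p) for p in probs) / n - entropy(mean_pred)


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + EPS)


def info_nce(view_a, view_b, temperature=0.5):
    """InfoNCE contrastive loss (assumed form): embedding i in view_a should
    be closer to embedding i in view_b than to any other entry of view_b."""
    n = len(view_a)
    loss = 0.0
    for i in range(n):
        sims = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        loss -= math.log(softmax(sims)[i] + EPS)
    return loss / n


def multitask_loss(src_logits, src_labels, tgt_logits, view_a, view_b,
                   w_con=0.1, w_im=0.1):
    """Weighted sum of the three objectives; the weights are hypothetical."""
    src_probs = [softmax(z) for z in src_logits]
    tgt_probs = [softmax(z) for z in tgt_logits]
    return (cross_entropy(src_probs, src_labels)
            + w_con * info_nce(view_a, view_b)
            + w_im * information_maximisation(tgt_probs))
```

In this sketch the cross-entropy term needs labels and so uses only the source corpus, while the information maximisation term needs only predictions, which is what would let it draw on an unlabelled target corpus. Whether the paper splits the data this way is not stated in the abstract.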
- Publication: arXiv e-prints
- Pub Date: October 2023
- DOI: 10.48550/arXiv.2310.04703
- arXiv: arXiv:2310.04703
- Bibcode: 2023arXiv231004703A
- Keywords: Computer Science - Computation and Language; Computer Science - Human-Computer Interaction; Computer Science - Machine Learning; Speech Emotion Recognition; Domain adaptation