Mouth Articulation-Based Anchoring for Improved Cross-Corpus Speech Emotion Recognition

doi:10.48550/arXiv.2412.19909

Cross-corpus speech emotion recognition (SER) plays a vital role in numerous practical applications. Traditional approaches to cross-corpus emotion transfer often concentrate on adapting acoustic features to align with different corpora, domains, or labels. However, acoustic features are inherently variable and error-prone due to factors like speaker differences, domain shifts, and recording conditions. To address these challenges, this study adopts a novel contrastive approach by focusing on emotion-specific articulatory gestures as the core elements for analysis. By shifting the emphasis to the more stable and consistent articulatory gestures, we aim to enhance emotion transfer learning in SER tasks. Our research uses the CREMA-D and MSP-IMPROV corpora as benchmarks and reveals valuable insights into the commonality and reliability of these articulatory gestures. The findings highlight the potential of mouth articulatory gestures as a better constraint for improving emotion recognition across different settings or domains.

Publication: arXiv e-prints

Pub Date: December 2024

DOI: 10.48550/arXiv.2412.19909

arXiv: arXiv:2412.19909

Bibcode: 2024arXiv241219909U

Keywords: Computer Science - Sound; Computer Science - Machine Learning; Electrical Engineering and Systems Science - Audio and Speech Processing
