Effect of different splitting criteria on the performance of speech emotion recognition
Abstract
Traditional speech emotion recognition (SER) evaluations have been performed merely on a speaker-independent condition; some of them even did not evaluate their result on this condition. This paper highlights the importance of splitting training and test data for SER by script, known as sentence-open or text-independent criteria. The results show that employing sentence-open criteria degraded the performance of SER. This finding implies the difficulties of recognizing emotion from speech in different linguistic information embedded in acoustic information. Surprisingly, text-independent criteria consistently performed worse than speaker+text-independent criteria. The full order of difficulties for splitting criteria on SER performances from the most difficult to the easiest is text-independent, speaker+text-independent, speaker-independent, and speaker+text-dependent. The gap between speaker+text-independent and text-independent was smaller than other criteria, strengthening the difficulties of recognizing emotion from speech in different sentences.
- Publication:
-
arXiv e-prints
- Pub Date:
- October 2022
- DOI:
- 10.48550/arXiv.2210.14501
- arXiv:
- arXiv:2210.14501
- Bibcode:
- 2022arXiv221014501T
- Keywords:
-
- Electrical Engineering and Systems Science - Audio and Speech Processing;
- Computer Science - Sound
- E-Print:
- Accepted at TENCON 2021