Detecting the terminality of speech-turn boundary for spoken interactions in French TV and Radio content
Abstract
Transition Relevance Places are defined as the end of an utterance where the interlocutor may take the floor without interrupting the current speaker --i.e., a place where the turn is terminal. Analyzing turn terminality is useful to study the dynamic of turn-taking in spontaneous conversations. This paper presents an automatic classification of spoken utterances as Terminal or Non-Terminal in multi-speaker settings. We compared audio, text, and fusions of both approaches on a French corpus of TV and Radio extracts annotated with turn-terminality information at each speaker change. Our models are based on pre-trained self-supervised representations. We report results for different fusion strategies and varying context sizes. This study also questions the problem of performance variability by analyzing the differences in results for multiple training runs with random initialization. The measured accuracy would allow the use of these models for large-scale analysis of turn-taking.
- Publication:
-
arXiv e-prints
- Pub Date:
- June 2024
- DOI:
- arXiv:
- arXiv:2406.10073
- Bibcode:
- 2024arXiv240610073U
- Keywords:
-
- Electrical Engineering and Systems Science - Audio and Speech Processing;
- Computer Science - Computation and Language;
- Computer Science - Human-Computer Interaction;
- Computer Science - Sound
- E-Print:
- keywords : Spoken interaction, Media, TV, Radio, Transition-Relevance Places, Turn Taking, Interruption. Accepted to InterSpeech 2024, Kos Island, Greece