The DKU-DukeECE Diarization System for the VoxCeleb Speaker Recognition Challenge 2022
Abstract
This paper describes the DKU-DukeECE submission to Track 4 of the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22). Our system comprises a fused voice activity detection (VAD) model, a clustering-based diarization model, and an overlap detection model based on target-speaker voice activity detection (TS-VAD). Overall, the submitted system is similar to our VoxSRC-21 system from the previous year. The differences are a much better speaker embedding and a fused voice activity detection, which together significantly improve performance. Finally, we fuse four different systems using DOVER-Lap and achieve a diarization error rate (DER) of 4.75%, which ranks first in Track 4.
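For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a diarization error rate is conventionally computed: the sum of false-alarm, missed-speech, and speaker-confusion durations divided by the total reference speech duration. The function name and the durations in the usage line are hypothetical illustrations. They are not the challenge's scoring tool, and they are not the error breakdown of the submitted system.

```python
def diarization_error_rate(false_alarm: float,
                           missed: float,
                           confusion: float,
                           total_speech: float) -> float:
    """Return DER as a percentage.

    All arguments are durations in seconds:
      false_alarm  - non-speech labelled as speech
      missed       - reference speech that was not detected
      confusion    - speech attributed to the wrong speaker
      total_speech - total reference speech time (the denominator)
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    if min(false_alarm, missed, confusion) < 0:
        raise ValueError("error durations must be non-negative")
    return 100.0 * (false_alarm + missed + confusion) / total_speech


# Hypothetical durations, chosen only to illustrate the arithmetic.
example_der = diarization_error_rate(6.0, 9.0, 5.0, 400.0)
print(f"DER = {example_der:.2f}%")
```

This sketch assumes the three error durations have already been obtained by aligning a system output against a reference annotation. Details such as forgiveness collars and the treatment of overlapped speech depend on the scoring protocol and are not modelled here.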
- Publication: arXiv e-prints
- Pub Date: October 2022
- DOI: 10.48550/arXiv.2210.01677
- arXiv: arXiv:2210.01677
- Bibcode: 2022arXiv221001677W
- Keywords: Electrical Engineering and Systems Science - Audio and Speech Processing
- E-Print: arXiv admin note: substantial text overlap with arXiv:2109.02002