Latent variable approach to diarization of audio recordings using ad-hoc randomly placed mobile devices
Abstract
Diarization of audio recordings from ad-hoc mobile devices using spatial information is considered in this paper. A two-channel synchronous recording is assumed for each mobile device, which is used to compute directional statistics separately at each device in a frame-wise manner. The recordings across the mobile devices are asynchronous, but a coarse synchronization is performed by aligning the signals using acoustic events, or real-time clock. Direction statistics computed for all the devices, are then modeled jointly using a Dirichlet mixture model, and the posterior probability over the mixture components is used to derive the diarization information. Experiments on real life recordings using mobile phones show a diarization error rate of less than 14%.
- Publication:
-
arXiv e-prints
- Pub Date:
- October 2018
- DOI:
- 10.48550/arXiv.1810.13109
- arXiv:
- arXiv:1810.13109
- Bibcode:
- 2018arXiv181013109C
- Keywords:
-
- Electrical Engineering and Systems Science - Audio and Speech Processing;
- Computer Science - Sound
- E-Print:
- Paper Submitted to the International Conference on Acoustics Speech and Signal Processing (ICASSP) 2019 to be held in Brighton, UK between May 12-17, 2019