Sound Source Separation with Two Spectrograms by Image Processing
Abstract
We propose a method for separating speech signals using two spectrograms. First, two spectrograms are generated from voices recorded with a pair of microphones. The onsets and offsets of the frequency components are extracted as features using image processing techniques. Then the correspondences of the features between the two spectrograms are determined, and the intermicrophone time differences are calculated. Frequency components that share common onset/offset occurrences and a common time difference are grouped together as originating from one of the speech signals. A set of band-pass filters is generated for each group of frequency components. Finally, each separated speech signal is extracted by applying the corresponding set of band-pass filters to the signal recorded by one of the microphones. Experiments were conducted with a mixture of male and female speech consisting of Japanese vowels and also containing consonants. The evaluation results demonstrated that the proposed method separated the speech signals reasonably well.
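The first stages of the pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the frame size, the hop size, and the 20 dB threshold are all assumptions chosen for illustration. Onsets and offsets are found with a simple difference along the time axis of the log-magnitude spectrogram (a stand-in for the unspecified image processing techniques), and the intermicrophone time difference is estimated by waveform cross-correlation rather than by matching features between the two spectrograms.

```python
import numpy as np

def spectrogram(x, n_fft=256, hop=64):
    """Magnitude spectrogram of shape (n_freq, n_frames) using a Hann window."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def onset_offset_maps(spec, threshold_db=20.0):
    """Boolean maps of sharp level rises (onsets) and falls (offsets) along time.

    The time-axis difference acts like a horizontal edge filter on the
    spectrogram image; the threshold is an assumed value.
    """
    level_db = 20.0 * np.log10(spec + 1e-6)
    change = np.diff(level_db, axis=1)
    return change > threshold_db, change < -threshold_db

def time_difference(left, right, max_lag=16):
    """Delay of `right` relative to `left`, in samples, by cross-correlation."""
    lags = np.arange(-max_lag, max_lag + 1)
    core = left[max_lag:-max_lag]
    scores = [np.dot(core, np.roll(right, -lag)[max_lag:-max_lag]) for lag in lags]
    return int(lags[int(np.argmax(scores))])

# Synthetic check: a 1 kHz tone burst between samples 4000 and 8000.
fs = 16000
n = np.arange(12000)
tone = np.where((n >= 4000) & (n < 8000), np.sin(2 * np.pi * 1000 * n / fs), 0.0)
onsets, offsets = onset_offset_maps(spectrogram(tone))

# Synthetic check: the second channel is the first delayed by 5 samples.
rng = np.random.default_rng(0)
left = rng.standard_normal(2000)
right = np.concatenate([np.zeros(5), left[:-5]])
delay = time_difference(left, right)
```

In this sketch, frequency bins whose onset and offset frames coincide and whose estimated delays agree would be the candidates for grouping into one source; the grouping and band-pass filtering stages are not shown.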
- Publication: IEEJ Transactions on Electronics, Information and Systems
- Pub Date: 2004
- Bibcode: 2004ITEIS.124.2439H
- Keywords: sound source separation; spectrogram; onset; offset; image processing