Blind Normalization of Speech From Different Channels and Speakers
Abstract
This paper describes representations of time-dependent signals that are invariant under any invertible time-independent transformation of the signal time series. Such a representation is created by rescaling the signal in a non-linear dynamic manner that is determined by recently encountered signal levels. This technique may make it possible to normalize signals that are related by channel-dependent and speaker-dependent transformations, without having to characterize the form of the signal transformations, which remain unknown. The technique is illustrated by applying it to the time-dependent spectra of speech that has been filtered to simulate the effects of different channels. The experimental results show that the rescaled speech representations are largely normalized (i.e., channel-independent), despite the channel-dependence of the raw (unrescaled) speech.
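The invariance claim can be illustrated with a toy example. The sketch below is not the paper's rescaling procedure, which the abstract does not spell out. It swaps in a simpler, well-known technique: replacing each sample of a one-dimensional signal by its rank among recently encountered samples. That representation is unchanged by any strictly increasing, time-independent distortion of the signal, which is a narrow special case of the invertible transformations the abstract refers to. The function name `windowed_rank_rescale`, the window length, the test signal, and the distortion are all assumptions made for illustration.

```python
import math


def windowed_rank_rescale(signal, window):
    """Replace each sample by the fraction of the most recent `window`
    samples (including itself) that are less than or equal to it.

    Illustrative stand-in only: a rank-based rescaling, not the
    rescaling method of the paper.
    """
    rescaled = []
    for t in range(len(signal)):
        start = max(0, t - window + 1)
        recent = signal[start:t + 1]
        count = sum(1 for v in recent if v <= signal[t])
        rescaled.append(count / len(recent))
    return rescaled


# Hypothetical quantized "source" signal (integer levels keep the
# comparison exact and free of floating-point rounding effects).
x = [round(100 * math.sin(0.3 * t) + 50 * math.sin(0.11 * t))
     for t in range(200)]

# Hypothetical "channel": an invertible, time-independent distortion.
# v**3 + 5*v is strictly increasing, so it preserves sample ordering.
y = [v ** 3 + 5 * v for v in x]

rx = windowed_rank_rescale(x, window=50)
ry = windowed_rank_rescale(y, window=50)

print(x == y)    # raw signals differ
print(rx == ry)  # rescaled representations coincide
```

The raw series `x` and `y` differ, yet their rescaled versions agree sample for sample, because the rescaling depends only on how each level compares with recently encountered levels. Rank ordering handles only monotonic distortions of scalar signals, so it does not reach the generality claimed in the abstract.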
- Publication: arXiv e-prints
- Pub Date: April 2002
- arXiv: arXiv:cs/0204003
- Bibcode: 2002cs........4003L
- Keywords: Computation and Language; I.2.7
- E-Print: 4 pages, 2 figures