Enhanced Factored Three-Way Restricted Boltzmann Machines for Speech Detection
Abstract
In this letter, we propose enhanced factored three-way restricted Boltzmann machines (EFTW-RBMs) for speech detection. The proposed model performs conditional feature learning by multiplying in the dynamic state of a third unit, which modulates the interaction of each visible-hidden node pair. Instead of recursively stacking previous speech frames as the third unit, correlation-related weighting coefficients are assigned to the contextual neighboring frames. Specifically, a threshold function is designed to capture long-term features and blend in the globally stored speech structure. A factored low-rank approximation is introduced to reduce the number of parameters of the three-dimensional interaction tensor, on which a non-negative constraint is imposed to address the sparsity characteristic. Validation with the area under the ROC curve (AUC) and the signal distortion ratio (SDR) shows that our approach outperforms several existing 1D and 2D (i.e., time-domain and time-frequency-domain) speech detection algorithms in various noisy environments.
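The factored low-rank approximation mentioned in the abstract can be illustrated with a minimal NumPy sketch. It assumes the standard three-factor decomposition W[i,j,k] = sum_f Wv[i,f] * Wh[j,f] * Wz[k,f] used in factored three-way RBMs generally; the abstract does not give the exact EFTW-RBM energy function, so the names `Wv`, `Wh`, `Wz`, `b` and the helper functions below are hypothetical, and the visible and conditioning bias terms, the correlation-based frame weighting, and the threshold function are all omitted.

```python
import numpy as np


def full_tensor(Wv, Wh, Wz):
    """Rebuild the (I, J, K) interaction tensor from its rank-F factors.

    Assumed factorization: W[i, j, k] = sum_f Wv[i, f] * Wh[j, f] * Wz[k, f].
    """
    return np.einsum("if,jf,kf->ijk", Wv, Wh, Wz)


def factored_energy(v, h, z, Wv, Wh, Wz):
    """Three-way interaction energy, computed without forming the full tensor.

    v: visible units (I,), h: hidden units (J,), z: third (context) unit (K,).
    Bias terms are left out to keep the sketch short.
    """
    fv = v @ Wv  # projection of v onto each factor, shape (F,)
    fh = h @ Wh  # projection of h onto each factor, shape (F,)
    fz = z @ Wz  # projection of z onto each factor, shape (F,)
    return -np.sum(fv * fh * fz)


def hidden_probability(v, z, Wv, Wh, Wz, b):
    """p(h_j = 1 | v, z) for binary hidden units with hidden bias b (J,).

    The context z rescales, per factor, how strongly v drives each hidden unit.
    """
    pre_activation = b + Wh @ ((v @ Wv) * (z @ Wz))
    return 1.0 / (1.0 + np.exp(-pre_activation))


def project_non_negative(W):
    """One simple way to impose a non-negative constraint: clip at zero.

    This is an illustrative assumption, not necessarily the paper's scheme.
    """
    return np.maximum(W, 0.0)
```

With I visible, J hidden, and K context units, the full tensor holds I*J*K weights, while the factors hold only F*(I+J+K). Clipping each factor at zero keeps every entry of the reconstructed tensor non-negative, since it is a sum of products of non-negative terms.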
- Publication:
- arXiv e-prints
- Pub Date:
- November 2016
- DOI:
- arXiv:
- arXiv:1611.00326
- Bibcode:
- 2016arXiv161100326S
- Keywords:
- Computer Science - Sound;
- Computer Science - Machine Learning;
- Statistics - Machine Learning
- E-Print:
- 8 pages, Pattern Recognition Letters 2016