Masked Conditional Neural Networks for Audio Classification
Abstract
We present the ConditionaL Neural Network (CLNN) and the Masked ConditionaL Neural Network (MCLNN), designed for temporal signal recognition. The CLNN takes into consideration the temporal nature of the sound signal. The MCLNN extends the CLNN with a binary mask that preserves the spatial locality of the features and allows an automated exploration of feature combinations, analogous to hand-crafting the most relevant features for the recognition task. The MCLNN has achieved competitive recognition accuracies on the GTZAN and ISMIR2004 music datasets, surpassing several state-of-the-art neural-network-based architectures and hand-crafted methods applied to both datasets.
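
The two ideas in the abstract, conditioning each output frame on a window of neighbouring frames and multiplying the weights element-wise by a binary mask, can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the function names `banded_mask` and `masked_conditional_layer`, the evenly spaced banded mask layout, the `bandwidth` parameter, the window length, and the ReLU activation.

```python
import numpy as np


def banded_mask(n_in, n_out, bandwidth):
    """Hypothetical binary mask: each output unit is connected to
    `bandwidth` adjacent input features (a band), so locality along
    the feature axis is preserved. The layout is an assumption."""
    mask = np.zeros((n_in, n_out))
    for j in range(n_out):
        # Spread the band start positions evenly over the input features.
        start = (j * (n_in - bandwidth)) // max(n_out - 1, 1)
        mask[start:start + bandwidth, j] = 1.0
    return mask


def masked_conditional_layer(frames, weights, bias, mask):
    """frames:  (T, n_in) sequence of feature vectors (e.g. spectrogram frames)
    weights: (window, n_in, n_out), one weight matrix per frame in the window
    bias:    (n_out,)
    mask:    (n_in, n_out) binary mask applied to every weight matrix
    Returns an array of shape (T - window + 1, n_out)."""
    window, _, n_out = weights.shape
    n_steps = frames.shape[0] - window + 1
    out = np.zeros((n_steps, n_out))
    for t in range(n_steps):
        for u in range(window):
            # Each frame in the window contributes through its own masked weights.
            out[t] += frames[t + u] @ (weights[u] * mask)
        out[t] += bias
    return np.maximum(out, 0.0)  # ReLU, chosen here only for illustration


rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 6))      # 10 frames, 6 features each
weights = rng.normal(size=(3, 6, 4))   # window of 3 frames, 4 output units
bias = np.zeros(4)
mask = banded_mask(6, 4, bandwidth=3)
output = masked_conditional_layer(frames, weights, bias, mask)
```

Setting the mask to all ones recovers the unmasked (CLNN-style) layer in this sketch, which makes the effect of the mask easy to compare.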
- Publication: arXiv e-prints
- Pub Date: March 2018
- DOI: 10.48550/arXiv.1803.02421
- arXiv: arXiv:1803.02421
- Bibcode: 2018arXiv180302421M
- Keywords: Statistics - Machine Learning; Computer Science - Machine Learning; Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing
- E-Print: Restricted Boltzmann Machine, RBM, Conditional Restricted Boltzmann Machine, CRBM, Music Information Retrieval, MIR, Conditional Neural Network, CLNN, Masked Conditional Neural Network, MCLNN, Deep Neural Network