Convolutional Neural Network Achieves Human-level Accuracy in Music Genre Classification
Abstract
Music genre classification is one example of content-based analysis of music signals. Traditionally, human-engineered features were used to automatize this task and 61% accuracy has been achieved in the 10-genre classification. However, it's still below the 70% accuracy that humans could achieve in the same task. Here, we propose a new method that combines knowledge of human perception study in music genre classification and the neurophysiology of the auditory system. The method works by training a simple convolutional neural network (CNN) to classify a short segment of the music signal. Then, the genre of a music is determined by splitting it into short segments and then combining CNN's predictions from all short segments. After training, this method achieves human-level (70%) accuracy and the filters learned in the CNN resemble the spectrotemporal receptive field (STRF) in the auditory system.
- Publication:
-
arXiv e-prints
- Pub Date:
- February 2018
- DOI:
- 10.48550/arXiv.1802.09697
- arXiv:
- arXiv:1802.09697
- Bibcode:
- 2018arXiv180209697D
- Keywords:
-
- Computer Science - Sound;
- Computer Science - Machine Learning;
- Electrical Engineering and Systems Science - Audio and Speech Processing
- E-Print:
- https://ccneuro.org/2018/proceedings/1153.pdf