Human Voice Pitch Estimation: A Convolutional Network with Auto-Labeled and Synthetic Data
Abstract
In the domain of music and sound processing, pitch extraction plays a pivotal role. Our research presents a specialized convolutional neural network designed for pitch extraction, particularly from the human singing voice in acapella performances. Notably, our approach combines synthetic data with auto-labeled acapella sung audio, creating a robust training environment. Evaluation across datasets comprising synthetic sounds, opera recordings, and time-stretched vowels demonstrates its efficacy. This work paves the way for enhanced pitch extraction in both music and voice settings.
- Publication:
-
arXiv e-prints
- Pub Date:
- August 2023
- DOI:
- 10.48550/arXiv.2308.07170
- arXiv:
- arXiv:2308.07170
- Bibcode:
- 2023arXiv230807170C
- Keywords:
-
- Computer Science - Sound;
- Computer Science - Machine Learning;
- Electrical Engineering and Systems Science - Audio and Speech Processing