Evaluating Features and Metrics for High-Quality Simulation of Early Vocal Learning of Vowels

doi:10.48550/arXiv.2005.09986

How infants use auditory cues to learn to speak, despite the acoustic mismatch of their vocal apparatus, is a topic of active scientific debate. Simulating early vocal learning with articulatory speech synthesis offers a way to gain a deeper understanding of this process. A crucial parameter in these simulations is the choice of features and of a metric for evaluating the acoustic error between the synthesised sound and the reference target. We evaluate the performance of a set of 40 feature-metric combinations on the task of optimising the production of static vowels with a high-quality articulatory synthesiser. To this end, we assess the usability of formant error and of the projection of the feature-metric error surface in the normalised F1-F2 formant space. We show that this approach can be used to evaluate the impact of features and metrics, and also to offer insight into perceptual results.
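As a rough illustration of the two kinds of error mentioned in the abstract, the following minimal sketch computes a formant error in a normalised F1-F2 plane and a generic feature-metric error. It is not the authors' implementation: the normalisation ranges, the linear min-max normalisation, the two example metrics, and all function names are assumptions made only for this example.

```python
import math

# Hypothetical normalisation ranges in Hz (assumed for illustration only;
# the abstract does not state how the F1-F2 space is normalised).
F1_RANGE = (200.0, 1000.0)
F2_RANGE = (500.0, 3000.0)


def normalise(value, value_range):
    """Map a formant frequency in Hz linearly onto [0, 1] within a range."""
    low, high = value_range
    return (value - low) / (high - low)


def formant_error(synth, target):
    """Euclidean distance between two (F1, F2) pairs in the normalised plane."""
    d1 = normalise(synth[0], F1_RANGE) - normalise(target[0], F1_RANGE)
    d2 = normalise(synth[1], F2_RANGE) - normalise(target[1], F2_RANGE)
    return math.hypot(d1, d2)


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def feature_metric_error(synth_features, target_features, metric):
    """Acoustic error for one feature-metric combination.

    The feature vectors are assumed to have been extracted beforehand
    from the synthesised and the reference sound.
    """
    return metric(synth_features, target_features)


if __name__ == "__main__":
    # Made-up formant values in Hz for a synthesised and a target vowel.
    print(formant_error((600.0, 1750.0), (200.0, 1750.0)))
    # Made-up feature vectors compared under two different metrics.
    for metric in (euclidean, manhattan):
        print(metric.__name__, feature_metric_error([0.0, 0.0], [3.0, 4.0], metric))
```

Keeping the metric as a plain function argument makes it easy to sweep a grid of feature-metric combinations and compare each resulting error against the formant error for the same pair of sounds.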

Publication: arXiv e-prints
Pub Date: May 2020
DOI: 10.48550/arXiv.2005.09986
arXiv: arXiv:2005.09986
Bibcode: 2020arXiv200509986G
Keywords: Electrical Engineering and Systems Science - Audio and Speech Processing; Computer Science - Sound
E-Print: Submitted to INTERSPEECH 2021
