Speech Synthesis with Neural Networks
Abstract
Text-to-speech conversion has traditionally been performed either by concatenating short samples of speech or by using rule-based systems to convert a phonetic representation of speech into an acoustic representation, which is then converted into speech. This paper describes a system that uses a time-delay neural network (TDNN) to perform this phonetic-to-acoustic mapping, with a second neural network controlling the timing of the generated speech. The neural network system requires less memory than a concatenation system and performed well in tests against commercial systems based on other technologies.
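The abstract gives no architectural details, so the following is only a minimal Python sketch of the general idea under assumptions of our own. Each phoneme is represented by a hypothetical feature vector, the timing network's output is replaced by a fixed list of frame durations, and the TDNN is reduced to a single layer whose output at frame t depends on a window of consecutive input frames (a 1-D convolution over time). The names `expand_by_duration` and `tdnn_layer`, the weights, and the dimensions are illustrative and are not taken from the paper.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def expand_by_duration(phones: Sequence[Vector], durations: Sequence[int]) -> List[Vector]:
    """Repeat each phonetic feature vector for its duration in frames.

    Hypothetical stand-in: in the described system a separate network
    controls timing; here the durations are simply supplied as inputs.
    """
    frames: List[Vector] = []
    for vec, dur in zip(phones, durations):
        for _ in range(dur):
            frames.append(list(vec))  # copy so frames never share a list
    return frames


def tdnn_layer(
    frames: Sequence[Vector],
    weights: Sequence[Sequence[Vector]],
    bias: Vector,
    activation: Callable[[float], float] = math.tanh,
) -> List[Vector]:
    """One time-delay layer (a 'valid' 1-D convolution over time).

    weights[d][j][i] is the weight from input feature i at delay d to
    output unit j. Output frame t sees input frames t .. t + D - 1, so
    T input frames yield T - D + 1 output frames.
    """
    delays = len(weights)
    outputs: List[Vector] = []
    for t in range(len(frames) - delays + 1):
        y: Vector = []
        for j in range(len(bias)):
            s = bias[j]
            for d in range(delays):
                # Dot product of the delayed input frame with this unit's weights.
                s += sum(w * x for w, x in zip(weights[d][j], frames[t + d]))
            y.append(activation(s))
        outputs.append(y)
    return outputs


if __name__ == "__main__":
    # Two hypothetical phonemes with 2-dimensional features, lasting 2 and 3 frames.
    frames = expand_by_duration([[1.0, 0.0], [0.0, 1.0]], [2, 3])
    # Three delays, one output unit, identity activation for readability.
    w = [[[0.5, -0.5]] for _ in range(3)]
    print(tdnn_layer(frames, w, [0.0], activation=lambda v: v))
    # → [[0.5], [-0.5], [-1.5]]
```

Because each output frame sees several neighbouring input frames, the layer can model how a sound is shaped by its phonetic context, which is the usual motivation for time-delay architectures in speech.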
- Publication:
- arXiv e-prints
- Pub Date:
- November 1998
- DOI:
- 10.48550/arXiv.cs/9811031
- arXiv:
- arXiv:cs/9811031
- Bibcode:
- 1998cs.......11031K
- Keywords:
- Neural and Evolutionary Computing;
- Human-Computer Interaction;
- I.2.6;
- K.3.2
- E-Print:
- 6 pages, PostScript