Learning to Pronounce Chinese Without a Pronunciation Dictionary
Abstract
We demonstrate a program that learns to pronounce Chinese text in Mandarin, without a pronunciation dictionary. From non-parallel streams of Chinese characters and Chinese pinyin syllables, it establishes a many-to-many mapping between characters and pronunciations. Using unsupervised methods, the program effectively deciphers writing into speech. Its token-level character-to-syllable accuracy is 89%, which significantly exceeds the 22% accuracy of prior work.
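As a rough illustration of the reported metric, token-level character-to-syllable accuracy can be read as the fraction of character tokens in running text whose predicted pinyin syllable matches a reference syllable. The sketch below assumes that reading; the function name `token_accuracy`, the tone-numbered pinyin notation, and the toy data are hypothetical and are not taken from the paper.

```python
def token_accuracy(predicted, reference):
    """Fraction of character tokens whose predicted syllable equals the reference.

    Assumes one syllable per character token, aligned by position.
    """
    if len(predicted) != len(reference):
        raise ValueError("expected one predicted syllable per reference token")
    if not reference:
        return 0.0
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Hypothetical toy data: 行 is polyphonic (hang2 in 银行, xing2 in 行人),
# so a mapping that always emits xing2 gets one of its two tokens wrong.
chars = ["银", "行", "行", "人"]
reference = ["yin2", "hang2", "xing2", "ren2"]
predicted = ["yin2", "xing2", "xing2", "ren2"]

score = token_accuracy(predicted, reference)
print(f"token-level accuracy: {score:.2f}")
```

Because the score is computed per token rather than per distinct character, frequent characters weigh more heavily, and a polyphonic character can be partly right and partly wrong.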
- Publication: arXiv e-prints
- Pub Date: October 2020
- DOI: 10.48550/arXiv.2010.04744
- arXiv: arXiv:2010.04744
- Bibcode: 2020arXiv201004744C
- Keywords: Computer Science - Computation and Language
- E-Print: 7 pages. To appear in EMNLP 2020