Decoding visemes: improving machine lipreading
Abstract
To undertake machine lip-reading, we try to recognise speech from a visual signal. Current work often uses viseme classification supported by language models with varying degrees of success. A few recent works suggest phoneme classification, in the right circumstances, can outperform viseme classification. In this work we present a novel two-pass method of training phoneme classifiers which uses previously trained visemes in the first pass. With our new training algorithm, we show classification performance which significantly improves on previous lip-reading results.
- Publication:
- arXiv e-prints
- Pub Date:
- October 2017
- DOI:
- 10.48550/arXiv.1710.01169
- arXiv:
- arXiv:1710.01169
- Bibcode:
- 2017arXiv171001169B
- Keywords:
- Computer Science - Computer Vision and Pattern Recognition;
- Electrical Engineering and Systems Science - Audio and Speech Processing
- E-Print:
- Helen L Bear and Richard Harvey. Decoding visemes: improving machine lipreading. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 2009-2013.