Efficient Segmental Cascades for Speech Recognition

doi:10.48550/arXiv.1608.00929

Efficient Segmental Cascades for Speech Recognition

Discriminative segmental models offer a way to incorporate flexible feature functions into speech recognition. However, their appeal has been limited by their computational requirements, due to the large number of possible segments to consider. Multi-pass cascades of segmental models introduce features of increasing complexity in different passes, where in each pass a segmental model rescores lattices produced by a previous (simpler) segmental model. In this paper, we explore several ways of making segmental cascades efficient and practical: reducing the feature set in the first pass, frame subsampling, and various pruning approaches. In experiments on phonetic recognition, we find that with a combination of such techniques, it is possible to maintain competitive performance while greatly reducing decoding, pruning, and training time.

Publication:

arXiv e-prints

Pub Date:

August 2016

DOI:

10.48550/arXiv.1608.00929

arXiv:

arXiv:1608.00929

Bibcode:

2016arXiv160800929T

Keywords:

Computer Science - Computation and Language

NASA/ADS

Efficient Segmental Cascades for Speech Recognition

Abstract