Prefix Probabilities from Stochastic Tree Adjoining Grammars
Abstract
Language models for speech recognition typically use a probability model of the form Pr(a_n | a_1, a_2, ..., a_{n-1}). Stochastic grammars, on the other hand, are typically used to assign structure to utterances. A language model of the above form is constructed from such grammars by computing the prefix probability Sum_{w in Sigma*} Pr(a_1 ... a_n w), where w ranges over all possible terminations of the prefix a_1 ... a_n. The main result in this paper is an algorithm to compute such prefix probabilities given a stochastic Tree Adjoining Grammar (TAG). The algorithm achieves the required computation in O(n^6) time. The probabilities of subderivations that do not derive any words in the prefix, but contribute structurally to its derivation, are precomputed to achieve termination. This algorithm enables existing corpus-based estimation techniques for stochastic TAGs to be used for language modelling.
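The relationship between prefix probabilities and the conditional model Pr(a_n | a_1, ..., a_{n-1}) can be illustrated with a minimal sketch. It is not the paper's O(n^6) algorithm and involves no TAG: it assumes a small, explicitly enumerated distribution over complete sentences (the hypothetical `SENTENCES` table below), whereas a stochastic TAG defines such a distribution implicitly over an infinite language. The function names are likewise illustrative. The sketch only shows that the next-word probability is the ratio of two prefix probabilities.

```python
from fractions import Fraction

# Hypothetical toy distribution over complete sentences (probabilities sum to 1).
# Assumption for illustration only: a real stochastic TAG would define this
# distribution implicitly, and the sum over completions would be infinite.
SENTENCES = {
    ("the", "dog", "barks"): Fraction(1, 2),
    ("the", "dog", "sleeps"): Fraction(1, 4),
    ("the", "cat", "sleeps"): Fraction(1, 4),
}


def prefix_probability(prefix):
    """Sum Pr(prefix + w) over all completions w, including the empty one."""
    prefix = tuple(prefix)
    n = len(prefix)
    return sum(
        (p for sentence, p in SENTENCES.items() if sentence[:n] == prefix),
        Fraction(0),
    )


def next_word_probability(history, word):
    """Pr(word | history), computed as a ratio of two prefix probabilities."""
    denominator = prefix_probability(history)
    if denominator == 0:
        raise ValueError("history has zero prefix probability")
    return prefix_probability(tuple(history) + (word,)) / denominator
```

For example, under this toy table `next_word_probability(["the", "dog"], "barks")` divides the prefix probability of "the dog barks" by that of "the dog". Exact `Fraction` arithmetic is used here only to keep the illustration free of floating-point rounding.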
- Publication: arXiv e-prints
- Pub Date: September 1998
- DOI: 10.48550/arXiv.cs/9809026
- arXiv: arXiv:cs/9809026
- Bibcode: 1998cs........9026N
- Keywords: Computer Science - Computation and Language; I.2.7; D.3.1
- E-Print: 7 pages, 2 Postscript figures, uses colacl.sty, graphicx.sty, psfrag.sty