Prefix Probabilities from Stochastic Tree Adjoining Grammars
Abstract
Language models for speech recognition typically use a probability model of the form Pr(a_n | a_1, a_2, ..., a_{n-1}). Stochastic grammars, on the other hand, are typically used to assign structure to utterances. A language model of the above form is constructed from such grammars by computing the prefix probability Sum_{w in Sigma*} Pr(a_1 ... a_n w), where w ranges over all possible terminations of the prefix a_1 ... a_n. The main result in this paper is an algorithm to compute such prefix probabilities given a stochastic Tree Adjoining Grammar (TAG). The algorithm achieves the required computation in O(n^6) time. The probabilities of subderivations that do not derive any words in the prefix, but contribute structurally to its derivation, are precomputed to achieve termination. This algorithm enables existing corpus-based estimation techniques for stochastic TAGs to be used for language modelling.
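As a rough illustration of how prefix probabilities yield a language model of the form Pr(a_n | a_1, ..., a_{n-1}), the sketch below divides the prefix probability of a_1 ... a_n by that of a_1 ... a_{n-1}. It is not the paper's O(n^6) algorithm for stochastic TAGs: it assumes a hypothetical toy distribution over three complete sentences (the table `SENTENCES`) and computes prefix probabilities by brute-force summation. The names `prefix_probability` and `next_word_probability` are likewise invented for this example.

```python
from fractions import Fraction

# Hypothetical toy distribution over complete sentences (probabilities sum to 1).
# A real stochastic grammar defines this distribution implicitly and over
# infinitely many strings; the finite table is an assumption of this sketch.
SENTENCES = {
    ("the", "dog", "barks"): Fraction(1, 2),
    ("the", "dog", "sleeps"): Fraction(1, 4),
    ("the", "cat", "sleeps"): Fraction(1, 4),
}


def prefix_probability(prefix):
    """Sum Pr(sentence) over every sentence that starts with `prefix`."""
    prefix = tuple(prefix)
    return sum(
        (p for s, p in SENTENCES.items() if s[:len(prefix)] == prefix),
        Fraction(0),
    )


def next_word_probability(history, word):
    """Pr(word | history), computed as a ratio of two prefix probabilities."""
    denominator = prefix_probability(history)
    if denominator == 0:
        raise ValueError("history has zero prefix probability")
    return prefix_probability(tuple(history) + (word,)) / denominator
```

With this table, `prefix_probability(["the", "dog"])` sums the two sentences beginning "the dog", and `next_word_probability(["the", "dog"], "barks")` divides the probability of the longer prefix by that of the shorter one. Exact fractions are used so the ratios carry no floating-point error.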
 Publication:

arXiv e-prints
 Pub Date:
 September 1998
 DOI:
 10.48550/arXiv.cs/9809026
 arXiv:
 arXiv:cs/9809026
 Bibcode:
 1998cs........9026N
 Keywords:

 Computer Science - Computation and Language;
 I.2.7;
 D.3.1
 EPrint:
 7 pages, 2 Postscript figures, uses colacl.sty, graphicx.sty, psfrag.sty