Learning Part-of-Speech Guessing Rules from Lexicon: Extension to Non-Concatenative Operations
Abstract
One of the problems in part-of-speech tagging of real-world texts is that of words unknown to the lexicon. In Mikheev (ACL-96 cmp-lg/9604022), a technique was proposed for the fully unsupervised statistical acquisition of rules which guess possible parts-of-speech for unknown words. One of the over-simplifications assumed by this learning technique was that it acquired only morphological rules which obey simple concatenative regularities of the main word with an affix. In this paper we extend the technique to the non-concatenative cases of suffixation and assess the gain in performance.
- Publication: arXiv e-prints
- Pub Date: April 1996
- arXiv: cmp-lg/9604025
- Bibcode: 1996cmp.lg....4025M
- Keywords: Computer Science - Computation and Language
- E-Print: 6 pages, LaTeX (colap.sty for COLING-96)