Aspects of Pattern-Matching in Data-Oriented Parsing
Abstract
Data-Oriented Parsing (DOP) ranks among the best parsing schemes, pairing state-of-the-art parsing accuracy with the psycholinguistic insight that larger chunks of syntactic structure are relevant grammatical and probabilistic units. Parsing with the DOP model, however, appears to require many CPU cycles and a considerable amount of double work. This overhead is brought on by the concept of multiple derivations, which is necessary for probabilistic processing but is not convincingly related to a proper linguistic backbone. It is possible, however, to reinterpret the DOP model as a pattern-matching model, which tries to maximize the size of the substructures that construct the parse rather than the probability of the parse. By emphasizing this memory-based aspect of the DOP model, it is possible to do away with multiple derivations, opening up possibilities for efficient Viterbi-style optimizations while still retaining acceptable parsing accuracy through enhanced context-sensitivity.
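The abstract contrasts summing probabilities over many derivations with a pattern-matching criterion that favours large stored substructures and admits Viterbi-style search. The following is a minimal Python sketch of that contrast under loudly simplified assumptions. It treats fragments as flat word sequences rather than DOP subtrees and reads "maximize the size of the substructures" as "cover the sentence with as few fragments as possible". The function names `dop_parse_probability` and `fewest_fragments` and the probabilities in the example are hypothetical and illustrative. It is not the paper's implementation.

```python
# Toy contrast between DOP-style scoring and a pattern-matching criterion.
# HYPOTHETICAL simplifications, for illustration only:
#  - fragments are flat word sequences standing in for DOP subtrees;
#  - "maximize substructure size" is read as "use as few fragments as possible".
from math import prod


def dop_parse_probability(derivations):
    """DOP-style score: sum over every derivation of the parse, where each
    derivation's probability is the product of its fragment probabilities."""
    return sum(prod(p for _, p in derivation) for derivation in derivations)


def fewest_fragments(sentence, fragments):
    """Viterbi-style search: for each prefix keep only the best (fewest-fragment)
    cover instead of summing over all covers.

    Returns (fragment_count, cover) for the whole sentence, or None if the
    stored fragments cannot cover it.
    """
    words = tuple(sentence.split())
    stored = {tuple(f.split()) for f in fragments}
    n = len(words)
    # best[i] holds (count, cover) for words[:i], or None if uncovered so far.
    best = [None] * (n + 1)
    best[0] = (0, [])
    for end in range(1, n + 1):
        for start in range(end):
            chunk = words[start:end]
            if best[start] is None or chunk not in stored:
                continue
            candidate = (best[start][0] + 1, best[start][1] + [" ".join(chunk)])
            # Strict "<" keeps the first cover found on ties.
            if best[end] is None or candidate[0] < best[end][0]:
                best[end] = candidate
    return best[n]


if __name__ == "__main__":
    fragments = ["the", "dog", "barks", "the dog", "dog barks"]
    print(fewest_fragments("the dog barks", fragments))  # (2, ['the', 'dog barks'])
    # Illustrative probabilities for two derivations of the same parse.
    derivations = [[("the dog", 0.2), ("barks", 0.5)],
                   [("the", 0.4), ("dog barks", 0.1)]]
    print(dop_parse_probability(derivations))
```

Because `fewest_fragments` keeps only one best cover per prefix, it needs a quadratic number of chunk lookups. By contrast, the DOP-style score must visit every derivation of a parse, which is the multiple-derivation cost the abstract refers to.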
- Publication: arXiv e-prints
- Pub Date: August 2000
- DOI: 10.48550/arXiv.cs/0008014
- arXiv: arXiv:cs/0008014
- Bibcode: 2000cs........8014D
- Keywords: Computer Science - Computation and Language; I.2.6; I.2.7; I.5.4
- E-Print: 7 pages, 3 figures