Discontinuous Constituency Parsing with a Stack-Free Transition System and a Dynamic Oracle
Abstract
We introduce a novel transition system for discontinuous constituency parsing. Instead of storing subtrees in a stack (i.e., a data structure with linear-time sequential access), the proposed system uses a set of parsing items with constant-time random access. This change makes it possible to construct any discontinuous constituency tree in exactly $4n - 2$ transitions for a sentence of length $n$. At each parsing step, the parser considers every item in the set as a candidate to be combined with a focus item, constructing a new constituent in a bottom-up fashion. The parsing strategy is based on the assumption that most syntactic structures can be parsed incrementally and that the set (the memory of the parser) remains reasonably small on average. Moreover, we introduce a provably correct dynamic oracle for the new transition system and present the first experiments in discontinuous constituency parsing using a dynamic oracle. Our parser obtains state-of-the-art results on three English and German discontinuous treebanks.
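To make the set-plus-focus idea and the $4n - 2$ count concrete, here is a minimal Python sketch. It is not the authors' implementation. The action names (`shift`, `combine`, `label`, `no-label`), the strict alternation between a structural step and a labelling step, and the helper `run` are all assumptions made for illustration; the abstract does not spell out the action inventory. Under these assumptions, a sentence of length $n$ needs $n$ shifts and $n - 1$ binary combinations, each followed by one labelling decision, which gives $2(2n - 1) = 4n - 2$ transitions.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

# An item is the set of token indices covered by a subtree; the indices
# need not be contiguous, which is what allows discontinuous constituents.
Item = FrozenSet[int]
Action = Tuple  # ("shift",) | ("combine", Item) | ("label", str) | ("no-label",)


def run(n: int, actions: List[Action]) -> Tuple[List[Tuple[str, Tuple[int, ...]]], int]:
    """Execute a derivation in a hypothetical set-based transition system.

    Even-numbered steps must be structural (shift / combine), odd-numbered
    steps must be labelling decisions (label / no-label). This alternation
    is an assumption of the sketch, not a detail taken from the abstract.
    """
    memory: Set[Item] = set()        # the unordered set of items (no stack)
    focus: Optional[Item] = None     # the item currently being extended
    next_token = 0                   # index of the next unread token
    constituents: List[Tuple[str, Tuple[int, ...]]] = []

    for step, action in enumerate(actions):
        structural = step % 2 == 0
        kind = action[0]
        if structural and kind == "shift":
            if next_token >= n:
                raise ValueError("no token left to shift")
            if focus is not None:
                memory.add(focus)    # the old focus goes back into the set
            focus = frozenset({next_token})
            next_token += 1
        elif structural and kind == "combine":
            other = action[1]
            if focus is None or other not in memory:
                raise ValueError("combine needs a focus and an item from the set")
            memory.remove(other)     # any item of the set can be picked
            focus = focus | other
        elif not structural and kind == "label":
            if focus is None:
                raise ValueError("nothing to label")
            constituents.append((action[1], tuple(sorted(focus))))
        elif not structural and kind == "no-label":
            pass                     # the focus stays an unlabelled intermediate item
        else:
            raise ValueError(f"action {action!r} not allowed at step {step}")

    if next_token != n or memory or focus != frozenset(range(n)):
        raise ValueError("derivation does not end in a full parse")
    return constituents, len(actions)


# Toy sentence with n = 4 tokens and a discontinuous constituent over {0, 2}.
n = 4
derivation: List[Action] = [
    ("shift",), ("no-label",),                        # focus = {0}
    ("shift",), ("no-label",),                        # focus = {1}
    ("shift",), ("no-label",),                        # focus = {2}
    ("combine", frozenset({0})), ("label", "VP"),     # skips over token 1
    ("combine", frozenset({1})), ("no-label",),
    ("shift",), ("no-label",),                        # focus = {3}
    ("combine", frozenset({0, 1, 2})), ("label", "S"),
]
constituents, count = run(n, derivation)
print(constituents, count)
```

In this toy derivation the combination of tokens 0 and 2 reaches past token 1, which a stack restricted to its topmost elements could not do without extra reordering actions. The derivation uses 14 transitions, matching $4n - 2$ for $n = 4$.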
- Publication: arXiv e-prints
- Pub Date: April 2019
- arXiv: arXiv:1904.00615
- Bibcode: 2019arXiv190400615C
- Keywords: Computer Science - Computation and Language
- E-Print: Accepted for publication at NAACL 2019