Vicinity-Driven Paragraph and Sentence Alignment for Comparable Corpora
Abstract
Parallel corpora have driven great progress in the field of Text Simplification. However, most sentence alignment algorithms either support only a limited range of alignment types or simply ignore valuable clues present in comparable documents. We address this problem by introducing a new set of flexible vicinity-driven paragraph and sentence alignment algorithms that handle 1-N, N-1, N-N and long-distance null alignments without the need for hard-to-replicate supervised models.
- Publication: arXiv e-prints
- Pub Date: December 2016
- DOI: 10.48550/arXiv.1612.04113
- arXiv: arXiv:1612.04113
- Bibcode: 2016arXiv161204113P
- Keywords: Computer Science - Computation and Language