ESURF: Simple and Effective EDU Segmentation

Segmenting text into Elemental Discourse Units (EDUs) is a fundamental task in discourse parsing. We present a new, simple method for identifying EDU boundaries, and hence segmenting text into EDUs, based on lexical and character n-gram features and using random forest classification. We show that the method, despite its simplicity, outperforms other methods both for segmentation and within a state-of-the-art discourse parser. This indicates the importance of such features for identifying basic discourse elements, and points towards potentially more training-efficient methods for discourse analysis.
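
As a rough illustration of what "lexical and character n-gram features" for a candidate EDU boundary might look like, the following is a minimal sketch and not the authors' implementation. The function names (`char_ngrams`, `boundary_features`), the window size, the n-gram lengths, and the padding symbol are all assumptions made for this example; the abstract does not specify them.

```python
from typing import Dict, List


def char_ngrams(token: str, n_max: int = 3) -> List[str]:
    """Return all character n-grams of length 1..n_max for a token."""
    return [
        token[i:i + n]
        for n in range(1, n_max + 1)
        for i in range(len(token) - n + 1)
    ]


def boundary_features(tokens: List[str], i: int, window: int = 1) -> Dict[str, int]:
    """Binary features for the candidate boundary just before tokens[i].

    Hypothetical feature scheme: the lowercased word at each offset in the
    window (lexical features) plus that word's character n-grams.
    """
    feats: Dict[str, int] = {}
    for offset in range(-window, window + 1):
        j = i + offset
        # Positions outside the sentence get a padding symbol.
        tok = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
        feats[f"w[{offset}]={tok}"] = 1
        if tok != "<pad>":
            for gram in char_ngrams(tok):
                feats[f"c[{offset}]={gram}"] = 1
    return feats


# Candidate boundary before "because" in a toy sentence.
tokens = "He left because it rained".split()
feats = boundary_features(tokens, 2)
```

Feature dictionaries of this kind could then be vectorized and passed to any random forest implementation (for example scikit-learn's `DictVectorizer` and `RandomForestClassifier`), with one boundary/no-boundary label per token position. That pairing is likewise an assumption of this sketch, not a detail taken from the paper.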

Publication: arXiv e-prints

Pub Date: January 2025

arXiv: arXiv:2501.07723

Bibcode: 2025arXiv250107723S

Keywords: Computer Science - Computation and Language; Computer Science - Machine Learning
