Text Segmentation Based on Similarity between Words
Abstract
This paper proposes a new indicator of text structure, called the lexical cohesion profile (LCP), which locates segment boundaries in a text. A text segment is a coherent scene; the words in a segment are linked together via lexical cohesion relations. LCP records the mutual similarity of words in a sequence of text. The similarity of words, which represents their cohesiveness, is computed using a semantic network. Comparison with text segments marked by a number of human subjects shows that LCP closely correlates with human judgments. LCP may provide valuable information for resolving anaphora and ellipsis.
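The abstract does not give the exact computation, so the sketch below only illustrates the general idea of a cohesion profile under stated assumptions. A window slides over the word sequence, and each gap between words gets a cohesion score. Boundaries are taken at local minima of that profile, which is also an assumption here. Mean pairwise similarity stands in for the paper's semantic-network measure. All names (`cohesion`, `cohesion_profile`, `boundaries`, `TOPIC`, `toy_sim`) are hypothetical, and the topic-label similarity is a toy substitute for a real semantic network.

```python
from itertools import combinations
from typing import Callable, Sequence

# A word-similarity function returning values in [0, 1]. In the paper this
# comes from a semantic network; here it is any caller-supplied callable.
Similarity = Callable[[str, str], float]


def cohesion(window: Sequence[str], sim: Similarity) -> float:
    """Mean pairwise similarity of the words in a window.

    Assumption: a stand-in for the paper's cohesiveness measure."""
    pairs = list(combinations(window, 2))
    if not pairs:
        return 0.0
    return sum(sim(a, b) for a, b in pairs) / len(pairs)


def cohesion_profile(words: Sequence[str], half: int, sim: Similarity) -> list[float]:
    """profile[g - 1] is the cohesion of the window straddling the gap
    before words[g], with up to `half` words on each side of the gap."""
    return [
        cohesion(words[max(0, g - half): g + half], sim)
        for g in range(1, len(words))
    ]


def boundaries(profile: Sequence[float]) -> list[int]:
    """Gaps g at strict local minima of the profile (assumed criterion);
    a new segment is taken to start at words[g]."""
    return [
        k + 1
        for k in range(1, len(profile) - 1)
        if profile[k] < profile[k - 1] and profile[k] < profile[k + 1]
    ]


# Hypothetical toy similarity: 1.0 if two words share a topic label, else 0.0.
TOPIC = {"cat": "pets", "dog": "pets", "pet": "pets",
         "car": "roads", "road": "roads", "drive": "roads"}


def toy_sim(a: str, b: str) -> float:
    return 1.0 if TOPIC[a] == TOPIC[b] else 0.0


words = ["cat", "dog", "pet", "dog", "car", "road", "drive", "car"]
profile = cohesion_profile(words, half=2, sim=toy_sim)
print(boundaries(profile))  # [4]: a new segment starts at "car"
```

Cohesion drops where the window straddles words from two topics, so the valley in the profile falls at the topic shift. Swapping `toy_sim` for a similarity derived from a real lexical resource leaves the rest of the sketch unchanged.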
- Publication:
- arXiv e-prints
- Pub Date:
- January 1996
- DOI:
- 10.48550/arXiv.cmp-lg/9601005
- arXiv:
- arXiv:cmp-lg/9601005
- Bibcode:
- 1996cmp.lg....1005K
- Keywords:
- Computer Science - Computation and Language
- E-Print:
- 3 pages, uufiles (paper.tex, acl.sty, bezier.sty)