Graph-Community Detection for Cross-Document Topic Segment Relationship Identification
Abstract
In this paper we propose a graph-community detection approach to identify cross-document relationships at the topic segment level. Given a set of related documents, we automatically find these relationships by clustering segments with similar content (topics). In this context, we study how different weighting mechanisms influence the discovery of word communities that relate to the different topics found in the documents. Finally, we test different mapping functions to assign topic segments to word communities, determining which topic segments are considered equivalent. By performing this task it is possible to enable efficient multi-document browsing, since when a user finds relevant content in one document we can provide access to similar topics in other documents. We deploy our approach in two different scenarios. One is an educational scenario where equivalence relationships between learning materials need to be found. The other consists of a series of dialogs in a social context where students discuss commonplace topics. Results show that our proposed approach better discovered equivalence relationships in learning material documents and obtained close results in the social speech domain, where the best performing approach was a clustering technique.
- Publication:
-
arXiv e-prints
- Pub Date:
- June 2016
- DOI:
- 10.48550/arXiv.1606.04081
- arXiv:
- arXiv:1606.04081
- Bibcode:
- 2016arXiv160604081M
- Keywords:
-
- Computer Science - Computation and Language;
- Computer Science - Information Retrieval;
- Computer Science - Social and Information Networks