Toward Selectivity Based Keyword Extraction for Croatian News
Abstract
This preliminary report presents network-based keyword extraction for Croatian: an unsupervised method for extracting keywords from a complex network. Our approach uses a new network measure, node selectivity, motivated by research on graph-based centrality approaches. Node selectivity is defined as the average weight distribution on the links of a single node. We extract nodes (keyword candidates) based on their selectivity values, and then expand the extracted nodes into word-tuples ranked by the highest in/out selectivity values. Selectivity-based extraction does not require linguistic knowledge, as it is derived purely from the statistical and structural information encompassed in the source text, which is reflected in the structure of the network. The obtained sets are evaluated against manually annotated keywords: for the set of extracted keyword candidates, the average F1 score is 24.63% and the average F2 score is 21.19%; for the extracted word-tuple candidates, the average F1 score is 25.9% and the average F2 score is 24.47%.
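The selectivity definition above can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: the network is taken to be a directed co-occurrence network in which a link from one word to the next is weighted by how often that adjacent pair occurs, and in/out selectivity is read as in/out strength (sum of link weights) divided by in/out degree (number of distinct links). The function names and the toy token sequence are hypothetical and only serve the illustration.

```python
from collections import defaultdict


def build_cooccurrence_network(tokens):
    """Return {(u, v): weight}, where weight counts how often v directly follows u.

    Assumption: links connect adjacent words only; the abstract does not
    specify how the network is constructed.
    """
    weights = defaultdict(int)
    for u, v in zip(tokens, tokens[1:]):
        weights[(u, v)] += 1
    return dict(weights)


def in_out_selectivity(weights):
    """Compute in/out selectivity as strength / degree for every node.

    Assumption: "average weight on the links of a node" is interpreted as
    the sum of link weights divided by the number of distinct links.
    Nodes with no incoming (or outgoing) links get selectivity 0.0.
    """
    in_strength, in_degree = defaultdict(int), defaultdict(int)
    out_strength, out_degree = defaultdict(int), defaultdict(int)
    for (u, v), w in weights.items():
        out_strength[u] += w
        out_degree[u] += 1
        in_strength[v] += w
        in_degree[v] += 1

    nodes = {n for edge in weights for n in edge}
    in_sel = {
        n: in_strength[n] / in_degree[n] if in_degree.get(n) else 0.0
        for n in nodes
    }
    out_sel = {
        n: out_strength[n] / out_degree[n] if out_degree.get(n) else 0.0
        for n in nodes
    }
    return in_sel, out_sel


# Toy token sequence (hypothetical, not taken from the evaluated corpus).
tokens = "a b a b c".split()
network = build_cooccurrence_network(tokens)
in_sel, out_sel = in_out_selectivity(network)

# Rank keyword candidates by the larger of their in/out selectivity values.
ranked = sorted(in_sel, key=lambda n: (-max(in_sel[n], out_sel[n]), n))
```

In this toy input, the pair `a b` occurs twice, so `a` has a single outgoing link of weight 2 and scores higher on out-selectivity than `b`, whose two outgoing links each have weight 1. How candidates are thresholded and expanded into word-tuples is not described in the abstract, so the sketch stops at ranking.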
- Publication: arXiv e-prints
- Pub Date: July 2014
- DOI: 10.48550/arXiv.1407.4723
- arXiv: arXiv:1407.4723
- Bibcode: 2014arXiv1407.4723B
- Keywords: Computer Science - Computation and Language; Computer Science - Information Retrieval; Computer Science - Social and Information Networks