Unsupervised Keyword Extraction from Polish Legal Texts
Abstract
In this work, we present an application of the recently proposed unsupervised keyword extraction algorithm RAKE to a corpus of Polish legal texts from the field of public procurement. RAKE is essentially a language- and domain-independent method: its only language-specific input is a stoplist containing a set of non-content words. The performance of the method depends heavily on the choice of this stoplist, which should be adapted to the domain. Therefore, we complement the RAKE algorithm with an automatic approach to selecting non-content words, based on the statistical properties of term distribution.
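To make the role of the stoplist concrete, the following is a minimal Python sketch of RAKE-style scoring as the algorithm is usually described. Candidate phrases are maximal runs of words delimited by stopwords and punctuation. Each word scores degree/frequency, and each phrase scores the sum of its word scores. The function names (`candidate_phrases`, `rake`, `df_stoplist`) and the `min_doc_fraction` parameter are hypothetical. `df_stoplist` is a simple document-frequency threshold used only as a stand-in; it is not the statistical stoplist-selection method proposed in the paper.

```python
import re
from collections import defaultdict


def candidate_phrases(text, stoplist):
    """Split text into candidate phrases: maximal runs of non-stopwords,
    delimited by stopwords and punctuation (\\w is Unicode-aware, so
    Polish letters are kept inside words)."""
    phrases = []
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in stoplist:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake(text, stoplist, top_n=None):
    """Rank candidate phrases by the sum of deg(w)/freq(w) over their words."""
    phrases = candidate_phrases(text, stoplist)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # A word co-occurs with every word of its phrase, itself included.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked if top_n is None else ranked[:top_n]


def df_stoplist(documents, min_doc_fraction=0.5):
    """Hypothetical stand-in for automatic stoplist selection: treat words
    that occur in at least min_doc_fraction of documents as non-content."""
    df = defaultdict(int)
    for doc in documents:
        for word in set(re.findall(r"\w+", doc.lower())):
            df[word] += 1
    n = len(documents)
    return {w for w, c in df.items() if c / n >= min_doc_fraction}
```

For example, with the stoplist `{"of", "over", "the"}`, the sentence "Compatibility of systems of linear constraints over the set of natural numbers." ranks "linear constraints" and "natural numbers" (score 4.0 each) above the single-word candidates. A stoplist that is too small or too large merges or fragments such phrases, which is why the abstract stresses a domain-adapted stoplist.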
- Publication: arXiv e-prints
- Pub Date: August 2014
- DOI: 10.48550/arXiv.1408.3731
- arXiv: arXiv:1408.3731
- Bibcode: 2014arXiv1408.3731J
- Keywords: Computer Science - Computation and Language
- E-Print: Lecture Notes in Computer Science, Volume 8686, Springer 2014, pp. 65-70