A Method for Extracting Keywords from XML Documents by Using DTD
Abstract
As the number of computerized documents such as XML documents grows, there is increasing demand for finding particular information in large collections of XML documents. This paper proposes an automatic method for extracting keywords from valid XML documents. The element structure of an XML document is defined by its DTD, and we assume that certain elements in that structure indicate what is important in the document. First, the importance of each element is determined from its definition in the DTD. For example, elements that cannot be omitted and elements that appear at most once in their parent element are considered important. Second, all elements in the target XML document are scored according to the tree structure of the elements and the text they contain. Third, keyword candidates are extracted from the scored elements. Finally, the scores are summed and the highest-ranked candidates are selected as keywords of the XML document. The validity of the method is examined.
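The four steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no numeric weights or scoring formula, so the constants `REQUIRED_BONUS`, `SINGLE_BONUS`, and `DEPTH_DECAY`, the depth-based decay, and the function names `dtd_importance` and `extract_keywords` are all assumptions made for illustration. The sketch only understands flat, comma-separated content models, treats every alphabetic word as a candidate, and omits stop-word filtering.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical weights: the abstract does not state any numeric values.
REQUIRED_BONUS = 1.0  # child element cannot be omitted in its parent
SINGLE_BONUS = 1.0    # child element appears at most once in its parent
DEPTH_DECAY = 0.5     # assumption: deeper elements contribute less

# Matches declarations such as <!ELEMENT article (title, note*)>
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\S+)\s+\(([^)]*)\)")


def dtd_importance(dtd_text):
    """Step 1: derive an importance value per element from the DTD."""
    importance = {}
    for _parent, model in ELEMENT_DECL.findall(dtd_text):
        if "|" in model or "(" in model:
            continue  # choice and nested groups are out of scope here
        for item in model.split(","):
            item = item.strip()
            if not item or item.startswith("#"):
                continue  # skip #PCDATA
            name = item.rstrip("?*+")
            suffix = item[len(name):]  # occurrence indicator, if any
            score = 0.0
            if suffix in ("", "+"):  # no indicator or '+': required
                score += REQUIRED_BONUS
            if suffix in ("", "?"):  # no indicator or '?': at most once
                score += SINGLE_BONUS
            importance[name] = max(importance.get(name, 0.0), score)
    return importance


def extract_keywords(xml_text, dtd_text, top_n=3):
    """Steps 2-4: score elements, collect candidates, sum and rank."""
    importance = dtd_importance(dtd_text)
    totals = Counter()

    def visit(elem, depth):
        # Element score: DTD importance damped by depth in the tree.
        weight = importance.get(elem.tag, 0.0) * (DEPTH_DECAY ** depth)
        # Candidates: words in the element's own leading text.
        for word in re.findall(r"[A-Za-z]+", elem.text or ""):
            totals[word.lower()] += weight
        for child in elem:
            visit(child, depth + 1)

    visit(ET.fromstring(xml_text), 0)
    # Keep only candidates that received a positive summed score.
    return [w for w, s in totals.most_common(top_n) if s > 0]
```

Under these assumptions, a word that occurs in several required, non-repeatable elements accumulates a higher total than a word that occurs only in optional or repeatable ones, which is the ranking behavior the abstract describes. A fuller version would need a real DTD parser and the weighting actually used in the paper.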
- Publication: IEEJ Transactions on Electronics, Information and Systems
- Pub Date: 2003
- Bibcode: 2003ITEIS.123..693A
- Keywords: Keyword Extraction; XML Documents; DTD