A Novel Approach to Document Classification using WordNet
Abstract
Content-based document classification is one of the biggest challenges in free-text mining. Current document classification algorithms mostly rely on cluster analysis built on the bag-of-words approach. That method is still applied to many modern scientific problems, and it has established a strong enough presence in fields such as economics and social science to merit serious attention from researchers. In this paper we propose and explore an alternative grounded more securely in the dictionary classification and correlatedness of words and phrases. We expect that applying existing knowledge about the underlying classification structure may improve the classifier's performance.
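To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' method. It compares a plain bag-of-words representation with one in which each word is first mapped to a parent concept, in the spirit of a dictionary hierarchy such as WordNet. The `TOY_HYPERNYMS` map, the function names, and the sample documents are all hypothetical and chosen only for illustration; a real system would look up hypernyms in WordNet itself.

```python
from collections import Counter

# Hypothetical toy hypernym map standing in for a WordNet-style hierarchy.
# These entries are illustrative assumptions, not data taken from WordNet
# or from the paper.
TOY_HYPERNYMS = {
    "dog": "animal",
    "cat": "animal",
    "horse": "animal",
    "car": "vehicle",
    "truck": "vehicle",
    "bus": "vehicle",
}


def bag_of_words(text):
    """Count surface tokens: the plain bag-of-words representation."""
    return Counter(text.lower().split())


def concept_bag(text, hypernyms=TOY_HYPERNYMS):
    """Count tokens after replacing each known word with its parent concept."""
    return Counter(hypernyms.get(tok, tok) for tok in text.lower().split())


def overlap(a, b):
    """Shared feature mass: sum of per-feature minimum counts."""
    return sum((a & b).values())


doc_a = "dog horse"
doc_b = "cat"
doc_c = "truck bus"

# No surface words are shared, so bag-of-words sees no similarity.
bow_ab = overlap(bag_of_words(doc_a), bag_of_words(doc_b))

# After lifting words to concepts, doc_a and doc_b share "animal".
concept_ab = overlap(concept_bag(doc_a), concept_bag(doc_b))

# doc_a and doc_c stay unrelated: "animal" versus "vehicle".
concept_ac = overlap(concept_bag(doc_a), concept_bag(doc_c))

print(bow_ab, concept_ab, concept_ac)
```

In this toy setting the two animal documents share no surface words, so only the concept-level representation registers them as related, while the vehicle document stays distinct under both views.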
- Publication: arXiv e-prints
- Pub Date: October 2015
- DOI: 10.48550/arXiv.1510.02755
- arXiv: arXiv:1510.02755
- Bibcode: 2015arXiv151002755S
- Keywords: Computer Science - Information Retrieval; Computer Science - Computation and Language
- E-Print: (Working Paper)