An Effective Approach for Web Document Classification using the Concept of Association Analysis of Data Mining

doi:10.48550/arXiv.1406.5616

The exponential growth of the web has increased the importance of web document classification and data mining. Obtaining exact information about which classes a web document belongs to is expensive. Automatic classification of web documents is of great use to search engines, because it provides this information at low cost. In this paper, we propose an approach for classifying web documents using the frequent word itemsets generated by Frequent Pattern (FP) Growth, an association analysis technique of data mining. These sets of associated words act as the feature set. The final classification is obtained by applying a Naïve Bayes classifier to this feature set. For the experimental work, we use the Gensim package, as it is simple and robust. Results show that our approach can classify web documents effectively.
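The abstract describes a two-stage pipeline: mine frequent word sets with FP-Growth, then classify documents with Naïve Bayes over those sets. The following is a minimal, standard-library-only sketch of that idea. It is not the authors' Gensim-based implementation. It assumes that documents are already tokenized into word lists, that word sets are mined once over the whole training corpus with an absolute `min_support`, and that a document's features are the frequent sets it fully contains. All names (`fp_growth`, `build_tree`, `itemset_features`, `ItemsetNaiveBayes`) and the toy data are hypothetical.

```python
# Hypothetical sketch, not the authors' code: FP-Growth word sets + Naive Bayes.
import math
from collections import Counter, defaultdict


class FPNode:
    def __init__(self, item, parent):
        self.item, self.count, self.parent, self.children = item, 0, parent, {}


def build_tree(transactions, min_support):
    """Build an FP-tree from weighted transactions [(words, count), ...]."""
    freq = Counter()
    for items, cnt in transactions:
        for it in set(items):
            freq[it] += cnt
    freq = {it: c for it, c in freq.items() if c >= min_support}
    root, header = FPNode(None, None), defaultdict(list)  # header: item -> its nodes
    for items, cnt in transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        ordered = sorted((it for it in set(items) if it in freq),
                         key=lambda it: (-freq[it], it))
        node = root
        for it in ordered:
            if it not in node.children:
                node.children[it] = FPNode(it, node)
                header[it].append(node.children[it])
            node = node.children[it]
            node.count += cnt
    return freq, header


def fp_growth(transactions, min_support, suffix=frozenset()):
    """Return {frozenset(words): support} for every frequent word set."""
    freq, header = build_tree(transactions, min_support)
    patterns = {}
    for item in sorted(freq, key=lambda it: (freq[it], it)):
        itemset = suffix | {item}
        patterns[itemset] = freq[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        cond_base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:  # stop at the root
                path.append(parent.item)
                parent = parent.parent
            if path:
                cond_base.append((path, node.count))
        patterns.update(fp_growth(cond_base, min_support, itemset))
    return patterns


def itemset_features(words, itemsets):
    """A document's features: the frequent word sets it fully contains."""
    present = set(words)
    return [s for s in itemsets if s <= present]


class ItemsetNaiveBayes:
    """Multinomial Naive Bayes over itemset features, add-one smoothing."""

    def fit(self, docs, labels, itemsets):
        self.itemsets, self.classes = list(itemsets), sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        self.totals = Counter()
        for words, y in zip(docs, labels):
            feats = itemset_features(words, self.itemsets)
            self.counts[y].update(feats)
            self.totals[y] += len(feats)
        per_class = Counter(labels)
        self.log_prior = {c: math.log(per_class[c] / len(labels)) for c in self.classes}
        return self

    def predict(self, words):
        feats = itemset_features(words, self.itemsets)
        vocab = len(self.itemsets)

        def log_posterior(c):
            return self.log_prior[c] + sum(
                math.log((self.counts[c][f] + 1) / (self.totals[c] + vocab))
                for f in feats)

        return max(self.classes, key=log_posterior)


docs = [["goal", "match", "team", "score"], ["team", "match", "win", "goal"],
        ["match", "team", "coach"], ["stock", "market", "price", "trade"],
        ["market", "price", "investor"], ["stock", "price", "market", "bank"]]
labels = ["sport"] * 3 + ["finance"] * 3
itemsets = fp_growth([(d, 1) for d in docs], min_support=2)
clf = ItemsetNaiveBayes().fit(docs, labels, itemsets)
print(clf.predict(["team", "goal", "stadium"]))  # sport
print(clf.predict(["price", "market", "news"]))  # finance
```

On this toy corpus, FP-Growth keeps 14 word sets with support of at least 2, such as `{match, team}` and `{goal, match, team}`. Single-occurrence words like `score` are dropped. A multinomial model with Laplace smoothing is only one plausible reading of "Naïve Bayes on the feature set." A Bernoulli variant, or mining itemsets separately per class, would fit the abstract equally well.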

Publication:

arXiv e-prints

Pub Date:

June 2014

DOI:

10.48550/arXiv.1406.5616

arXiv:

arXiv:1406.5616

Bibcode:

2014arXiv1406.5616R

Keywords:

Computer Science - Information Retrieval

E-Print:

9 Pages
