Ensemble Models for Detecting Wikidata Vandalism with Stacking - Team Honeyberry Vandalism Detector at WSDM Cup 2017
Abstract
The WSDM Cup 2017 is a binary classification task: classifying Wikidata revisions as vandalism or non-vandalism. This paper describes our method, which uses machine learning techniques such as under-sampling, feature selection, stacking, and model ensembles. We confirm the validity of each technique by comparing the AUC-ROC of models built with and without it. Additionally, we analyze the results and derive useful insights for improving models on the vandalism detection task. Our final submission after the deadline achieved an AUC-ROC of 0.94412.
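The abstract combines under-sampling, stacking, and AUC-ROC evaluation. The following is a minimal, standard-library-only Python sketch of how those pieces can fit together. It is not the team's implementation. The two base learners (an SGD logistic regression and a nearest-centroid scorer), the fold count `k=5`, and the synthetic two-feature data from the hypothetical `make_data` helper are all assumptions, and feature selection is omitted.

```python
import math
import random


def auc_roc(labels, scores):
    """AUC-ROC: probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(labels, scores) if t == 1]
    neg = [s for t, s in zip(labels, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def undersample(X, y, rng):
    """Keep every vandalism row (y=1) and an equal-size random subset of regular rows (y=0)."""
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    keep = pos + rng.sample(neg, min(len(neg), len(pos)))
    return [X[i] for i in keep], [y[i] for i in keep]


def sigmoid(z):
    # Split on the sign so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(X, y, epochs=200, lr=0.1):
    """Stand-in base/meta learner: logistic regression trained by plain SGD."""
    w = [0.0] * (len(X[0]) + 1)  # last weight is the bias; zip() below stops before it
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = sigmoid(sum(a * b for a, b in zip(w, xi)) + w[-1]) - yi  # log-loss gradient
            for j, xj in enumerate(xi):
                w[j] -= lr * g * xj
            w[-1] -= lr * g
    return lambda x: sigmoid(sum(a * b for a, b in zip(w, x)) + w[-1])


def fit_centroid(X, y):
    """Stand-in base learner: how much closer x lies to the vandalism centroid than the regular one."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    mu1 = mean([x for x, t in zip(X, y) if t == 1])
    mu0 = mean([x for x, t in zip(X, y) if t == 0])
    return lambda x: math.dist(x, mu0) - math.dist(x, mu1)


def fit_stacking(X, y, base_fitters, k=5):
    """Stacking: a meta-learner is trained on out-of-fold predictions of the base learners."""
    # Stratified fold assignment so every training split contains both classes.
    fold_of = {}
    for cls in (0, 1):
        members = [i for i, t in enumerate(y) if t == cls]
        for r, i in enumerate(members):
            fold_of[i] = r % k
    meta_X = [[0.0] * len(base_fitters) for _ in X]
    for f in range(k):
        train = [i for i in range(len(X)) if fold_of[i] != f]
        for m, fit in enumerate(base_fitters):
            model = fit([X[i] for i in train], [y[i] for i in train])
            for i in range(len(X)):
                if fold_of[i] == f:  # score only rows this base model never saw
                    meta_X[i][m] = model(X[i])
    meta = fit_logreg(meta_X, y)
    # Refit each base learner on all rows for use at prediction time.
    bases = [fit(X, y) for fit in base_fitters]
    return lambda x: meta([b(x) for b in bases])


def make_data(n, pos_rate, rng):
    """Hypothetical toy data: two features, vandalism rows (y=1) shifted upward."""
    X, y = [], []
    for _ in range(n):
        t = 1 if rng.random() < pos_rate else 0
        X.append([rng.gauss(1.5 * t, 1.0), rng.gauss(1.0 * t, 1.0)])
        y.append(t)
    return X, y


rng = random.Random(0)
X_train, y_train = make_data(2000, 0.05, rng)
X_test, y_test = make_data(1000, 0.05, rng)
X_bal, y_bal = undersample(X_train, y_train, rng)  # under-sample training data only
model = fit_stacking(X_bal, y_bal, [fit_logreg, fit_centroid])
test_auc = auc_roc(y_test, [model(x) for x in X_test])
print(f"balanced rows: {len(y_bal)}, test AUC-ROC: {test_auc:.3f}")
```

Only the training split is under-sampled. The test split keeps its natural class imbalance, so the AUC-ROC reflects the skewed distribution a real detector would face. The meta-learner sees only out-of-fold base predictions, which keeps it from learning to trust scores that base models produced on rows they were fitted to.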
- Publication:
- arXiv e-prints
- Pub Date:
- December 2017
- DOI:
- 10.48550/arXiv.1712.06921
- arXiv:
- arXiv:1712.06921
- Bibcode:
- 2017arXiv171206921Y
- Keywords:
- Computer Science - Information Retrieval;
- H.3
- E-Print:
- Vandalism Detector at WSDM Cup 2017, see arXiv:1712.05956