IIITG-ADBU@HASOC-Dravidian-CodeMix-FIRE2020: Offensive Content Detection in Code-Mixed Dravidian Text
Abstract
This paper presents the results obtained by our SVM and XLM-RoBERTa based classifiers in the shared task Dravidian-CodeMix-HASOC 2020. The SVM classifier trained using TF-IDF features of character and word n-grams performed the best on the code-mixed Malayalam text. It obtained a weighted F1 score of 0.95 (1st Rank) and 0.76 (3rd Rank) on the YouTube and Twitter dataset respectively. The XLM-RoBERTa based classifier performed the best on the code-mixed Tamil text. It obtained a weighted F1 score of 0.87 (3rd Rank) on the code-mixed Tamil Twitter dataset.
- Publication:
-
arXiv e-prints
- Pub Date:
- July 2021
- DOI:
- 10.48550/arXiv.2107.14336
- arXiv:
- arXiv:2107.14336
- Bibcode:
- 2021arXiv210714336B
- Keywords:
-
- Computer Science - Computation and Language
- E-Print:
- CEUR, 2020