Experiments with POS Tagging Code-mixed Indian Social Media Text
Abstract
This paper presents the Centre for Development of Advanced Computing Mumbai's (CDACM) submission to the NLP Tools Contest on Part-Of-Speech (POS) Tagging For Code-mixed Indian Social Media Text (POSCMISMT) 2015 (collocated with ICON 2015). We submitted results for Hindi (hi), Bengali (bn), and Telugu (te), each mixed with English (en). We describe the POS tagging approaches we explored for this task. We used machine learning to tag the mixed-language text, and we tried distributed representations of words in vector space (word2vec) for feature extraction and log-linear models. We report our work on all three languages, hi, bn, and te, mixed with en.
- Publication: arXiv e-prints
- Pub Date: October 2016
- arXiv: arXiv:1610.09799
- Bibcode: 2016arXiv161009799P
- Keywords: Computer Science - Computation and Language
- E-Print: 3 Pages, Published in the Proceedings of the Tool Contest on POS Tagging for Code-mixed Indian Social Media (Facebook, Twitter, and Whatsapp) Text