Experiments with POS Tagging Code-mixed Indian Social Media Text

This paper presents the Centre for Development of Advanced Computing Mumbai's (CDACM) submission to the NLP Tools Contest on Part-Of-Speech (POS) Tagging For Code-mixed Indian Social Media Text (POSCMISMT) 2015, collocated with ICON 2015. We submitted results for Hindi (hi), Bengali (bn), and Telugu (te), each mixed with English (en). We describe the POS tagging approaches we used for this task. Machine learning was used to POS tag the mixed-language text: we tried distributed representations of words in vector space (word2vec) for feature extraction and log-linear models. We report our work on all three languages (hi, bn, and te) mixed with en.

Publication: arXiv e-prints

Pub Date: October 2016

DOI: 10.48550/arXiv.1610.09799

arXiv: arXiv:1610.09799

Bibcode: 2016arXiv161009799P

Keywords: Computer Science - Computation and Language

E-Print: 3 Pages, Published in the Proceedings of the Tool Contest on POS Tagging for Code-mixed Indian Social Media (Facebook, Twitter, and Whatsapp) Text
