When silver glitters more than gold: Bootstrapping an Italian part-of-speech tagger for Twitter

doi:10.48550/arXiv.1611.03057


We bootstrap a state-of-the-art part-of-speech tagger to tag Italian Twitter data in the context of the Evalita 2016 PoSTWITA shared task. We show that training the tagger on native Twitter data, enriched with small amounts of specifically selected gold data and additional silver-labelled data scraped from Facebook, yields better results than using large amounts of manually annotated data from a mix of genres.

Publication:

arXiv e-prints

Pub Date:

November 2016

DOI:

10.48550/arXiv.1611.03057

arXiv:

arXiv:1611.03057

Bibcode:

2016arXiv161103057P

Keywords:

Computer Science - Computation and Language

E-Print:

Proceedings of the 5th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2016)
