Statistical modality tagging from rule-based annotations and crowdsourcing
Abstract
We explore training an automatic modality tagger. Modality is the attitude that a speaker might have toward an event or state. One of the main hurdles in training a linguistic tagger is gathering training data. This is particularly problematic for a modality tagger because modality triggers are sparse in the overwhelming majority of sentences. We investigate an approach in which we first gather sentences using a simple, high-recall rule-based modality tagger and then give these sentences to Mechanical Turk annotators for further annotation. We use the resulting training data to train a precise modality tagger based on a multi-class SVM, which delivers good performance.
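The first stage of the approach, a simple high-recall rule-based tagger that selects sentences for annotation, might look roughly like the sketch below. This is an illustration under stated assumptions and not the authors' implementation: the trigger lexicon, the label names, and the functions `candidate_tags` and `select_for_annotation` are all hypothetical. Matching is done on surface forms with no disambiguation, which favors recall over precision; the later annotation step would be what removes the false positives.

```python
import re

# Hypothetical trigger lexicon, for illustration only.
# Neither the words nor the label names are taken from the paper.
TRIGGERS = {
    "must": "requirement",
    "need": "requirement",
    "can": "ability",
    "able": "ability",
    "want": "want",
    "try": "effort",
}


def candidate_tags(sentence):
    """Return (token, guessed_label) pairs for every trigger word found.

    Surface matching only: a noun use of "can" is tagged as well, which is
    acceptable for a high-recall first pass.
    """
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [(tok, TRIGGERS[tok]) for tok in tokens if tok in TRIGGERS]


def select_for_annotation(sentences):
    """Keep only sentences that contain at least one candidate trigger."""
    return [s for s in sentences if candidate_tags(s)]
```

Sentences returned by `select_for_annotation` would then be sent to annotators, and the corrected labels would serve as training data for the statistical tagger.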
- Publication: arXiv e-prints
- Pub Date: March 2015
- arXiv: arXiv:1503.01190
- Bibcode: 2015arXiv150301190P
- Keywords: Computer Science - Computation and Language; Computer Science - Machine Learning; Statistics - Machine Learning; I.2.7; I.2.6; I.5.1; I.5.4
- E-Print: 8 pages, 6 tables