Identifying Chinese Opinion Expressions with Extremely-Noisy Crowdsourcing Annotations
Abstract
Recent works of opinion expression identification (OEI) rely heavily on the quality and scale of the manually-constructed training corpus, which could be extremely difficult to satisfy. Crowdsourcing is one practical solution for this problem, aiming to create a large-scale but quality-unguaranteed corpus. In this work, we investigate Chinese OEI with extremely-noisy crowdsourcing annotations, constructing a dataset at a very low cost. Following zhang et al. (2021), we train the annotator-adapter model by regarding all annotations as gold-standard in terms of crowd annotators, and test the model by using a synthetic expert, which is a mixture of all annotators. As this annotator-mixture for testing is never modeled explicitly in the training phase, we propose to generate synthetic training samples by a pertinent mixup strategy to make the training and testing highly consistent. The simulation experiments on our constructed dataset show that crowdsourcing is highly promising for OEI, and our proposed annotator-mixup can further enhance the crowdsourcing modeling.
- Publication:
-
arXiv e-prints
- Pub Date:
- April 2022
- DOI:
- 10.48550/arXiv.2204.10714
- arXiv:
- arXiv:2204.10714
- Bibcode:
- 2022arXiv220410714Z
- Keywords:
-
- Computer Science - Computation and Language
- E-Print:
- Accepted by ACL 2022 main conf