MBIC -- A Media Bias Annotation Dataset Including Annotator Characteristics
Abstract
Many people consider news articles to be a reliable source of information on current events. However, due to the range of factors influencing news agencies, such coverage may not always be impartial. Media bias, or slanted news coverage, can have a substantial impact on public perception of events and, accordingly, can alter the beliefs and views of the public. The main data gap in current research on media bias detection is a robust, representative, and diverse dataset containing annotations of biased words and sentences. In particular, existing datasets do not control for the individual background of annotators, which may affect their assessment and thus represents critical information for contextualizing their annotations. In this poster, we present a matrix-based methodology to crowdsource such data using a self-developed annotation platform. We also present MBIC (Media Bias Including Characteristics), a first sample of 1,700 statements representing various media bias instances. Each statement was reviewed by ten annotators and carries media bias labels at both the word and sentence level. MBIC is the first available media bias dataset that reports detailed information on annotator characteristics and their individual background. The current dataset already significantly extends existing data in this domain, providing unique and more reliable insights into the perception of bias. In the future, we will extend it further with respect to both the number of articles and the number of annotators per article.
- Publication: arXiv e-prints
- Pub Date: May 2021
- DOI: 10.48550/arXiv.2105.11910
- arXiv: arXiv:2105.11910
- Bibcode: 2021arXiv210511910S
- Keywords: Computer Science - Computation and Language