WikiGUM: Exhaustive Entity Linking for Wikification in 12 Genres
Abstract
Previous work on Entity Linking has focused on resources targeting non-nested proper named entity mentions, often in data from Wikipedia, i.e. Wikification. In this paper, we present and evaluate WikiGUM, a fully wikified dataset, covering all mentions of named entities, including their non-named and pronominal mentions, as well as mentions nested within other mentions. The dataset covers a broad range of 12 written and spoken genres, most of which have not been included in Entity Linking efforts to date, leading to poor performance by a pretrained SOTA system in our evaluation. The availability of a variety of other annotations for the same data also enables further research on entities in context.
- Publication:
-
arXiv e-prints
- Pub Date:
- September 2021
- DOI:
- 10.48550/arXiv.2109.07449
- arXiv:
- arXiv:2109.07449
- Bibcode:
- 2021arXiv210907449L
- Keywords:
-
- Computer Science - Computation and Language