DWUG: A large Resource of Diachronic Word Usage Graphs in Four Languages

doi:10.48550/arXiv.2104.08540

Word meaning is notoriously difficult to capture, both synchronically and diachronically. In this paper, we describe the creation of the largest resource of graded, contextualized, diachronic word meaning annotation in four different languages, based on 100,000 human semantic proximity judgments. We thoroughly describe the multi-round incremental annotation process, the choice of a clustering algorithm to group usages into senses, and possible uses of this dataset, both diachronic and synchronic.

Publication:

arXiv e-prints

Pub Date:

April 2021

DOI:

10.48550/arXiv.2104.08540

arXiv:

arXiv:2104.08540

Bibcode:

2021arXiv210408540S

Keywords:

Computer Science - Computation and Language

E-Print:

Dominik Schlechtweg, Nina Tahmasebi, Simon Hengchen, Haim Dubossarsky, and Barbara McGillivray. 2021. DWUG: A large Resource of Diachronic Word Usage Graphs in Four Languages. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7079--7091, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics
