Measuring publication relatedness using controlled vocabularies
Abstract
Measuring the relatedness between scientific publications has important applications in many areas of bibliometrics and science policy. Controlled vocabularies provide a promising basis for measuring relatedness because they address issues that arise when using citation or textual similarity to measure relatedness. While several controlled-vocabulary-based relatedness measures have been developed, there exists no comprehensive and direct test of their accuracy and suitability for different types of research questions. This paper reviews existing measures, develops a new measure, and benchmarks the measures using TREC Genomics data as a ground truth for topics. The benchmark tests show that the new measure and the measure proposed by Ahlgren et al. (2020) have differing strengths and weaknesses. These results inform a discussion of which method to choose when studying interdisciplinarity, information retrieval, clustering of science, and researcher topic switching.
- Publication: arXiv e-prints
- Pub Date: August 2024
- DOI: 10.48550/arXiv.2408.15004
- arXiv: arXiv:2408.15004
- Bibcode: 2024arXiv240815004D
- Keywords: Computer Science - Information Retrieval; Computer Science - Information Theory; Computer Science - Social and Information Networks
- E-Print: Accepted for presentation at the 28th International Conference on Science, Technology and Innovation Indicators, 2024