Evaluating the diversity of scientific discourse on twenty-one multilingual Wikipedias using citation analysis
Abstract
INTRODUCTION: Wikipedia is a major source of information, particularly for medical and health content, citing over 4 million scholarly publications. However, the representation of research-based knowledge across different languages on Wikipedia has been underexplored. This study analyses the largest database of Wikipedia citations collected to date, examining the uniqueness of content and research representation across languages. METHOD: The study included nearly 3.5 million unique research articles and their Wikipedia mentions from 21 languages. These were categorized into three groups: Group A (publications uniquely cited by a single non-English Wikipedia), Group B (co-cited by English and non-English Wikipedias), and Group C (co-cited by multiple non-English Wikipedias). Descriptive and comparative statistics were computed by Wikipedia language, group, and discipline. RESULTS: Significant differences were found between the twenty non-English Wikipedias and English Wikipedia (p<0.001). While English Wikipedia is the largest, non-English Wikipedias cite an additional 1.5 million publications. CONCLUSION: English Wikipedia should not be seen as a comprehensive body of information. Non-English Wikipedias cover unique subjects and disciplines, and collectively they offer a more complete representation of research. The uniqueness of voice in non-English Wikipedias correlates with their size, though other factors may also influence these differences.
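The three-group categorization described under METHOD can be illustrated with a minimal sketch. This is not the authors' code: the function name `classify_publication`, the language code `"en"`, and the `"english_only"` label for publications cited only by English Wikipedia are assumptions made for illustration, as is the reading of Group C as "two or more non-English Wikipedias and no English citation".

```python
from typing import Iterable


def classify_publication(languages: Iterable[str], english: str = "en") -> str:
    """Assign a publication to a group from the set of Wikipedias citing it.

    Hypothetical helper; the group rules follow the abstract's wording,
    and "english_only" is an assumed label the abstract does not define.
    """
    langs = set(languages)
    if not langs:
        raise ValueError("publication has no Wikipedia citations")

    non_english = langs - {english}
    if english in langs:
        # Cited by English and at least one other edition: Group B.
        return "B" if non_english else "english_only"
    # No English citation: one edition is Group A, several is Group C.
    return "A" if len(non_english) == 1 else "C"


if __name__ == "__main__":
    for cited_by in (["de"], ["en", "fr"], ["de", "fr"], ["en"]):
        print(cited_by, classify_publication(cited_by))
```

Working on sets of language codes makes repeated citations within the same edition irrelevant, which matches the abstract's focus on which Wikipedias cite a publication rather than how often.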
- Publication: arXiv e-prints
- Pub Date: January 2025
- arXiv: arXiv:2501.09666
- Bibcode: 2025arXiv250109666T
- Keywords: Computer Science - Digital Libraries