Fine-tuning and Prompt Engineering with Cognitive Knowledge Graphs for Scholarly Knowledge Organization
Abstract
The increasing amount of published scholarly articles, exceeding 2.5 million yearly, raises the challenge for researchers in following scientific progress. Integrating the contributions from scholarly articles into a novel type of cognitive knowledge graph (CKG) will be a crucial element for accessing and organizing scholarly knowledge, surpassing the insights provided by titles and abstracts. This research focuses on effectively conveying structured scholarly knowledge by utilizing large language models (LLMs) to categorize scholarly articles and describe their contributions in a structured and comparable manner. While previous studies explored language models within specific research domains, the extensive domain-independent knowledge captured by LLMs offers a substantial opportunity for generating structured contribution descriptions as CKGs. Additionally, LLMs offer customizable pathways through prompt engineering or fine-tuning, thus facilitating to leveraging of smaller LLMs known for their efficiency, cost-effectiveness, and environmental considerations. Our methodology involves harnessing LLM knowledge, and complementing it with domain expert-verified scholarly data sourced from a CKG. This strategic fusion significantly enhances LLM performance, especially in tasks like scholarly article categorization and predicate recommendation. Our method involves fine-tuning LLMs with CKG knowledge and additionally injecting knowledge from a CKG with a novel prompting technique significantly increasing the accuracy of scholarly knowledge extraction. We integrated our approach in the Open Research Knowledge Graph (ORKG), thus enabling precise access to organized scholarly knowledge, crucially benefiting domain-independent scholarly knowledge exchange and dissemination among policymakers, industrial practitioners, and the general public.
- Publication:
-
arXiv e-prints
- Pub Date:
- September 2024
- DOI:
- 10.48550/arXiv.2409.06433
- arXiv:
- arXiv:2409.06433
- Bibcode:
- 2024arXiv240906433R
- Keywords:
-
- Computer Science - Digital Libraries;
- Computer Science - Computation and Language;
- Computer Science - Machine Learning