Two-layer retrieval augmented generation framework for low-resource medical question-answering: proof of concept using Reddit data

doi:10.48550/arXiv.2405.19519

Retrieval augmented generation (RAG) can constrain generative model outputs and mitigate the possibility of hallucination by providing relevant in-context text. The number of tokens a generative large language model (LLM) can incorporate as context is finite, which limits the volume of knowledge from which an answer can be generated. We propose a two-layer RAG framework for query-focused answer generation and evaluate a proof of concept for this framework in the context of query-focused summary generation from social media forums, focusing on emerging drug-related information. The evaluations demonstrate the effectiveness of the two-layer framework in resource-constrained settings, enabling researchers to obtain near real-time data from users.

Publication:

arXiv e-prints

Pub Date:

May 2024

DOI:

10.48550/arXiv.2405.19519

arXiv:

arXiv:2405.19519

Bibcode:

2024arXiv240519519D

Keywords:

Computer Science - Computation and Language;
Computer Science - Artificial Intelligence
