Retrieval Augmented Generation for Domain-specific Question Answering

doi:10.48550/arXiv.2404.14760

Question answering (QA) has become an important application in the development of large language models. General pre-trained large language models for question answering are not trained to properly understand the knowledge or terminology of a specific domain, such as finance, healthcare, education, or customer service for a product. To better cater to domain-specific understanding, we build an in-house question-answering system for Adobe products. We propose a novel framework to compile a large question-answer database and develop an approach for retrieval-aware fine-tuning of a large language model. We show that fine-tuning the retriever leads to major improvements in the final generation. Our overall approach reduces hallucinations during generation while keeping the latest retrieved information in context for contextual grounding.

Publication:

arXiv e-prints

Pub Date:

April 2024

DOI:

10.48550/arXiv.2404.14760

arXiv:

arXiv:2404.14760

Bibcode:

2024arXiv240414760S

Keywords:

Computer Science - Computation and Language;
Computer Science - Artificial Intelligence;
Computer Science - Information Retrieval;
Computer Science - Machine Learning

E-Print:

AAAI 2024 (Association for the Advancement of Artificial Intelligence) Scientific Document Understanding Workshop

NASA/ADS
