Advancing Similarity Search with GenAI: A Retrieval Augmented Generation Approach
Abstract
This article introduces an innovative Retrieval Augmented Generation approach to similarity search. The proposed method uses a generative model to capture nuanced semantic information and retrieve similarity scores based on advanced context understanding. The study focuses on the BIOSSES dataset containing 100 pairs of sentences extracted from the biomedical domain, and introduces similarity search correlation results that outperform those previously attained on this dataset. Through an in-depth analysis of the model sensitivity, the research identifies optimal conditions leading to the highest similarity search accuracy: the results reveals high Pearson correlation scores, reaching specifically 0.905 at a temperature of 0.5 and a sample size of 20 examples provided in the prompt. The findings underscore the potential of generative models for semantic information retrieval and emphasize a promising research direction to similarity search.
- Publication:
-
arXiv e-prints
- Pub Date:
- December 2024
- arXiv:
- arXiv:2501.04006
- Bibcode:
- 2025arXiv250104006B
- Keywords:
-
- Computer Science - Information Retrieval