Unsupervised Identification of Relevant Prior Cases
Abstract
Document retrieval now plays a role in almost every domain of knowledge understanding, including the legal domain. A precedent is a court decision that is treated as authority for deciding subsequent cases involving identical or similar facts or similar legal issues. In this work, we propose several unsupervised approaches to the task of identifying precedents relevant to a given query case. The approaches are: word embeddings from word2vec, doc2vec, and sent2vec; cosine similarity over TF-IDF vectors; retrieval by BM25 score; a pre-trained model and SBERT to find the most similar document; and the product of the BM25 and TF-IDF scores to find the most relevant document for a given query. We compare all methods on precision@10, recall@10, and MRR. In this comparison, the TF-IDF score multiplied by the BM25 score gives the best result. We also present the analysis we carried out to improve the BM25 score.
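As an illustration of the best-performing combination named in the abstract, the following is a minimal sketch of ranking candidate cases by the product of a TF-IDF cosine similarity and a BM25 score, together with the three reported metrics. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the Okapi-style BM25 variant, the parameters `k1=1.5` and `b=0.75`, and all function names are assumptions chosen for the example.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace tokenization; no stemming or stopword removal.
    return text.lower().split()

def idf_table(docs_tokens):
    # Smoothed IDF that stays positive for every term seen in the corpus.
    n = len(docs_tokens)
    df = Counter(term for doc in docs_tokens for term in set(doc))
    return {t: math.log(1 + (n - d + 0.5) / (d + 0.5)) for t, d in df.items()}

def tfidf_cosine(query, doc, idf):
    # Cosine similarity between raw-count TF-IDF vectors of query and document.
    qv = {t: c * idf.get(t, 0.0) for t, c in Counter(query).items()}
    dv = {t: c * idf.get(t, 0.0) for t, c in Counter(doc).items()}
    dot = sum(w * dv.get(t, 0.0) for t, w in qv.items())
    nq = math.sqrt(sum(w * w for w in qv.values()))
    nd = math.sqrt(sum(w * w for w in dv.values()))
    return dot / (nq * nd) if nq and nd else 0.0

def bm25(query, doc, idf, avgdl, k1=1.5, b=0.75):
    # Okapi-style BM25; k1 and b are common defaults, not values from the paper.
    tf = Counter(doc)
    score = 0.0
    for t in set(query):
        f = tf.get(t, 0)
        if f:
            norm = f + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf.get(t, 0.0) * f * (k1 + 1) / norm
    return score

def rank(query_text, corpus, top_k=10):
    # Score every document by TF-IDF cosine * BM25 and return (index, score) pairs.
    docs = [tokenize(d) for d in corpus]
    idf = idf_table(docs)
    avgdl = sum(len(d) for d in docs) / len(docs)
    q = tokenize(query_text)
    scored = [(i, tfidf_cosine(q, d, idf) * bm25(q, d, idf, avgdl))
              for i, d in enumerate(docs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

def precision_at_k(ranked_ids, relevant, k=10):
    # Fraction of the top-k results that are relevant.
    return sum(1 for i in ranked_ids[:k] if i in relevant) / k

def recall_at_k(ranked_ids, relevant, k=10):
    # Fraction of all relevant documents that appear in the top-k results.
    if not relevant:
        return 0.0
    return sum(1 for i in ranked_ids[:k] if i in relevant) / len(relevant)

def reciprocal_rank(ranked_ids, relevant):
    # 1 / rank of the first relevant result; MRR is the mean of this over queries.
    for position, i in enumerate(ranked_ids, start=1):
        if i in relevant:
            return 1.0 / position
    return 0.0
```

Calling `rank(query_text, corpus)` on a list of case texts returns up to ten `(index, score)` pairs in descending order. Because the two scores are multiplied, a document must do well under both to rank highly, and a document sharing no terms with the query scores zero.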
- Publication: arXiv e-prints
- Pub Date: July 2021
- DOI: 10.48550/arXiv.2107.08973
- arXiv: arXiv:2107.08973
- Bibcode: 2021arXiv210708973B
- Keywords: Computer Science - Information Retrieval; Computer Science - Computation and Language
- E-Print: Code: https://github.com/shivangibithel/Information-Retrieval-CS6370