A Markov Random Field Topic Space Model for Document Retrieval
Abstract
This paper proposes a novel statistical approach to intelligent document retrieval. It seeks to offer a more structured and extensible mathematical treatment of the term generalization performed by the popular Latent Semantic Analysis (LSA) approach to document indexing. A Markov Random Field (MRF) is presented that captures relationships between terms and documents as probabilistic dependence assumptions between random variables. The approach then uses the MRF-Gibbs equivalence to derive joint probabilities as well as local probabilities for document variables. A parameter learning method is proposed that uses rank reduction with singular value decomposition, in a manner similar to LSA, to reduce the dimensionality of document-term relationships to that of a latent topic space. Experimental results confirm the ability of this approach to effectively and efficiently retrieve documents from substantial data sets.
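As a rough illustration of the MRF-Gibbs equivalence mentioned above, the following sketch builds a toy field with one binary document variable and two binary term variables, writes the joint probability in Gibbs form, and checks that the local probability of the document variable can be computed from its own cliques without the partition function. The graph, the agreement-based potentials, the `WEIGHTS` values, and all function names are assumptions chosen for illustration; they are not the model or parameters from the paper.

```python
import itertools
import math

# Hypothetical toy model: one binary document variable d and two binary term
# variables t1, t2. Each document-term edge is a clique with an agreement weight.
WEIGHTS = (1.5, 0.5)  # assumed edge weights, not taken from the paper


def energy(d, terms, weights=WEIGHTS):
    """Sum of clique potentials: an edge lowers the energy when d agrees with its term."""
    return -sum(w for w, t in zip(weights, terms) if t == d)


def joint_distribution(weights=WEIGHTS):
    """Gibbs form: P(d, terms) = exp(-E(d, terms)) / Z, enumerated over all states."""
    states = [(d, terms)
              for d in (0, 1)
              for terms in itertools.product((0, 1), repeat=len(weights))]
    unnorm = {s: math.exp(-energy(s[0], s[1], weights)) for s in states}
    z = sum(unnorm.values())  # partition function
    return {s: p / z for s, p in unnorm.items()}


def local_probability(terms, weights=WEIGHTS):
    """P(d = 1 | terms) from the cliques containing d; the normalizer Z cancels."""
    on = math.exp(-energy(1, terms, weights))
    off = math.exp(-energy(0, terms, weights))
    return on / (on + off)


if __name__ == "__main__":
    joint = joint_distribution()
    terms = (1, 0)
    # Conditioning the full joint should agree with the local computation.
    from_joint = joint[(1, terms)] / (joint[(1, terms)] + joint[(0, terms)])
    print(local_probability(terms), from_joint)
```

In this toy setting the joint is obtained by brute-force enumeration, which is only feasible for a handful of variables. The local probability needs just the potentials on the edges touching the document variable, which is what makes local computations practical in larger fields.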
- Publication: arXiv e-prints
- Pub Date: November 2011
- DOI: 10.48550/arXiv.1111.6640
- arXiv: arXiv:1111.6640
- Bibcode: 2011arXiv1111.6640H
- Keywords: Computer Science - Information Retrieval