Semantic clustering of Russian web search results: possibilities and problems
Abstract
The paper deals with word sense induction from lexical co-occurrence graphs. We construct such graphs from large Russian corpora and then use them to cluster Mail.ru Search results according to the meanings of the query. We compare different clustering methods and different source corpora. We also describe models for applying distributional semantics to big linguistic data.
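As a rough illustration of the general idea behind graph-based word sense induction, the sketch below builds a sentence-level co-occurrence graph, splits the neighbours of an ambiguous word into connected components (treated here as "senses"), and assigns a search snippet to the sense it overlaps most. This is a toy example under stated assumptions, not the method evaluated in the paper: the function names, the connected-component heuristic, the `min_weight` threshold, and the tiny English corpus are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(sentences):
    """Words are nodes; an edge weight counts the sentences containing both words."""
    graph = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        for a, b in combinations(sorted(set(tokens)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def induce_senses(graph, target, min_weight=1):
    """Split the target's neighbours into connected components.

    The target itself is left out, so neighbours that are linked only
    through the target fall into different components (assumed heuristic).
    """
    neighbours = {w for w, c in graph[target].items() if c >= min_weight}
    senses, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(
                n for n, c in graph[node].items()
                if n in neighbours and n not in component and c >= min_weight
            )
        seen |= component
        senses.append(component)
    return senses


def assign_sense(snippet_tokens, senses):
    """Return the index of the sense sharing the most words with the snippet, or None."""
    if not senses:
        return None
    words = set(snippet_tokens)
    overlaps = [len(sense & words) for sense in senses]
    best = max(range(len(senses)), key=overlaps.__getitem__)
    return best if overlaps[best] > 0 else None


# Made-up toy corpus; a real setup would use a large tokenised Russian corpus.
SENTENCES = [
    ["jaguar", "car", "engine"],
    ["jaguar", "engine", "speed"],
    ["jaguar", "cat", "jungle"],
    ["jaguar", "jungle", "prey"],
]

graph = build_cooccurrence_graph(SENTENCES)
senses = induce_senses(graph, "jaguar")
car_snippet = assign_sense(["jaguar", "engine", "review"], senses)
animal_snippet = assign_sense(["jaguar", "jungle", "habitat"], senses)
```

On this toy data the neighbours of "jaguar" separate into a vehicle-related group and an animal-related group, and each snippet is assigned to the group whose words it shares. On real corpora the neighbourhood graph is usually far denser, so simple connected components would likely need to be replaced by a proper graph clustering step.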
- Publication: arXiv e-prints
- Pub Date: September 2014
- DOI: 10.48550/arXiv.1409.1612
- arXiv: arXiv:1409.1612
- Bibcode: 2014arXiv1409.1612K
- Keywords: Computer Science - Computation and Language; Computer Science - Information Retrieval
- E-Print: Presented at Russian Summer School in Information Retrieval (RuSSIR 2014). To be published in Springer Communications in Computer and Information Science series.