An LSH Index for Computing Kendall's Tau over Top-k Lists
Abstract
We consider the problem of similarity search within a set of top-k lists under the Kendall's Tau distance function. This distance describes how related two rankings are in terms of concordantly and discordantly ordered items. As top-k lists are usually very short compared to the global domain of possible items to be ranked, creating an inverted index to look up overlapping lists is possible, but it does not capture the similarity measure tightly enough. In this work, we investigate locality sensitive hashing schemes for the Kendall's Tau distance and evaluate the proposed methods using two real-world datasets.
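To make the baseline in the abstract concrete, the following is a minimal sketch of the inverted-index approach, not the paper's LSH schemes. It assumes the K^(p) variant of Kendall's Tau for top-k lists from Fagin, Kumar and Sivakumar ("Comparing top k lists"): an item missing from a list is treated as ranked below all of that list's items, a discordant pair costs 1, and a pair whose items are both missing from one list costs a penalty p. The abstract does not state which variant or which p the paper uses. The names `kendall_tau_topk`, `TopKIndex` and the threshold `theta` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def kendall_tau_topk(t1, t2, p=0.0):
    """K^(p)-style Kendall's Tau distance between two top-k lists (assumed variant)."""
    r1 = {item: i for i, item in enumerate(t1)}
    r2 = {item: i for i, item in enumerate(t2)}
    # Items absent from a list share the rank just past its last position.
    missing1, missing2 = len(t1), len(t2)
    dist = 0.0
    for a, b in combinations(set(t1) | set(t2), 2):
        d1 = r1.get(a, missing1) - r1.get(b, missing1)
        d2 = r2.get(a, missing2) - r2.get(b, missing2)
        if d1 * d2 < 0:
            dist += 1  # pair ordered differently in the two lists
        elif d1 * d2 == 0:
            dist += p  # both items missing from one list: order unknown
    return dist


class TopKIndex:
    """Hypothetical inverted index: item -> ids of the top-k lists containing it."""

    def __init__(self):
        self.lists = []
        self.postings = defaultdict(set)

    def add(self, topk):
        list_id = len(self.lists)
        self.lists.append(list(topk))
        for item in topk:
            self.postings[item].add(list_id)
        return list_id

    def query(self, q, theta, p=0.0):
        # Candidates are all lists sharing at least one item with the query.
        candidates = set()
        for item in q:
            candidates |= self.postings.get(item, set())
        # Verify each candidate with the exact distance.
        results = []
        for list_id in sorted(candidates):
            d = kendall_tau_topk(q, self.lists[list_id], p)
            if d <= theta:
                results.append((list_id, d))
        return results


idx = TopKIndex()
idx.add(["a", "b", "c"])
idx.add(["b", "a", "d"])
idx.add(["x", "y", "z"])
print(idx.query(["a", "b", "c"], theta=2))
```

For two disjoint lists of length k, each of the k² cross pairs is discordant, so their distance is k² + p·k(k-1); the filter therefore drops only lists at that distance. Conversely, a single shared item is enough to make a list a candidate, so many candidates still fail the exact check, which illustrates why such an index is a loose filter for this measure.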
- Publication: arXiv e-prints
- Pub Date: September 2014
- arXiv: arXiv:1409.0651
- Bibcode: 2014arXiv1409.0651P
- Keywords: Computer Science - Databases
- E-Print: 6 pages, 8 subfigures, presented in Seventeenth International Workshop on the Web and Databases (WebDB 2014) co-located with ACM SIGMOD 2014