Optimal Data-Dependent Hashing for Approximate Near Neighbors
Abstract
We show an optimal data-dependent hashing scheme for the approximate near neighbor problem. For an $n$-point data set in a $d$-dimensional space our data structure achieves query time $O(d n^{\rho+o(1)})$ and space $O(n^{1+\rho+o(1)} + dn)$, where $\rho=\tfrac{1}{2c^2-1}$ for the Euclidean space and approximation $c>1$. For the Hamming space, we obtain an exponent of $\rho=\tfrac{1}{2c-1}$. Our result completes the direction set forth in [AINR14], who gave a proof-of-concept that data-dependent hashing can outperform classical Locality Sensitive Hashing (LSH). In contrast to [AINR14], the new bound is not only optimal, but in fact improves over the best (optimal) LSH data structures [IM98,AI06] for all approximation factors $c>1$. From the technical perspective, we proceed by decomposing an arbitrary dataset into several subsets that are, in a certain sense, pseudo-random.
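To make the exponents concrete, the following minimal sketch evaluates the two formulas stated above for a given approximation factor $c$. The function names are illustrative and do not come from the paper. The classical LSH exponents used for comparison ($1/c$ for the Hamming space [IM98] and $1/c^2$ for the Euclidean space [AI06]) are the commonly cited values, not figures stated in this abstract, and all $o(1)$ terms are ignored.

```python
def _check(c: float) -> None:
    # The bounds are stated only for approximation factors c > 1.
    if c <= 1:
        raise ValueError("approximation factor c must exceed 1")


def rho_euclidean(c: float) -> float:
    """Data-dependent exponent for Euclidean space: 1 / (2c^2 - 1)."""
    _check(c)
    return 1.0 / (2.0 * c * c - 1.0)


def rho_hamming(c: float) -> float:
    """Data-dependent exponent for Hamming space: 1 / (2c - 1)."""
    _check(c)
    return 1.0 / (2.0 * c - 1.0)


def classical_lsh_rho_euclidean(c: float) -> float:
    """Assumed classical LSH exponent for Euclidean space: 1 / c^2."""
    _check(c)
    return 1.0 / (c * c)


def classical_lsh_rho_hamming(c: float) -> float:
    """Assumed classical LSH exponent for Hamming space: 1 / c."""
    _check(c)
    return 1.0 / c


if __name__ == "__main__":
    for c in (1.5, 2.0, 3.0):
        print(
            f"c={c}: Euclidean {rho_euclidean(c):.4f} "
            f"(classical {classical_lsh_rho_euclidean(c):.4f}), "
            f"Hamming {rho_hamming(c):.4f} "
            f"(classical {classical_lsh_rho_hamming(c):.4f})"
        )
```

Under these assumptions, the data-dependent exponent is strictly smaller than the classical one for every $c>1$, since $2c^2-1>c^2$ and $2c-1>c$ whenever $c>1$.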
 Publication:

arXiv e-prints
 Pub Date:
 January 2015
 arXiv:
 arXiv:1501.01062
 Bibcode:
 2015arXiv150101062A
 Keywords:

 Computer Science - Data Structures and Algorithms
 E-Print:
 36 pages, 5 figures, an extended abstract appeared in the proceedings of the 47th ACM Symposium on Theory of Computing (STOC 2015)