Efficient Construction of Neighborhood Graphs by the Multiple Sorting Method
Abstract
Neighborhood graphs are gaining popularity as a concise data representation in machine learning. However, naive graph construction by pairwise distance calculation takes $O(n^2)$ time for $n$ data points, which is prohibitively slow for millions of points. For strings of equal length, the multiple sorting method (Uno, 2008) can construct an $\epsilon$-neighbor graph in $O(n+m)$ time, where $m$ is the number of $\epsilon$-neighbor pairs in the data. To apply this remarkably efficient algorithm to continuous domains such as images, signals and texts, we employ a random projection method to convert vectors to strings. Theoretical results are presented to elucidate the trade-off between approximation quality and computation time. Empirical results show the efficiency of our method in comparison to fast nearest neighbor alternatives.
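The pipeline described above (real vectors, then bit strings, then all pairs within distance $\epsilon$) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the projection is assumed to be Gaussian sign random projection (SimHash-style), $\epsilon$ is treated as a Hamming-distance threshold on the strings, and the function names and the block count `n_blocks` are hypothetical. Unlike Uno's method, the sketch removes duplicate pairs with a set and compares every pair inside a group, so it does not guarantee the $O(n+m)$ bound.

```python
import itertools
import random
from typing import List, Sequence, Set, Tuple


def random_projection_bits(vectors: Sequence[Sequence[float]], n_bits: int,
                           seed: int = 0) -> List[Tuple[int, ...]]:
    """Encode each real vector as a bit string: bit k is the sign of <w_k, x>.

    Assumption: Gaussian sign random projection (SimHash-style); the
    projection used in the paper may differ.
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
    return [tuple(1 if sum(w * x for w, x in zip(p, v)) >= 0 else 0 for p in planes)
            for v in vectors]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def multiple_sorting_pairs(strings: Sequence[Sequence[int]], eps: int,
                           n_blocks: int) -> Set[Tuple[int, int]]:
    """All index pairs (i, j), i < j, whose Hamming distance is <= eps.

    Each string is cut into n_blocks contiguous blocks (n_blocks > eps).
    A pair within distance eps differs in at most eps blocks, so it agrees
    exactly on at least one choice of n_blocks - eps blocks (pigeonhole).
    For every such choice we sort by the kept blocks and only compare
    strings inside runs of equal keys.
    """
    length = len(strings[0])
    if not eps < n_blocks <= length:
        raise ValueError("need eps < n_blocks <= string length")
    bounds = [round(b * length / n_blocks) for b in range(n_blocks + 1)]
    pairs: Set[Tuple[int, int]] = set()  # a set drops pairs found more than once
    for kept in itertools.combinations(range(n_blocks), n_blocks - eps):
        # Sort key: the string restricted to the kept blocks.
        keys = [tuple(c for b in kept for c in s[bounds[b]:bounds[b + 1]])
                for s in strings]
        order = sorted(range(len(strings)), key=keys.__getitem__)
        for _, run in itertools.groupby(order, key=keys.__getitem__):
            for i, j in itertools.combinations(list(run), 2):
                if hamming(strings[i], strings[j]) <= eps:
                    pairs.add((min(i, j), max(i, j)))
    return pairs


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
    codes = random_projection_bits(data, n_bits=32)
    edges = multiple_sorting_pairs(codes, eps=3, n_blocks=5)
    print(len(edges), "edges in the epsilon-neighbor graph")
```

The block count controls the trade-off inside the search: with `n_blocks = eps + 1` each sorting pass keys on a single block, so groups are large and many candidate pairs are checked; more blocks give longer keys and smaller groups, at the cost of $\binom{n_\text{blocks}}{\epsilon}$ sorting passes.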
- Publication: arXiv e-prints
- Pub Date: April 2009
- DOI: 10.48550/arXiv.0904.3151
- arXiv: arXiv:0904.3151
- Bibcode: 2009arXiv0904.3151U
- Keywords: Computer Science - Data Structures and Algorithms; Computer Science - Machine Learning