FASTPPR: Scaling Personalized PageRank Estimation for Large Graphs
Abstract
We propose a new algorithm, FASTPPR, for estimating personalized PageRank: given start node $s$ and target node $t$ in a directed graph, and given a threshold $\delta$, FASTPPR estimates the Personalized PageRank $\pi_s(t)$ from $s$ to $t$, guaranteeing a small relative error as long $\pi_s(t)>\delta$. Existing algorithms for this problem have a runningtime of $\Omega(1/\delta)$; in comparison, FASTPPR has a provable average runningtime guarantee of ${O}(\sqrt{d/\delta})$ (where $d$ is the average indegree of the graph). This is a significant improvement, since $\delta$ is often $O(1/n)$ (where $n$ is the number of nodes) for applications. We also complement the algorithm with an $\Omega(1/\sqrt{\delta})$ lower bound for PageRank estimation, showing that the dependence on $\delta$ cannot be improved. We perform a detailed empirical study on numerous massive graphs, showing that FASTPPR dramatically outperforms existing algorithms. For example, on the 2010 Twitter graph with 1.5 billion edges, for target nodes sampled by popularity, FASTPPR has a $20$ factor speedup over the state of the art. Furthermore, an enhanced version of FASTPPR has a $160$ factor speedup on the Twitter graph, and is at least $20$ times faster on all our candidate graphs.
 Publication:

arXiv eprints
 Pub Date:
 April 2014
 arXiv:
 arXiv:1404.3181
 Bibcode:
 2014arXiv1404.3181L
 Keywords:

 Computer Science  Data Structures and Algorithms;
 Computer Science  Social and Information Networks;
 G.2.2;
 F.2.2
 EPrint:
 KDD 2014