Stop-and-Stare: Optimal Sampling Algorithms for Viral Marketing in Billion-scale Networks

doi:10.48550/arXiv.1605.07990

Influence Maximization (IM), which seeks a small set of key users who spread influence widely through the network, is a core problem in multiple domains. It finds applications in viral marketing, epidemic control, and assessing cascading failures within complex systems. Despite a huge amount of effort, IM in billion-scale networks such as Facebook, Twitter, and the World Wide Web has not been satisfactorily solved. Even state-of-the-art methods such as TIM+ and IMM may take days on those networks. In this paper, we propose SSA and D-SSA, two novel sampling frameworks for IM-based viral marketing problems. SSA and D-SSA are up to 1200 times faster than the SIGMOD'15 best method, IMM, while providing the same $(1-1/e-\epsilon)$ approximation guarantee. Underlying our frameworks is an innovative Stop-and-Stare strategy in which they stop at exponential check points to verify (stare) whether there is adequate statistical evidence on the solution quality. Theoretically, we prove that SSA and D-SSA are the first approximation algorithms that use (asymptotically) minimum numbers of samples, meeting strict theoretical thresholds characterized for IM. The absolute superiority of SSA and D-SSA is confirmed through extensive experiments on real network data for IM and another topic-aware viral marketing problem, named TVM. The source code is available at https://github.com/hungnt55/Stop-and-Stare
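
The stop-and-stare idea can be illustrated with a heavily simplified sketch. It is not the SSA or D-SSA algorithm from the paper: the acceptance rule (the optimization-side estimate must lie within a factor $(1+\epsilon)$ of an independent re-estimate), the independent cascade diffusion model, and all function and parameter names (`random_rr_set`, `greedy_max_coverage`, `stop_and_stare_sketch`, `n0`, `max_samples`) are assumptions made for illustration, and the sketch carries none of the paper's guarantees.

```python
import random


def random_rr_set(nodes, in_edges, rng):
    """Sample one reverse-reachable (RR) set, assuming independent cascade.

    in_edges maps a node v to a list of (u, p) pairs: edge u -> v with
    activation probability p.
    """
    root = rng.choice(nodes)
    visited = {root}
    stack = [root]
    while stack:
        v = stack.pop()
        for u, p in in_edges.get(v, []):
            if u not in visited and rng.random() < p:
                visited.add(u)
                stack.append(u)
    return visited


def greedy_max_coverage(rr_sets, k):
    """Greedily pick up to k nodes covering the most RR sets.

    Returns (seeds, fraction of RR sets covered).
    """
    covered = [False] * len(rr_sets)
    seeds = []
    for _ in range(k):
        gain = {}
        for i, rr in enumerate(rr_sets):
            if not covered[i]:
                for u in rr:
                    gain[u] = gain.get(u, 0) + 1
        if not gain:  # every RR set is already covered
            break
        best = max(gain, key=gain.get)
        seeds.append(best)
        for i, rr in enumerate(rr_sets):
            if best in rr:
                covered[i] = True
    return seeds, sum(covered) / len(rr_sets)


def stop_and_stare_sketch(nodes, in_edges, k, eps=0.1, n0=64,
                          max_samples=1 << 16, seed=0):
    """Double the sample pool until an independent check agrees (illustrative rule)."""
    rng = random.Random(seed)
    n = len(nodes)
    rr_sets = [random_rr_set(nodes, in_edges, rng) for _ in range(n0)]
    while True:
        # "Stop": find a candidate seed set on the current samples.
        seeds, frac = greedy_max_coverage(rr_sets, k)
        # "Stare": re-estimate its coverage on fresh, independent samples.
        check = [random_rr_set(nodes, in_edges, rng) for _ in range(len(rr_sets))]
        seed_set = set(seeds)
        frac_check = sum(1 for rr in check if seed_set & rr) / len(check)
        agrees = frac_check > 0 and frac <= (1 + eps) * frac_check
        if agrees or len(rr_sets) >= max_samples:
            return seeds, n * frac_check  # estimated influence spread
        # Not enough evidence yet: double the pool (exponential checkpoints).
        extra = len(rr_sets)
        rr_sets.extend(random_rr_set(nodes, in_edges, rng) for _ in range(extra))
```

Calling `stop_and_stare_sketch(nodes, in_edges, k)` returns a seed list and an estimated spread. The doubling schedule keeps the number of checkpoints logarithmic in the final sample count, which is the property the abstract's "exponential check points" refers to; the actual stopping conditions and their proofs are in the paper and the linked repository.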

Publication:

arXiv e-prints

Pub Date:

May 2016

DOI:

10.48550/arXiv.1605.07990

arXiv:

arXiv:1605.07990

Bibcode:

2016arXiv160507990N

Keywords:

Computer Science - Social and Information Networks;
Computer Science - Data Structures and Algorithms;
Physics - Physics and Society

E-Print:

Correct the errors in the proofs for SSA/D-SSA. Update D-SSA to estimate \epsilon(s) instead of \delta(s)
