Synthetic graphs for link prediction benchmarking
Abstract
Predicting missing links in complex networks requires algorithms that are able to explore statistical regularities in the existing data. Here we investigate the interplay between algorithm efficiency and network structures through the introduction of suitably-designed synthetic graphs. We propose a family of random graphs that incorporates both micro-scale motifs and meso-scale communities, two ubiquitous structures in complex networks. A key contribution is the derivation of theoretical upper bounds for link prediction performance in our synthetic graphs, allowing us to estimate the predictability of the task and obtain an improved assessment of the performance of any method. Our results on the performance of classical methods (e.g., Stochastic Block Models, Node2Vec,GraphSage) show that the performance of all methods correlate with the theoretical predictability, that no single method is universally superior, and that each of the methods exploit different characteristics known to exist in large classes of networks. Our findings underline the need for careful consideration of graph structure when selecting a link prediction method and emphasize the value of comparing performance against synthetic benchmarks. We provide open-source code for generating these synthetic graphs, enabling further research on link prediction methods.
- Publication:
-
arXiv e-prints
- Pub Date:
- December 2024
- DOI:
- arXiv:
- arXiv:2412.03757
- Bibcode:
- 2024arXiv241203757V
- Keywords:
-
- Computer Science - Social and Information Networks;
- Physics - Physics and Society
- E-Print:
- 15 pages, 8 figures