Large-Scale Graphs Community Detection using Spark GraphFrames
Abstract
With the emergence of social networks, online platforms dedicated to different use cases, and sensor networks, large-scale graph community detection has become a steady field of research with real-world applications. Community detection algorithms have numerous practical applications, particularly when they scale with data size. Nonetheless, a notable drawback of these algorithms is their computational intensity \cite{Apostol2014}, which degrades performance as data size increases. To address this, new frameworks must be developed on distributed systems such as Apache Hadoop and Apache Spark, which can seamlessly handle large-scale graphs. In this paper, we propose a novel framework for community detection algorithms, namely K-Cliques, Louvain, and Fast Greedy, developed using Apache Spark GraphFrames. We test their performance and scalability on two real-world datasets. The experimental results demonstrate the feasibility of developing graph mining algorithms using Apache Spark GraphFrames.
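The abstract does not describe the implementation, so the following is only an illustrative sketch of the objective that two of the named algorithms, Louvain and Fast Greedy, are generally understood to optimize: modularity. It is plain Python rather than GraphFrames code, and the function name `modularity`, the edge-list representation, and the toy graph are assumptions made for illustration, not taken from the paper.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges        : list of (u, v) pairs, each undirected edge listed once
    community_of : dict mapping every node to a community label

    Q = sum over communities c of  L_c / m - (d_c / (2 m))^2
    where m is the number of edges, L_c the number of edges with both
    endpoints in c, and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")

    intra_edges = defaultdict(int)  # L_c per community
    degree_sum = defaultdict(int)   # d_c per community

    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        # Each edge contributes one unit of degree to each endpoint.
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra_edges[cu] += 1

    return sum(
        intra_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical toy graph: two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_block = {node: "all" for node in range(6)}

print(modularity(toy_edges, two_triangles))
print(modularity(toy_edges, one_block))
```

Splitting the toy graph into its two triangles scores higher than placing every node in one community, which is the kind of comparison a modularity-maximizing method makes repeatedly when deciding whether to merge or move nodes. A distributed version would have to compute the same per-community sums with aggregations over an edge table, and that step is where a framework such as GraphFrames would come in.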
- Publication: arXiv e-prints
- Pub Date: August 2024
- DOI: 10.48550/arXiv.2408.03966
- arXiv: arXiv:2408.03966
- Bibcode: 2024arXiv240803966A
- Keywords: Computer Science - Social and Information Networks; Computer Science - Distributed, Parallel, and Cluster Computing
- E-Print: IEEE International Symposium on Parallel and Distributed Computing (ISPDC 2024)