Real-Time Construction Algorithm of Co-Occurrence Network Based on Inverted Index
Abstract
Co-occurrence networks are an important method in the field of natural language processing and text mining for discovering semantic relationships within texts. However, the traditional traversal algorithm for constructing co-occurrence networks has high time complexity and space complexity when dealing with large-scale text data. In this paper, we propose an optimized algorithm based on inverted indexing and breadth-first search to improve the efficiency of co-occurrence network construction and reduce memory consumption. Firstly, the traditional traversal algorithm is analyzed, and its performance issues in constructing co-occurrence networks are identified. Then, the detailed implementation process of the optimized algorithm is presented. Subsequently, the CSL large-scale Chinese scientific literature dataset is used for experimental validation, comparing the performance of the traditional traversal algorithm and the optimized algorithm in terms of running time and memory usage. Finally, using non-parametric test methods, the optimized algorithm is proven to have significantly better performance than the traditional traversal algorithm. The research in this paper provides an effective method for the rapid construction of co-occurrence networks, contributing to the further development of the Information Organization fields.
- Publication:
-
arXiv e-prints
- Pub Date:
- August 2023
- DOI:
- 10.48550/arXiv.2308.08756
- arXiv:
- arXiv:2308.08756
- Bibcode:
- 2023arXiv230808756C
- Keywords:
-
- Computer Science - Information Retrieval;
- Computer Science - Data Structures and Algorithms
- E-Print:
- 10 pages, 8 figures