Data Mining on Collection of Papers: Recognizing the Emerging Issues in Water and Infectious Disease
Abstract
Human development and population growth exert pressure on the quality and quantity of water resources. Among the consequences, water-related infectious diseases are a major cause of morbidity and mortality worldwide. They are prevalent in tropical areas, and as global climate change proceeds, they will become more important over time. The associated research has been expanding exponentially for decades. Given the volume of scientific publications on water and infectious diseases, scientists are drowning in data and information. Strategies and approaches exist that could help with this problem. The goal of this paper is to demonstrate how approaches such as data mining and machine learning can be used to evaluate large collections of papers. The objective is to conduct a systematic analysis of research related to the emerging area of water and infectious diseases. More specifically, the analysis of information from the database of papers examines systematic patterns in the research topics and the inter-relationships among water-associated terms and water-borne diseases, and discovers styles of research associated with them. The analysis uses twenty-six thousand papers (1930-2021) retrieved from the bibliographic database Scopus using combinations of nine water-associated terms and nine common water-borne diseases. We developed tools that carry out text processing steps, which lead to clustering that maps the published research. The collection of papers is subdivided into article clusters according to their contents. The cluster topics were determined by analyzing keywords or common words contained in the articles' titles, abstracts, and keywords. The clustering results demonstrate the connections between water environments and infectious diseases, as well as the relationships among them.
Preliminary results show that research in this area is still growing and that cluster analysis is a robust tool for analyzing the inter-relationships among factors affecting infectious diseases.
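The abstract does not say which text representation or clustering algorithm the authors' tools use. As a rough illustration of the general idea (text processing, then clustering, then naming each cluster by its most characteristic words), the following is a minimal sketch that assumes TF-IDF weighting and a small spherical k-means written with the Python standard library only. The stopword list, the four sample titles, the choice of two clusters, and all function names are hypothetical and are not taken from the study.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption, not the authors').
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to", "with", "for", "on", "from"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one L2-normalised {term: weight} dictionary per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed inverse document frequency, an assumed weighting scheme.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Dot product of two sparse vectors (equals cosine when both are unit length)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def mean_vector(vectors):
    """Sum sparse vectors and rescale the result to unit length."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def kmeans(vectors, k, iterations=20):
    """Spherical k-means with deterministic farthest-point seeding."""
    seeds = [0]
    while len(seeds) < k:
        candidates = [i for i in range(len(vectors)) if i not in seeds]
        # Add the document that is least similar to the seeds chosen so far.
        seeds.append(min(candidates,
                         key=lambda i: max(cosine(vectors[i], vectors[s]) for s in seeds)))
    centroids = [vectors[s] for s in seeds]
    labels = None
    for _ in range(iterations):
        new_labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels, centroids


def top_terms(centroid, n=3):
    """Highest-weighted terms of a centroid, used here as a crude cluster topic."""
    return [t for t, _ in sorted(centroid.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]


# Hypothetical paper titles, invented for illustration only.
docs = [
    "Cholera outbreak linked to contaminated drinking water",
    "Drinking water treatment and cholera transmission",
    "Schistosomiasis risk near irrigation canals",
    "Irrigation canals and snail hosts of schistosomiasis",
]

labels, centroids = kmeans(tfidf_vectors(docs), k=2)
for c in sorted(set(labels)):
    print(c, top_terms(centroids[c]))
```

On a real corpus of tens of thousands of records, an established library implementation and a principled way of choosing the number of clusters would be more appropriate than this toy version; the sketch is only meant to make the processing chain concrete.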
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2021
- Bibcode: 2021AGUFMGH35B0672F