rDataFusion: A Project-Specific Multi-Data Fusion Tool for Discovering, Integrating, and Visualizing Heterogeneous Long-term Data Sets
Abstract
To understand ecosystem change over a range of spatial and temporal scales and levels of biological organization and interaction, multiple streams of ecological data need to be collected, integrated, and analyzed. However, because of the size and complexity of these data streams and many other challenges (e.g., personnel turnover, methodological changes, and gaps in observing records), managing, analyzing, sharing, and visualizing these data have posed a significant challenge. To resolve these challenges, we are developing a multi-data fusion tool called rDataFusion, which is capable of aggregating heterogeneous data sets collected from a range of automated and semi-automated sensors and manual observations over a decade-long period. rDataFusion is being developed using the free, open-source software R Shiny. rDataFusion can currently integrate and filter data from two instrument nodes and different data streams that include micro-meteorological variables (e.g., temperature and relative humidity), soil conditions (e.g., temperature and soil moisture), and ecosystem trace gas and energy fluxes. After initial compilation and filtering, users can visualize data in near real-time to check that all sensors are running properly and/or apply preliminary flags to data that are deemed out of range or otherwise problematic. They can also add and edit field metadata. When complete, rDataFusion will support exploratory data analysis through quality control and quality assurance processes, including identification of missing values, outlier detection, and gap-filling. Future goals are to incorporate machine learning to filter and flag unusual data based on the alignment of related sensors, gap-fill missing or problematic data, visualize data to allow for preliminary summaries and interpretations, and compare data across time or by site.
The overarching goal is to develop a custom, open-source multi-data fusion tool that improves researchers' capacity to aggregate different streams of data from a single intensive site, facilitates data management, sharing, and analysis, and serves as a template for other research groups with similar challenges.
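The integration and preliminary flagging steps described in the abstract can be illustrated with a minimal sketch. It is written in Python for illustration only and is not rDataFusion code, which is built with R Shiny. The variable names, the plausibility limits in `VALID_RANGES`, the 30-minute logging interval, and the sample readings are all hypothetical assumptions, not values from the project.

```python
from datetime import datetime, timedelta

# Hypothetical plausibility limits; real limits depend on the sensors and site.
VALID_RANGES = {
    "air_temp_c": (-60.0, 50.0),
    "rel_humidity_pct": (0.0, 100.0),
    "soil_moisture_vwc": (0.0, 1.0),
}


def merge_nodes(*nodes):
    """Combine per-node records into one dict keyed by timestamp."""
    merged = {}
    for node in nodes:
        for timestamp, values in node.items():
            merged.setdefault(timestamp, {}).update(values)
    return dict(sorted(merged.items()))


def flag_value(variable, value):
    """Return 'missing', 'out_of_range', or 'ok' for a single reading."""
    if value is None:
        return "missing"
    low, high = VALID_RANGES[variable]
    return "ok" if low <= value <= high else "out_of_range"


def flag_records(merged):
    """Attach a flag to every expected variable at every timestamp."""
    return {
        timestamp: {var: flag_value(var, values.get(var)) for var in VALID_RANGES}
        for timestamp, values in merged.items()
    }


def find_gaps(timestamps, step):
    """List (start, end) pairs of consecutive timestamps more than one step apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > step]


t0 = datetime(2020, 7, 1, 12, 0)
half_hour = timedelta(minutes=30)  # assumed logging interval

# Made-up example readings from two hypothetical instrument nodes.
node_a = {
    t0: {"air_temp_c": 8.4, "rel_humidity_pct": 81.0},
    t0 + half_hour: {"air_temp_c": 999.0, "rel_humidity_pct": 79.5},
    t0 + 3 * half_hour: {"air_temp_c": 9.1, "rel_humidity_pct": None},
}
node_b = {
    t0: {"soil_moisture_vwc": 0.42},
    t0 + half_hour: {"soil_moisture_vwc": 0.41},
}

merged = merge_nodes(node_a, node_b)
flags = flag_records(merged)
gaps = find_gaps(merged.keys(), half_hour)
```

In this sketch, flags are stored alongside the readings rather than replacing them, so a flagged value remains available for later review or gap-filling.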
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2020
- Bibcode: 2020AGUFMIN025..08N
- Keywords: 0452 Instruments and techniques; BIOGEOSCIENCES; 1848 Monitoring networks; HYDROLOGY; 1920 Emerging informatics technologies; INFORMATICS; 1964 Real-time and responsive information delivery; INFORMATICS