Machine Learning for Accelerating Science
Abstract
The ability to apply quality assurance criteria to streaming data rapidly, efficiently, and accurately is essential for the timely approval and online posting of near real-time data. Automation becomes increasingly important as the number of monitoring stations and the volume of data grow. Timely access to near real-time data lets users respond to time-critical acute events identified by comparison with similar past conditions or with data models, for example when monitoring potential grass seedling germination, nutrient transport, or acute erosion. We present a system that automatically processes data from an array of remote sensors in a wireless network and makes these data available online for exploration, experimentation, and publication. The QA/QC process runs automatically by applying rules to the collected data. Because rules may fail to detect some incorrect data or may flag valid data as erroneous (false positives), we use machine learning to validate the QA/QC process automatically. The training data come from previous instances in which a user manually approved data. The idea is to transfer the knowledge of an experienced scientist to the automatic validation and approval of future data. In addition, our machine learning approach lets us identify events that cause recurrent errors in data collection.
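The abstract does not say which rules or which learning algorithm the system uses. As a minimal sketch under stated assumptions, the Python below pairs two hypothetical rules (a range check and a spike check with made-up thresholds) with a Bernoulli naive Bayes classifier, swapped in here as a stand-in learner, trained on past manual approve/reject decisions. The names `rule_flags`, `train_naive_bayes`, `approval_probability`, and `decide`, the thresholds, and the training data are all illustrative and are not the authors' implementation.

```python
import math

# Hypothetical rule parameters for a soil-moisture-style sensor (assumed values).
VALID_RANGE = (0.0, 0.6)  # plausible physical range of a reading
MAX_STEP = 0.1            # largest plausible change between consecutive readings


def rule_flags(values, valid_range=VALID_RANGE, max_step=MAX_STEP):
    """Rule-based QA/QC: return [out_of_range, spike] flags (0 or 1) per reading."""
    flags = []
    prev = None
    for v in values:
        out_of_range = int(not (valid_range[0] <= v <= valid_range[1]))
        # The spike check compares each reading with the previous raw reading.
        spike = int(prev is not None and abs(v - prev) > max_step)
        flags.append([out_of_range, spike])
        prev = v
    return flags


def train_naive_bayes(X, y, alpha=1.0):
    """Fit Bernoulli naive Bayes to past manual decisions (1 = approved, 0 = rejected).

    Stores P(label) and P(flag = 1 | label) per label, with Laplace smoothing `alpha`.
    """
    model = {}
    for label in (0, 1):
        rows = [x for x, lab in zip(X, y) if lab == label]
        prior = (len(rows) + alpha) / (len(X) + 2 * alpha)
        probs = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(len(X[0]))
        ]
        model[label] = (prior, probs)
    return model


def approval_probability(model, flags):
    """Posterior P(approved | flags), assuming flags are independent given the label."""
    log_scores = {}
    for label, (prior, probs) in model.items():
        s = math.log(prior)
        for f, p in zip(flags, probs):
            s += math.log(p if f else 1.0 - p)
        log_scores[label] = s
    m = max(log_scores.values())  # subtract the max before exp for numerical stability
    total = sum(math.exp(s - m) for s in log_scores.values())
    return math.exp(log_scores[1] - m) / total


def decide(p, approve_at=0.7, reject_at=0.3):
    """Hypothetical policy: automate confident cases, send the rest to a scientist."""
    if p >= approve_at:
        return "approve"
    if p <= reject_at:
        return "reject"
    return "review"


if __name__ == "__main__":
    # Illustrative past decisions on flags [out_of_range, spike]; not data from the study.
    past_flags = [[0, 0], [0, 0], [0, 0], [0, 1], [1, 0], [1, 1], [0, 1], [1, 0]]
    approved = [1, 1, 1, 1, 0, 0, 0, 0]
    model = train_naive_bayes(past_flags, approved)

    readings = [0.20, 0.22, 0.45, 0.90]
    for v, f in zip(readings, rule_flags(readings)):
        p = approval_probability(model, f)
        print(v, f, round(p, 3), decide(p))
    # 0.2 [0, 0] 0.769 approve
    # 0.22 [0, 0] 0.769 approve
    # 0.45 [0, 1] 0.625 review
    # 0.9 [1, 1] 0.143 reject
```

The three-way `decide` policy is one possible design choice: it automates only the confident cases, so ambiguous readings (here, a spike that a scientist approved in some past cases and rejected in others) still reach a human, and each new manual decision can be appended to the training data.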
We have connected our QA/QC system to our knowledge, learning, and analysis system (KLAS) so that users can easily run experiments on data collected by sensors across field sites. One of the main components of KLAS is a recommendation system based on the relationships among datasets, processes, and the user profiles of previous experiments. These relationships are learned with machine learning and then used to make recommendations that guide future users' experiments. The recommendations can address uncertainty about which process to follow, which parameters to use, and which data transformations to apply. We also implemented tools that help the scientific community reuse data, methods, and models. Well-documented datasets with thorough, consistently applied quality assurance criteria yield models of high value and give analyses interpretive strength.
- Publication:
- AGU Fall Meeting Abstracts
- Pub Date:
- December 2018
- Bibcode:
- 2018AGUFMIN53E0671R
- Keywords:
- 1912 Data management, preservation, rescue; INFORMATICS
- 1916 Data and information discovery; INFORMATICS
- 1920 Emerging informatics technologies; INFORMATICS
- 1976 Software tools and services; INFORMATICS