Near-Real Time Anomaly Detection for Scientific Sensor Data
Abstract
Environmental scientists use advanced sensor technology such as meteorological towers, wireless sensor networks and robotic trams equipped with sensors to perform data collection at remote research sites. Because the amount of environmental sensor data acquired in real time by such instruments is increasing, the ability both to evaluate the accuracy of the data in near-real time and to check that the instrumentation is operating correctly is critical to avoid losing valuable time and information. The goal of the research is to define a software engineering-based solution that provides the foundation for defining reusable templates that formally specify data properties and for automatically generating code that monitors data streams to identify anomalies in near-real time. The research effort has resulted in a data property categorization that is based on a literature survey of 15 projects that collected environmental data from sensors and a case study conducted in the Arctic. More than 500 published data properties were manually extracted from the surveyed projects and analyzed. The data property categorization revealed recurrent data patterns. Using these patterns and the Specification and Pattern System (SPS) from the software-engineering community as a model, we developed the Data Specification and Pattern System (D-SPS) to capture data properties. D-SPS is the foundation for the Data Property Specification (DaProS) prototype tool, which assists scientists in specifying sensor data properties. A series of experiments has been conducted in collaboration with experts working with eddy covariance (EC) data from the Jornada Basin Experimental Range (JER) and with hyperspectral data collected using robotic tram systems in the Arctic. The goal of the experiments was to determine whether the approach is effective for specifying data properties and identifying anomalies in sensor data.
A complementary Sensor Data Verification (SDVe) prototype tool identified anomalies in the EC data by checking it against the expert-specified data properties. Scientists using DaProS and SDVe were able to detect environmental variability, instrument malfunctioning, and seasonal and diurnal variability in EC and hyperspectral datasets. The results of the experiments also yielded insights into the practices scientists follow to specify data properties, and exposed new data property challenges and a potential method for capturing data quality confidence levels.
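To make the idea of checking a data stream against a specified data property concrete, the following is a minimal sketch in Python. It is not the DaProS or SDVe implementation, and it does not reproduce any D-SPS pattern: the `RangeProperty` class, the `find_anomalies` function, the variable names, and the temperature bounds are all hypothetical, chosen only to illustrate a property of the form "every reading lies within given bounds".

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class RangeProperty:
    """Hypothetical data property: every reading must lie in [low, high]."""
    name: str
    low: float
    high: float

    def holds(self, value: float) -> bool:
        # NaN compares False against both bounds, so it is treated as a violation.
        return self.low <= value <= self.high


def find_anomalies(
    readings: Iterable[Tuple[int, float]], prop: RangeProperty
) -> Iterator[Tuple[int, float]]:
    """Yield each (timestamp, value) pair whose value violates the property."""
    for timestamp, value in readings:
        if not prop.holds(value):
            yield timestamp, value


# Illustrative bounds and readings only; not taken from the EC or tram datasets.
air_temp = RangeProperty("air_temperature_celsius", low=-40.0, high=50.0)
stream = [(0, 21.5), (1, 22.0), (2, 999.0), (3, 21.8)]
anomalies = list(find_anomalies(stream, air_temp))
```

Because `find_anomalies` is a generator, it can consume readings one at a time as they arrive, which is the kind of access pattern near-real-time monitoring would need. A flagged reading only indicates that the property was violated; deciding whether the cause is an instrument malfunction or genuine environmental variability remains a judgment for the scientist.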
- Publication:
- AGU Fall Meeting Abstracts
- Pub Date:
- December 2011
- Bibcode:
- 2011AGUFMIN11C1306G
- Keywords:
- 1908 INFORMATICS / Cyberinfrastructure;
- 1924 INFORMATICS / Formal logics and grammars;
- 1950 INFORMATICS / Metadata: Quality;
- 1990 INFORMATICS / Uncertainty