Data-driven Forecasting Techniques for Harmful Algal Bloom in Western Lake Erie
Abstract
Harmful Algal Blooms (HABs) have recently threatened public health and the regional economy around Western Lake Erie (WLE), driven by nutrient loading and abnormal climate conditions. Timely, accurate forecasting is critical to protect the large population and related businesses from HABs. However, modeling and forecasting HAB severity in a large lake system is very challenging because of the complex and delayed biochemical interactions between bacteria and lake dynamics. Recent advances in hydrologic measurements and data science provide simpler, yet accurate, approaches that can replace parameter-based biochemical reaction and transport models. The purpose of this study is to let the data themselves describe HABs. The study exploits data-driven, machine-learning techniques to identify the major factors, and combinations of factors, that cause HABs; to train multiple machine-learning models; and to evaluate model performance against existing NOAA HAB projections for 2002-2017. Stepwise Multiple Regression (SMR) and Genetic Programming (GP) were selected for their popularity and their capability to capture nonlinearity. The predictand for each HAB month (Jul to Oct) is the highest Cyanobacterial Index (CI) computed by NOAA. The predictors are eight monthly variables for the Maumee River Basin and WLE: discharge, total phosphorus (P), P loading, soluble reactive P, total nitrogen, water temperature, air temperature, and wind speed. Six lag times (1 to 6 months) and four averaging durations (2 to 5 months) were combined for each variable, yielding a total of 192 combinations per month (8 variables × 6 lag times × 4 durations). Because model performance depends on data quality as well as quantity, the Spearman rank correlation was used to select the more meaningful predictors to feed to the models. Models were trained for two training periods (2002-2011 and 2002-2014).
Results reveal that 1) each HAB month was modeled by a different combination of variables, lag times, and averaging durations; 2) GP overall outperforms SMR in training and prediction, while SMR trains better in particular months; 3) both models underperform for HAB events outside the training data; and 4) both models are simple to update for operational use. Although prediction uncertainty and data quality still need to be addressed in future work, this study demonstrates the applicability of machine-learning models to forecasting HABs and contributes a member to multi-model projections.
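The predictor construction and screening described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' code. The variable names, the window definition (a window of `duration` months whose last month lies `lag` months before the target month), and the helper functions `lagged_mean`, `average_ranks`, and `spearman` are all assumptions made for this example.

```python
from itertools import product

# Hypothetical labels for the eight monthly variables named in the abstract.
VARIABLES = [
    "discharge", "total_p", "p_loading", "soluble_reactive_p",
    "total_n", "water_temp", "air_temp", "wind_speed",
]
LAGS = range(1, 7)       # lag times of 1 to 6 months
DURATIONS = range(2, 6)  # averaging durations of 2 to 5 months

# Every (variable, lag, duration) candidate predictor for one HAB month.
combos = list(product(VARIABLES, LAGS, DURATIONS))


def lagged_mean(series, target_idx, lag, duration):
    """Mean of `duration` monthly values ending `lag` months before target_idx.

    The window definition is an assumption; the abstract does not specify it.
    Returns None when the window would start before the series does.
    """
    end = target_idx - lag       # last month in the window (inclusive)
    start = end - duration + 1   # first month in the window
    if start < 0:
        return None
    window = series[start:end + 1]
    return sum(window) / len(window)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no rank information
    return cov / (sx * sy)
```

In a screening step of this kind, each of the `len(combos)` candidates would be evaluated across the training years and ranked by the absolute value of `spearman(candidate, ci)`, where `ci` is the yearly series of monthly peak CI values. How many candidates the study retained, and under what threshold, is not stated in the abstract.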
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2018
- Bibcode: 2018AGUFMNH13C0704K
- Keywords:
- 4301 Atmospheric; NATURAL HAZARDS
- 4302 Geological; NATURAL HAZARDS
- 4313 Extreme events; NATURAL HAZARDS
- 4333 Disaster risk analysis and assessment; NATURAL HAZARDS