Data History Considerations on Deep Learning Forecast of Cyanobacterial bloom
Abstract
Climate change is intensifying algal blooms globally. Forecasting cyanobacterial harmful algal blooms (CyanoHABs) is important because proactive measures can limit the harm their toxins cause to water resources used for drinking and ecological preservation. The factors that influence CyanoHABs (regional meteorology, river hydrology and hydraulics, pollutant sources and water quality characteristics, and aquatic ecological characteristics) may shift over time in the current era of climate change. If so, CyanoHAB forecasts made with deep learning, a technique widely used in many fields, would depend on the history of regional hydrological and environmental characteristics. This hypothesis has yet to be examined. To examine this history dependency, we collected publicly accessible cyanobacteria concentration, water quality, and meteorological data for the Nakdong River, South Korea, covering 9 years. The first 8 years of data were used to train a recurrent neural network (RNN) model, and the final year was used to test it. The training data were divided into subsets of different periods (0.5, 1, 2, 4, 6, and 7 years) and of different sizes (random selections equivalent in amount to 0.5, 1, 2, 4, 6, and 7 years of data). RNN forecasting performance was compared between the full training data and the subsets. In the curve of period length versus forecast accuracy, accuracy peaked at an intermediate period length, between the shortest and longest periods. As the period became shorter than this optimum, model performance was hindered by insufficient training on the smaller amount of data. As the period became longer than the optimum, accuracy also decreased, which was attributed to uncertain temporal characteristics of river hydraulics and nonpoint source pollution distribution.
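The two kinds of training subsets described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes one observation per day, assumes that each period subset is the most recent stretch of the training record (the abstract does not say which stretch was used), and uses hypothetical function names.

```python
import numpy as np

SAMPLES_PER_YEAR = 365   # assumption: daily observations
TRAIN_YEARS = 8          # first 8 of the 9 years are used for training
PERIODS_YEARS = [0.5, 1, 2, 4, 6, 7]  # subset lengths listed in the abstract


def subset_size(years, n_train, samples_per_year=SAMPLES_PER_YEAR):
    """Number of samples equivalent to `years`, capped at the training size."""
    return min(int(round(years * samples_per_year)), n_train)


def period_subset(n_train, years):
    """Contiguous subset: indices of the last `years` of the training record.

    Taking the most recent stretch is an assumption made for this sketch.
    """
    n = subset_size(years, n_train)
    return np.arange(n_train - n, n_train)


def random_subset(n_train, years, rng):
    """Size-matched subset: the same number of samples drawn at random
    (without replacement) from the whole training record."""
    n = subset_size(years, n_train)
    return np.sort(rng.choice(n_train, size=n, replace=False))


n_train = TRAIN_YEARS * SAMPLES_PER_YEAR
rng = np.random.default_rng(0)

# For each length, pair a period-based subset with a random subset of equal size.
subsets = {
    years: (period_subset(n_train, years), random_subset(n_train, years, rng))
    for years in PERIODS_YEARS
}
```

Pairing each period subset with a random subset of the same size separates the effect of data history from the effect of data amount: both subsets hold equally many samples, but only the random one spans the full 8-year record.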
To further optimize history-dependent forecasting, a systematic methodological guideline was proposed, based on the results of this work, for selecting the window size of the time-series independent variables used as model input and the forecast horizon (i.e., the lead time of the forecast). These findings have significant implications for history-dependent considerations in CyanoHAB forecasts using deep learning.
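As a rough illustration of how window size and forecast horizon enter the model input, the sketch below builds supervised samples from a multivariate time series. The function name `make_windows` and the array layout are assumptions for illustration only; the abstract does not describe the authors' actual preprocessing.

```python
import numpy as np


def make_windows(features, target, window, horizon):
    """Build (input window, future target) pairs for sequence forecasting.

    features: array of shape (T, F), the independent variables over time
    target:   array of shape (T,), e.g. cyanobacteria concentration
    window:   number of consecutive past time steps fed to the model
    horizon:  lead time, in steps, between the last input step and the target

    Returns X of shape (N, window, F) and y of shape (N,), where
    N = T - window - horizon + 1.
    """
    features = np.asarray(features)
    target = np.asarray(target)
    n = len(target) - window - horizon + 1
    if n <= 0:
        raise ValueError("series too short for this window and horizon")
    # Sample i uses time steps i .. i + window - 1 as input ...
    X = np.stack([features[i:i + window] for i in range(n)])
    # ... and the target value `horizon` steps after the last input step.
    first = window - 1 + horizon
    y = target[first:first + n]
    return X, y


# Toy example: 10 time steps, 2 features.
demo_features = np.arange(20).reshape(10, 2)
demo_target = np.arange(10)
X_demo, y_demo = make_windows(demo_features, demo_target, window=3, horizon=2)
```

Sweeping `window` and `horizon` over a grid and comparing test accuracy is one simple way to explore the trade-off that such a guideline would address.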
Publication: AGU Fall Meeting Abstracts
Pub Date: December 2021
Bibcode: 2021AGUFMGH15E0626K