Remote sensing & Statistical learning approach to Harmful Algal Bloom Forecasting Using MODIS Ocean colour parameters
Abstract
Over the last few decades, harmful algal blooms (HABs) of Karenia brevis have become one of the most damaging natural phenomena in Florida's coastal areas. Karenia brevis produces toxins that harm people, fisheries, and ecosystems. In this study we developed and compared state-of-the-art data-driven statistical learning models (e.g., Decision Tree, Random Forest, Bagging, and Support Vector Machine) to predict the occurrence of HABs. In the proposed models, the number of Karenia brevis cells (cells/L) in surface water samples is the response variable, and ten Level-2 ocean color parameters (euphotic depth, Secchi disk depth, chlor_a, chl_gsm, chl_giop, Kd_490, SST, FLH, particulate backscattering coefficient, and turbidity index) extracted from daily archival MODIS satellite data are the predictors. The adopted approach addresses two main shortcomings of earlier models: (1) the paucity of satellite data due to cloudy scenes, and (2) the lag between the time at which a variable reaches its highest correlation with the target (onset of HAB) and the time the bloom occurs. Ten spatio-temporal models were generated, each from datasets for three consecutive satellite days, with forecasting spans ranging from zero to nine days. To build the models, the dataset was split into training (80%) and testing (20%) subsets. The 3-day models outperformed the single-day and two-day models and accommodated the potential variation in lag time from one variable to another. One or more of the ten models can be used to predict HAB occurrences, depending on the availability of consecutive cloud-free days. Confusion matrices computed on the testing dataset were used to evaluate and compare the models.
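The pairing of three consecutive cloud-free days with a later bloom observation can be sketched as follows. This is a minimal illustration, not the study's code: the function name `build_samples`, the convention that the forecasting span is counted from the last day of the window, and the toy feature values are all assumptions made for the example.

```python
from datetime import date, timedelta


def build_samples(daily_features, bloom_labels, window=3, lead=7):
    """Pair each run of `window` consecutive days of satellite features
    with the bloom label observed `lead` days after the last day of the run.

    daily_features: dict mapping date -> list of parameter values for that day
                    (cloudy days are simply absent from the dict)
    bloom_labels:   dict mapping date -> 0/1 bloom label from water samples
    """
    samples = []
    for last_day in sorted(daily_features):
        # The days making up the window, oldest first.
        days = [last_day - timedelta(days=k) for k in range(window - 1, -1, -1)]
        target_day = last_day + timedelta(days=lead)
        # Skip windows broken by a cloudy (missing) day or lacking a label.
        if not all(d in daily_features for d in days):
            continue
        if target_day not in bloom_labels:
            continue
        row = []
        for d in days:
            row.extend(daily_features[d])
        samples.append((row, bloom_labels[target_day]))
    return samples


# Toy data (hypothetical values): two parameters per day, day 3 is cloudy.
start = date(2020, 1, 1)
features = {
    start + timedelta(days=i): [20.0 + i, float(i)] for i in (0, 1, 2, 4, 5, 6)
}
labels = {start + timedelta(days=9): 1, start + timedelta(days=13): 0}

samples = build_samples(features, labels, window=3, lead=7)
for row, label in samples:
    print(row, label)
```

Only the windows covering days 0 to 2 and days 4 to 6 survive, because every window touching the cloudy day is dropped. Calling the function once per `lead` value from 0 to 9 would give one training table per forecasting span.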
Findings indicate that: (1) the Random Forest outperformed the remaining models; (2) the forecasting models of 4-8 days achieved the best results; (3) the most reliable model can forecast seven days ahead with an overall accuracy, Kappa coefficient, and F-Score of 98%, 0.95, and 0.96, respectively; (4) Sea Surface Temperature (SST) and Chlorophyll-a are always among the most significant variables; and (5) the proposed models could potentially be used to develop an "Early Warning System" for HABs in southwest Florida with short- to long-term forecasting capabilities.
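The three reported scores can all be derived from a binary confusion matrix. The sketch below shows one common way to do so, assuming a two-class bloom / no-bloom setup and taking "F-Score" to mean the F1 score; the abstract does not state either choice. The counts are invented for illustration and are not the study's results.

```python
def binary_metrics(tp, fp, fn, tn):
    """Overall accuracy, Cohen's kappa, and F1 from binary confusion-matrix counts.

    tp/fp/fn/tn: true positives, false positives, false negatives, true negatives.
    Assumes every count combination used in a denominator is non-zero.
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n

    # Agreement expected by chance, from the row and column marginals.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((fn + tn) / n) * ((fp + tn) / n)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1 - p_chance)

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, kappa, f1


# Hypothetical counts for a 100-sample test set.
accuracy, kappa, f1 = binary_metrics(tp=40, fp=5, fn=10, tn=45)
print(f"accuracy={accuracy:.2f} kappa={kappa:.2f} f1={f1:.2f}")
```

Kappa is worth reporting alongside accuracy because it discounts the agreement a classifier would reach by chance, which matters when bloom and no-bloom samples are unevenly represented.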
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2020
- Bibcode: 2020AGUFMIN011..09I
- Keywords:
  - 1912 Data management, preservation, rescue; INFORMATICS
  - 1916 Data and information discovery; INFORMATICS
  - 1942 Machine learning; INFORMATICS
  - 1960 Portals and user interfaces; INFORMATICS