Random Forest-based Understanding of Earth System Model Predictions of Phytoplankton Diatom
Abstract
Phytoplankton form the base of the marine food web, and understanding the controls on their growth is fundamental to understanding marine ecology. Diatoms constitute nearly half of the biomass found in the oceans, generate 20 to 50 percent of the oxygen produced each year, and take up over 6.7 billion tons of silicon annually. Hence, it is important to understand the factors that affect their growth.
While Earth System Models (ESMs) predict diatom biomass, it is often unclear which parameters, or combinations of parameters, the ESMs rely on, and to what extent. Here, the aim is to decipher the relative importance that an ESM attaches to its environmental input predictors. The ESM's prediction is emulated with a Random Forest (RF). The target variable is diatom biomass, and the predictors include nutrients, light, mixed layer depth, salinity, temperature, and upwelling. The RF is trained to predict diatom biomass with high accuracy, taking the ESM's output as the ground truth. Once this is achieved, it is reasonable to assume that the RF reproduces the workings of the ESM, and hence that the input parameters that played a crucial role in training the RF are also important for the ESM prediction.

Feature importance is analyzed with the following three methods to identify the features that are significant for prediction:
- Built-in feature importance: The RF has a built-in feature importance, the 'Gini importance' (or mean decrease in impurity). The importance of a feature measures how much splits on that feature decrease impurity across all decision trees in the RF.
- Permutation-based importance: This method randomly shuffles the values of each feature in turn and computes the resulting change in the model's performance. The features whose shuffling degrades performance the most are the most important ones.
- Feature importance with SHAP values: The SHAP interpretation uses Shapley values from game theory to estimate how each feature contributes to the prediction.

Together, the feature importances obtained from these methods give a fair understanding of which input predictors play the most significant roles in ESMs for predicting diatom growth in the ocean. The analysis therefore both gives insight into the workings of the ESM and supports its application to observations.

- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2022
- Bibcode: 2022AGUFMOS32B1023D
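
The three feature importance measures described in the abstract can be sketched as follows. This is a minimal illustration, assuming scikit-learn and NumPy are available, and is not the study's actual pipeline. The predictor names, the synthetic stand-in for ESM output, and the helper `shapley_values` are all hypothetical. The brute-force Shapley computation is a swapped-in illustration of the game-theoretic idea behind SHAP, not the SHAP library's own algorithm.

```python
import itertools
import math

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical predictor names; the data below are synthetic and invented
# purely for illustration. They are not output from any real ESM.
FEATURES = ["nitrate", "light", "mixed_layer_depth", "temperature"]

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(1000, len(FEATURES)))
# Assumed toy response: strong nitrate-light interaction, weak temperature
# effect, and no dependence at all on mixed layer depth.
y = 3.0 * X[:, 0] * X[:, 1] + 0.3 * X[:, 3] + rng.normal(0.0, 0.05, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)
r2 = rf.score(X_test, y_test)  # how well the RF emulates the synthetic target

# 1. Built-in importance (mean decrease in impurity); values sum to 1.
mdi = dict(zip(FEATURES, rf.feature_importances_))

# 2. Permutation importance: mean drop in held-out R^2 when a feature is shuffled.
perm = permutation_importance(rf, X_test, y_test, n_repeats=5, random_state=0)
perm_imp = dict(zip(FEATURES, perm.importances_mean))


def shapley_values(model, x, background):
    """Exact Shapley values for one sample x (hypothetical helper).

    The value of a feature subset S is the mean prediction over the background
    rows after overwriting the columns in S with the values from x. The cost is
    exponential in the number of features, so this only suits small examples.
    """
    n = x.size
    value = {}
    for k in range(n + 1):
        for subset in itertools.combinations(range(n), k):
            data = background.copy()
            cols = np.array(subset, dtype=int)
            data[:, cols] = x[cols]
            value[frozenset(subset)] = model.predict(data).mean()
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in itertools.combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value[s | {i}] - value[s])
    return phi


# 3. Shapley contributions for a single held-out sample.
background = X_train[:50]
x0 = X_test[0]
phi = shapley_values(rf, x0, background)

print(f"R^2 on held-out data: {r2:.3f}")
for name, contribution in zip(FEATURES, phi):
    print(f"{name:18s} MDI={mdi[name]:.3f}  perm={perm_imp[name]:.3f}  shapley={contribution:+.3f}")
```

In this toy setup the Shapley contributions sum to the difference between the prediction for `x0` and the mean prediction over the background rows, which is a quick sanity check on any such implementation. Impurity-based importances are computed on the training data, whereas the permutation scores here use held-out data, so the two rankings need not agree exactly.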