Deep Ensemble Learning Strategy for the Estimation of Surface Ozone Concentrations in China
Abstract
Accurate estimation of surface ozone (O3) concentrations is essential for the scientific community and for public health, as short-term O3 exposure has been linked to various adverse health impacts. This study developed a clustering-based stacking ensemble learning strategy to improve the estimation of daily surface O3 concentrations at a 5-km spatial resolution and explored the spatiotemporal variations of O3 exposure in China in 2020. The novel deep modeling framework had three stages. In the first stage, geographically weighted regression and hierarchical clustering were used to divide the large, highly heterogeneous study area into six smaller, relatively homogeneous regions. In the second stage, three ensemble learning models, random forest (RF), gradient boosting model (GBM), and eXtreme gradient boosting (XGBoost), were used to estimate O3 concentrations. In the third stage, a ridge regression, which can handle collinearity in the data, was used to determine the optimal weights for the O3 estimates from the three second-stage models. In addition to surface O3 measurements, meteorological data (relative humidity, precipitation, temperature, U-wind, V-wind, planetary boundary layer height, etc.), and surface-condition data (road density, population density), the study used total ozone column data with a high spatial resolution of 3.5 × 7.5 km from the Sentinel-5P satellite to help the model capture the fine-resolution gradient of O3 loading. Model performance was evaluated with sample-based, spatial, and temporal 10-fold cross-validation (CV). The validation results indicate that the proposed model, with a sample-based CV R2 of 0.84, outperformed every individual model (CV R2 < 0.80) and reduced spatial uncertainty, as shown by a spatial CV R2 (0.87) higher than the sample-based CV R2 (0.84).
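The third-stage weighting can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain ridge fit without an intercept, solved in closed form in pure Python, and the function names (`ridge_weights`, `stack_predict`), the penalty `alpha`, and all toy numbers are hypothetical. In practice a library ridge estimator would normally be used.

```python
# Sketch of stage three: ridge regression that weights the O3 estimates
# from three base models. All names and numbers are illustrative assumptions.

def ridge_weights(base_preds, observed, alpha=1.0):
    """Solve (X^T X + alpha*I) w = X^T y for the stacking weights.

    base_preds: one row per sample, each row holding the estimates of the
    base models (for example RF, GBM, XGBoost). No intercept is fitted.
    """
    k = len(base_preds[0])
    # Normal-equation matrix A = X^T X + alpha*I and right-hand side b = X^T y.
    A = [[sum(row[i] * row[j] for row in base_preds) + (alpha if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(row[i] * y for row, y in zip(base_preds, observed)) for i in range(k)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    M = [A[i] + [b[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    w = [0.0] * k
    for i in range(k - 1, -1, -1):
        s = sum(M[i][j] * w[j] for j in range(i + 1, k))
        w[i] = (M[i][k] - s) / M[i][i]
    return w


def stack_predict(weights, row):
    """Combine one row of base-model estimates into a single estimate."""
    return sum(w * p for w, p in zip(weights, row))


# Hypothetical base-model estimates (ug/m3) and matching observations.
toy_preds = [
    [100.0, 110.0, 90.0],
    [80.0, 70.0, 85.0],
    [120.0, 130.0, 125.0],
    [60.0, 65.0, 55.0],
]
toy_obs = [101.0, 78.0, 124.0, 60.5]

weights = ridge_weights(toy_preds, toy_obs, alpha=1.0)
stacked = [stack_predict(weights, row) for row in toy_preds]
```

The penalty `alpha` matters because the base-model estimates are strongly correlated with one another; without it, the fitted weights can swing widely for small changes in the training data.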
Exposure analysis based on the high-resolution daily estimates shows that areas with unhealthy daily O3 concentrations (maximum daily 8-h average, MDA8, > 160 μg/m3) clustered in the North China Plain, where such days accounted for more than 10% of days and were concentrated in summer. These results confirm the importance of implementing policies and measures to control O3 in China.
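The exceedance share used in the exposure analysis can be sketched as a simple count over a daily series. Only the 160 μg/m3 threshold comes from the abstract; the function name `exceedance_fraction` and the sample values are hypothetical, and the sketch assumes a strict "greater than" comparison.

```python
# Sketch: share of days whose MDA8 O3 exceeds the unhealthy threshold.
UNHEALTHY_MDA8 = 160.0  # ug/m3, threshold quoted in the abstract


def exceedance_fraction(daily_mda8, threshold=UNHEALTHY_MDA8):
    """Return the fraction of days with MDA8 strictly above the threshold."""
    if not daily_mda8:
        raise ValueError("need at least one daily value")
    exceed_days = sum(1 for value in daily_mda8 if value > threshold)
    return exceed_days / len(daily_mda8)


# Hypothetical ten-day MDA8 series (ug/m3) for a single grid cell.
toy_series = [95.0, 120.5, 161.2, 170.0, 158.9, 160.0, 88.3, 165.4, 140.0, 130.0]
toy_fraction = exceedance_fraction(toy_series)
```

Applied per 5-km grid cell over a full year, the same calculation would give a map of the share of unhealthy days.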
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2022
- Bibcode: 2022AGUFM.A45I1940H