A Machine Learning Testbed for the Retrieving PM2.5 Using Different Combination of Sensors and Models
Abstract
In this study, we present a machine learning testbed for estimating PM2.5 concentrations from a combination of multi-satellite data and model output. We show that a hierarchy of models of increasing complexity, taking inputs of increasing richness, can estimate PM2.5 from satellite observations with strong skill. Input richness refers primarily to the measured vertical structure of aerosol extinction and to information on aerosol size and type. All of our models are trained on hundreds of millions of data points derived synthetically from outputs of the NASA GEOS model, making the training set one of the largest to date. We show that, with realistic satellite observations and physical model outputs, machine learning models range from near-perfect prediction of PM2.5 concentrations when the right observations are available to good estimates even with bare-bones observations. Our models are global, i.e., they are not tuned for any specific region; once trained, they can be applied consistently across the whole globe. Although the models are trained on synthetic GEOS simulation data, they can be applied to real observations and outperform the baseline methods. Furthermore, we can quantify the value of new satellite data by assessing their impact on model performance. In particular, we show that geostationary data are invaluable for capturing PM2.5 changes at high temporal resolution with our models.
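The abstract does not specify the model architectures or exact input variables, so the following is only a toy illustration of two of its ideas: a hierarchy of input richness, and valuing an extra data source by its effect on model performance. It assumes a hypothetical multiplicative relation in which surface PM2.5 depends on column AOD, a near-surface share of extinction (standing in for vertical structure), and a fine-mode fraction (standing in for size/type). It swaps in ordinary least squares on log-transformed features in place of the authors' machine learning models. All names (`make_synthetic`, `TIERS`, `evaluate_tiers`), value ranges, and the 80 scale factor are invented for illustration, and no GEOS output is used.

```python
import math
import random


def make_synthetic(n, seed=0):
    """Hypothetical toy generator (not GEOS): surface PM2.5 as a product of
    column AOD, near-surface extinction share, and fine-mode fraction."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        aod = rng.uniform(0.05, 1.5)       # column aerosol optical depth
        sfc_frac = rng.uniform(0.1, 0.9)   # assumed vertical-structure proxy
        fine_frac = rng.uniform(0.2, 1.0)  # assumed size/type proxy
        noise = math.exp(rng.gauss(0.0, 0.05))
        pm25 = 80.0 * aod * sfc_frac * fine_frac * noise
        samples.append((aod, sfc_frac, fine_frac, pm25))
    return samples


def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    X = [[1.0] + r for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def r2_score(beta, rows, y):
    """Coefficient of determination of a fitted linear model on (rows, y)."""
    pred = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in rows]
    mean = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical input tiers: each adds one class of observation (feature indices).
TIERS = {
    "bare-bones (AOD only)": [0],
    "+ vertical structure": [0, 1],
    "+ size/type": [0, 1, 2],
}


def evaluate_tiers(n=2000, seed=0):
    """Fit one model per tier on the first half; score held-out R^2 (log space)."""
    data = make_synthetic(n, seed)
    feats = [[math.log(a), math.log(s), math.log(f)] for a, s, f, _ in data]
    target = [math.log(p) for _, _, _, p in data]
    half = n // 2
    scores = {}
    for name, cols in TIERS.items():
        rows = [[r[c] for c in cols] for r in feats]
        beta = fit_ols(rows[:half], target[:half])
        scores[name] = r2_score(beta, rows[half:], target[half:])
    return scores


if __name__ == "__main__":
    previous = 0.0
    for name, score in evaluate_tiers().items():
        # Gain over the previous tier: a crude "value" of the added observation type.
        print(f"{name:24s} held-out R^2 = {score:.3f} (gain {score - previous:+.3f})")
        previous = score
```

The log transform makes the assumed multiplicative relation linear, so the richest tier can recover it almost exactly while the AOD-only tier cannot; the held-out R² gain between tiers is one simple way to score the value of an added observation type, in the spirit of (but not identical to) the assessment described above.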
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2022
- Bibcode: 2022AGUFM.A55K1243Y