What can we learn from machine learning models developed for short-term forecasting of PM2.5?
Abstract
Accurate short- and long-term forecasts of air pollution concentrations can give sensitive groups and others some advance warning, enabling them to take action to reduce their exposure to unhealthy levels of air pollution. We explore how different machine learning algorithms perform in producing short-term (1- to 3-hour) forecasts of PM2.5 concentrations at five sites in Oregon. We use hourly PM2.5 and meteorological observations from Oregon's air quality monitoring network for 2012-2016 as the training dataset, and the 2017 PM2.5 data as the validation dataset. The accuracy of the machine learning models (MLMs) is estimated using the following metrics: (1) the mean square error of the MLM 1-hour predictions compared with that of the Reff model; (2) the mean bias and mean error of the 1-, 2-, and 3-hour predictions at each site relative to the observed concentrations; and (3) the slope and R² of the best-fit line between predicted and observed concentrations.
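As a rough illustration of the accuracy metrics listed above, the following Python sketch computes mean bias, mean error, mean square error, and the slope and R² of a least-squares line of predicted against observed values. It is not the authors' code: the function name `forecast_metrics` and the sample values are hypothetical, and "mean error" is assumed here to mean the mean absolute error, which the abstract does not specify.

```python
import numpy as np

def forecast_metrics(observed, predicted):
    """Summary statistics for paired observed/predicted PM2.5 values."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    resid = pred - obs  # positive residual = over-prediction

    # Least-squares line of predicted (y) against observed (x).
    slope, intercept = np.polyfit(obs, pred, 1)
    # For a simple linear fit, R^2 equals the squared Pearson correlation.
    r = np.corrcoef(obs, pred)[0, 1]

    return {
        "mean_bias": resid.mean(),
        "mean_error": np.abs(resid).mean(),  # assumption: mean absolute error
        "mse": (resid ** 2).mean(),
        "slope": slope,
        "intercept": intercept,
        "r2": r ** 2,
    }

# Made-up concentrations in ug/m3, for illustration only.
obs = [5.0, 10.0, 15.0, 20.0]
pred = [6.0, 9.0, 17.0, 20.0]
metrics = forecast_metrics(obs, pred)
print(metrics)
```

In practice the same function could be applied separately to each site and to each forecast horizon (1, 2, and 3 hours), so that the statistics can be compared across models.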
MLMs are traditionally treated as black boxes. We use a combination of residual analysis, sensitivity analysis, and visualizations to "peek" inside the black box. Our goal is to explore what we can learn from MLM performance and to use these insights to pick the right algorithm and build better models.
- Publication:
- AGU Fall Meeting Abstracts
- Pub Date:
- December 2019
- Bibcode:
- 2019AGUFM.A51U2679R
- Keywords:
- 0365 Troposphere: composition and chemistry; ATMOSPHERIC COMPOSITION AND STRUCTURE
- 3336 Numerical approximations and analyses; ATMOSPHERIC PROCESSES
- 0520 Data analysis: algorithms and implementation; COMPUTATIONAL GEOPHYSICS
- 0555 Neural networks, fuzzy logic, machine learning; COMPUTATIONAL GEOPHYSICS