A Multi-Variate Triple-Regression Forecasting Algorithm for Long-Term Customized Allergy Season Prediction
In this paper, we propose a novel multi-variate algorithm that uses a triple-regression methodology to produce long-term predictions of the airborne-pollen allergy season that can be customized for each patient. To improve the prediction accuracy, we first pre-process the data to integrate the historical pollen concentrations with various inferential signals from other covariates, such as meteorological data. We then propose a novel algorithm comprising three regression stages: in Stage 1, a regression model that predicts the start/end date of an airborne-pollen allergy season is trained on a feature matrix extracted from 12 covariate time series with a rolling window; in Stage 2, a regression model that predicts the corresponding uncertainty is trained on the feature matrix and the prediction result from Stage 1; in Stage 3, a weighted linear regression model is built upon the prediction results from Stages 1 and 2. We observe empirically and prove that Stage 3 contributes to the improved forecasting accuracy and the reduced uncertainty of the multi-variate triple-regression algorithm. Because the triggering pollen concentration, which defines the allergy season, depends on the patient's allergy sensitivity level, the definition of the season can be customized individually. In our backtesting, the algorithm achieved a mean absolute error (MAE) of 4.7 days. We conclude that this algorithm could be applicable to both generic and long-term forecasting problems.
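To make the three-stage structure concrete, the following is a minimal sketch in Python with NumPy, not the implementation used in this paper. The abstract does not specify the model family of Stages 1 and 2, the uncertainty target, or how Stage 3 combines the earlier results, so the sketch makes the following assumptions: Stages 1 and 2 are plain linear least-squares fits, the Stage 2 target is the absolute in-sample error of Stage 1, and Stage 3 regresses the observed date on the Stage 1 prediction with inverse-variance weights derived from Stage 2. All function names, the `min_sigma` floor, and the synthetic three-column feature matrix (a stand-in for the rolling-window features of the 12 covariate time series) are hypothetical.

```python
import numpy as np


def _design(X):
    """Prepend an intercept column to a 2-D feature matrix."""
    return np.column_stack([np.ones(len(X)), X])


def fit_linear(X, y, weights=None):
    """Ordinary (or weighted) least squares; returns the coefficient vector."""
    A = _design(X)
    if weights is not None:
        # Scaling each row by sqrt(w) turns weighted least squares
        # into an ordinary least-squares problem.
        root_w = np.sqrt(weights)
        A, y = A * root_w[:, None], y * root_w
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return _design(X) @ coef


def fit_triple_regression(X, y, min_sigma=0.5):
    # Stage 1: feature matrix -> season start (or end) date, in day of year.
    coef1 = fit_linear(X, y)
    pred1 = predict_linear(coef1, X)

    # Stage 2: feature matrix + Stage 1 prediction -> uncertainty.
    # Assumption: the target is the absolute in-sample error of Stage 1.
    # With a linear Stage 1 the extra column is redundant; lstsq then
    # returns the minimum-norm solution, so the fit is still well defined.
    X2 = np.column_stack([X, pred1])
    coef2 = fit_linear(X2, np.abs(y - pred1))
    sigma = np.maximum(predict_linear(coef2, X2), min_sigma)

    # Stage 3: weighted linear regression of the observed date on the
    # Stage 1 prediction, down-weighting samples with high uncertainty.
    coef3 = fit_linear(pred1[:, None], y, weights=1.0 / sigma**2)
    return coef1, coef2, coef3


def predict_triple_regression(model, X, min_sigma=0.5):
    coef1, coef2, coef3 = model
    pred1 = predict_linear(coef1, X)
    sigma = np.maximum(
        predict_linear(coef2, np.column_stack([X, pred1])), min_sigma
    )
    return predict_linear(coef3, pred1[:, None]), sigma


def mean_absolute_error(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


# Synthetic stand-in data (NOT pollen data): 200 samples, 3 features,
# with noise whose scale depends on the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
noise_scale = 1.0 + 2.0 * np.abs(X[:, 0])
y = 100.0 + 5.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(size=200) * noise_scale

# Fit on the first 150 samples and evaluate on the remaining 50.
model = fit_triple_regression(X[:150], y[:150])
pred, sigma = predict_triple_regression(model, X[150:])
print("held-out MAE (days):", mean_absolute_error(y[150:], pred))
```

The MAE printed here reflects only the synthetic data and bears no relation to the 4.7-day backtesting result reported above. A per-patient season definition would enter upstream of this sketch: the target `y` would be recomputed from the pollen series using that patient's triggering concentration before any stage is trained.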