Improving the Accuracy of Cloud Detection Using Machine Learning
Abstract
Cloud detection from geostationary satellite imagery has long been accomplished through multi-spectral channel differencing in comparison to the Earth's surface. The distinction of clear/cloud is then determined by comparing these differences to empirical thresholds. Using this methodology, the probability of detecting clouds exceeds 90% but performance varies seasonally, regionally and temporally. The Cloud Mask Generator (CMG) database developed under this effort, consists of 20 years of 4 km, 15minute clear/cloud images based on GOES data over CONUS and Hawaii. The algorithms to determine cloudy pixels in the imagery are based on well-known multi-spectral techniques and defined thresholds. These thresholds were produced by manually studying thousands of images and thousands of man-hours to determine the success and failure of the algorithms to fine tune the thresholds. This study aims to investigate the potential of improving cloud detection by using Random Forest (RF) ensemble classification. RF is the ideal methodology to employ for cloud detection as it runs efficiently on large datasets, is robust to outliers and noise and is able to deal with highly correlated predictors, such as multi-spectral satellite imagery. The RF code was developed using Python in about 4 weeks. The region of focus selected was Hawaii and includes the use of visible and infrared imagery, topography and multi-spectral image products as predictors. The development of the cloud detection technique is realized in three steps. First, tuning of the RF models is completed to identify the optimal values of the number of trees and number of predictors to employ for both day and night scenes. Second, the RF models are trained using the optimal number of trees and a select number of random predictors identified during the tuning phase. Lastly, the model is used to predict clouds for an independent time period than used during training and compared to truth, the CMG cloud mask. Initial results show 97% accuracy during the daytime, 94% accuracy at night, and 95% accuracy for all times. The total time to train, tune and test was approximately one week. The improved performance and reduced time to produce results is testament to improved computer technology and the use of machine learning as a more efficient and accurate methodology of cloud detection.
- Publication:
-
AGU Fall Meeting Abstracts
- Pub Date:
- December 2017
- Bibcode:
- 2017AGUFMNG33A0185C
- Keywords:
-
- 1630 Impacts of global change;
- GLOBAL CHANGE;
- 1914 Data mining;
- INFORMATICS;
- 1942 Machine learning;
- INFORMATICS;
- 3270 Time series analysis;
- MATHEMATICAL GEOPHYSICS