Quantifying Seasonal Variation in Cloud Cover with Predictive Models
Abstract
An important problem that arises in many earth science applications is that of decomposing an observed signal into several components, including a seasonal component and a component that contains the signal of interest. Usually, a model is used to separate the signal into these components. This presentation attempts to quantify how much variation the model itself introduces into this process. The larger the variation due to the model, the less confidence we can have in our results and in their subsequent scientific interpretation. To address this question, we consider the following scenario, in which the end goal is to build a predictive model that estimates yearly cloud cover over the Amazon region from a sample of 12 multispectral image cubes from MODIS. These image cubes have 500 m resolution and 6 channels with the following bandwidths (Channel 1: 620-670, Channel 2: 841-876, Channel 3: 459-479, Channel 4: 545-565, Channel 5: 1230-1250, Channel 6: 1628-1652; all bandwidths are in nanometers). These six channels are useful for characterizing land, cloud, and aerosol content in the scene. We model the nonlinear relationships between Channels 1-5 and Channel 6 using regression techniques. Channel 6, centered at about 1.6 microns, is well suited to distinguishing clouds from the background scene. This paper discusses the results of a novel experiment in which we compare the performance of two predictive models drawn from a single model class, Gaussian Processes (GPs), but built from two different data sets. The first GP is built using data sampled from all 12 image cubes, whereas the second is built using only the data for a particular season. We compare the performance of these two models using a variety of statistical techniques to determine whether the single, all-season GP is capable of modeling the variation in the data observed across seasons.
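The two-model experiment described above can be illustrated with a small sketch. It assumes exact zero-mean GP regression with a squared-exponential kernel and fixed (not fitted) hyperparameters, and it runs on synthetic stand-in data rather than MODIS pixels. It also treats the 12 cubes as one per month and uses December-February as the "season"; both are hypothetical choices. The helper names (`rbf_kernel`, `gp_fit_predict`, `fit_centered`) are illustrative and are not taken from the study, and the standardized difference `z` is just one possible comparison statistic.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel k(a, b) = s^2 * exp(-||a - b||^2 / (2 * l^2))."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_fit_predict(X_train, y_train, X_test, length_scale=1.0, signal_var=1.0, noise_var=1e-2):
    """Exact zero-mean GP regression; returns predictive mean and latent-function variance."""
    K = rbf_kernel(X_train, X_train, length_scale, signal_var)
    L = np.linalg.cholesky(K + noise_var * np.eye(len(X_train)))
    # alpha = (K + noise_var * I)^{-1} y, computed with two solves against the Cholesky factor
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_star = rbf_kernel(X_train, X_test, length_scale, signal_var)
    mean = K_star.T @ alpha
    v = np.linalg.solve(L, K_star)
    var = signal_var - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)


def fit_centered(X_tr, y_tr, X_te, **kw):
    """Center the target on its training mean, because the GP prior above has zero mean."""
    offset = y_tr.mean()
    mean, var = gp_fit_predict(X_tr, y_tr - offset, X_te, **kw)
    return mean + offset, var


# --- Synthetic stand-in for the image pixels (hypothetical data, not study results) ---
rng = np.random.default_rng(0)
n_per_month = 40
months = np.repeat(np.arange(1, 13), n_per_month)        # assumed: one cube per month
X = rng.uniform(0.0, 1.0, size=(months.size, 5))          # stand-ins for Channels 1-5
seasonal = 0.3 * np.sin(2.0 * np.pi * months / 12.0)      # hypothetical seasonal effect
y = np.sin(3.0 * X[:, 0]) + X[:, 1] * X[:, 4] + seasonal  # stand-in for Channel 6
y = y + 0.05 * rng.standard_normal(months.size)

# Hypothetical season: December, January, February
in_season = np.isin(months, [12, 1, 2])
is_train = np.zeros(months.size, dtype=bool)
is_train[rng.permutation(months.size)[: months.size // 2]] = True
test = ~is_train & in_season

params = dict(length_scale=0.5, signal_var=1.0, noise_var=0.05**2)
# Model 1: trained on pixels sampled from all months
mu_g, var_g = fit_centered(X[is_train], y[is_train], X[test], **params)
# Model 2: trained only on pixels from the chosen season
mu_s, var_s = fit_centered(X[is_train & in_season], y[is_train & in_season], X[test], **params)

rmse_global = np.sqrt(np.mean((mu_g - y[test]) ** 2))
rmse_season = np.sqrt(np.mean((mu_s - y[test]) ** 2))
# Standardized difference between the two models' predictions, treating them as independent
z = (mu_g - mu_s) / np.sqrt(var_g + var_s + 1e-12)
print(f"RMSE all-season GP: {rmse_global:.3f}  RMSE seasonal GP: {rmse_season:.3f}")
print(f"fraction of test pixels with |z| > 2: {np.mean(np.abs(z) > 2):.2f}")
```

In this sketch, the inputs carry no explicit month information. The all-season model can therefore only average over the seasonal effect, and the `z` values show where the two models disagree by more than their combined predictive uncertainty.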
The second model, built only on the data of a given season, will be compared to the single all-season GP in terms of both its predictions and the systematic, scene-dependent differences between the two models. We chose the Gaussian Process model class because its properties are well suited to this problem. In regression problems such as this one, a GP returns the expected value of the target (here, Channel 6) given the inputs (here, Channels 1-5). GPs also provide an estimate of the uncertainty in each prediction, based on the underlying distribution of the data. This uncertainty estimate is essential for characterizing the quality of the two models. The methods used in this study belong to a larger family of algorithms, known as Virtual Sensors, that we are developing at NASA Ames Research Center. Virtual Sensors are a class of mathematical algorithms that learn the potentially nonlinear correlations between the channels of remote sensing satellite instruments, so that they can predict the value of a target channel from the input channels. This technique is useful for enabling retroactive long-term climate models that would otherwise be impossible to obtain. Virtual Sensors can be applied to predict the value of the target channel in situations where similar input channels are available from a different instrument. We will conclude the presentation with a discussion of Virtual Sensors and their applicability to this experiment.
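For reference, the standard zero-mean GP regression equations show how a GP yields both the expected target value and its uncertainty. Here the training inputs $\mathbf{x}_i$ stand for Channels 1-5 and the targets $\mathbf{y}$ stand for Channel 6. The kernel $k$ and the Gaussian noise variance $\sigma_n^2$ are assumptions, since the abstract does not specify them. With $K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$ and $(\mathbf{k}_*)_i = k(\mathbf{x}_i, \mathbf{x}_*)$ for a new pixel $\mathbf{x}_*$:

```latex
\mu(\mathbf{x}_*) = \mathbf{k}_*^{\top} \left( K + \sigma_n^2 I \right)^{-1} \mathbf{y}

\sigma^2(\mathbf{x}_*) = k(\mathbf{x}_*, \mathbf{x}_*)
  - \mathbf{k}_*^{\top} \left( K + \sigma_n^2 I \right)^{-1} \mathbf{k}_*
```

$\mu(\mathbf{x}_*)$ is the predicted Channel 6 value. $\sigma^2(\mathbf{x}_*)$ is the variance of the underlying function at that pixel; adding $\sigma_n^2$ gives the variance for a noisy observation. An approximate 95% interval is $\mu \pm 1.96\,\sigma$. Comparing these intervals between the all-season and seasonal GPs is one way to judge whether their predictions differ by more than their stated uncertainty.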
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2005
- Bibcode: 2005AGUFMIN33D..05S
- Keywords: 0430 Computational methods and data processing; 0480 Remote sensing; 3238 Prediction (3245, 4263); 3275 Uncertainty quantification (1873); 3311 Clouds and aerosols