Gaussian Process Regression for Uncertainty Estimation on Ecosystem Data
Abstract
The flow of carbon between terrestrial ecosystems and the atmosphere is mainly driven by nonlinear, complex and time-lagged processes. Understanding the associated ecosystem responses and climatic feedbacks is a key challenge for climate change questions such as the consequences of increasing atmospheric CO2 levels. Usually, the underlying relationships are implemented in models as prescribed functions that interlink numerous meteorological, radiative and gas exchange variables. In contrast, supervised Machine Learning algorithms, such as Artificial Neural Networks or Gaussian Processes, give insight into these relationships directly from the data. High-resolution micrometeorological measurements at flux towers of the FLUXNET observational network are an essential tool for quantifying ecosystem variables, as they continuously record, for example, CO2 exchange, solar radiation and air temperature. Investigating the interactions and feedbacks between these variables requires taking several challenging data properties into account: the data are noisy, multidimensional and incomplete (Moffat, Accepted). The task of estimating uncertainties in such micrometeorological measurements can be addressed by Gaussian Processes (GPs), a modern nonparametric method for nonlinear regression. The GP approach has recently been shown to be a powerful modeling tool, regardless of the input dimensionality, the degree of nonlinearity and the noise level (Rasmussen and Williams, 2006). Heteroscedastic Gaussian Processes (HGPs) are a specialized GP method for data with a varying, inhomogeneous noise variance (Goldberg et al., 1998; Kersting et al., 2007), as is usually observed in CO2 flux measurements (Richardson et al., 2006).
Here, we evaluate the performance of HGPs in several artificial experiments and compare them to existing nonlinear regression methods. The results show that their outstanding strength is the ability to capture measurement noise levels while providing reasonable data fits under relatively few assumptions. On the basis of incomplete, half-hourly ecosystem measurements, an HGP was trained to model NEP (Net Ecosystem Production) using only the drivers PPFD (Photosynthetic Photon Flux Density) and air temperature. Time information was added to account for the autocorrelation in the flux measurements. Given a gap-filled meteorological time series, NEP and the corresponding random error estimates can then be predicted empirically at high temporal resolution. We report uncertainties in annual sums of CO2 exchange at two flux tower sites, Hainich (Germany) and Hesse (France). The two sites showed similar noise patterns but different noise magnitudes, with annual random error estimates of +/- 14.1 g C m^-2 yr^-1 and +/- 23.5 g C m^-2 yr^-1, respectively, for the year 2001. Existing models calculate uncertainties by evaluating the standard deviation of the model residuals. A comparison with the methods of Reichstein et al. (2005) and Lasslop et al. (2008) supports confidence in both the predictive uncertainties and the annual sums modeled with the HGP approach.
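The role of an input-dependent noise variance in GP regression can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes the noise variance at each training point is already known, whereas the HGP methods cited above estimate it from the data. The function names (rbf_kernel, gp_predict_known_noise), the squared-exponential kernel, the hyperparameter values and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale, signal_var):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_predict_known_noise(x_train, y_train, noise_var, x_test,
                           length_scale=0.2, signal_var=1.0):
    """Posterior mean and latent-function variance of a zero-mean GP.

    noise_var holds one noise variance per training point. It is assumed
    to be known here; a full HGP would infer it from the data.
    """
    K = rbf_kernel(x_train, x_train, length_scale, signal_var)
    K_noisy = K + np.diag(noise_var)  # heteroscedastic noise on the diagonal
    K_star = rbf_kernel(x_train, x_test, length_scale, signal_var)

    # Solve via Cholesky factorisation for numerical stability.
    L = np.linalg.cholesky(K_noisy)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_star.T @ alpha

    v = np.linalg.solve(L, K_star)
    var = signal_var - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negative round-off


# Synthetic data (hypothetical): the noise level grows with the input.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 60)
noise_std = 0.05 + 0.4 * x
y = np.sin(2.0 * np.pi * x) + rng.normal(0.0, noise_std)

x_new = np.linspace(0.0, 1.0, 5)
mean, latent_var = gp_predict_known_noise(x, y, noise_std ** 2, x_new)

# Predictive spread of a new observation = latent uncertainty + local noise.
noise_new = (0.05 + 0.4 * x_new) ** 2
total_std = np.sqrt(latent_var + noise_new)
print(np.column_stack([x_new, mean, total_std]))
```

In this sketch, training points with a large noise variance pull the fitted mean less strongly than precise ones, and the predictive spread follows the local noise level rather than a single global residual standard deviation.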
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2011
- Bibcode: 2011AGUFM.B21F0332M
- Keywords: 0426 BIOGEOSCIENCES / Biosphere/atmosphere interactions; 1942 INFORMATICS / Machine learning; 1990 INFORMATICS / Uncertainty