Improving Scaling Properties of Common Statistical Operators for Gridded Geoscience Datasets
Abstract
An accurate cost model that accounts for dataset size and structure can help optimize geoscience data analysis. We develop and apply a computational model to estimate data analysis costs for arithmetic operations on gridded datasets typical of satellite or climate-model origin. For these dataset geometries, our model predicts data reduction scalings that agree with measurements of widely used geoscience data processing software, the netCDF Operators (NCO). I/O performance and library design dominate throughput for simple analysis (e.g., dataset differencing). Dataset structure can reduce analysis throughput tenfold relative to same-sized unstructured datasets. We demonstrate algorithmic optimizations that substantially increase throughput for more complex, arithmetic-dominated analysis such as weighted averaging of multi-dimensional data. Two methods for distributing simultaneous analysis of all variables in data files are compared: (1) OpenMP threading across the set of variables and (2) MPI distribution of variables across a cluster. These scaling properties can help to estimate costs of distribution strategies for data reduction in cluster and grid environments. We show how these principles accelerate terascale data reduction by benchmarking the time for NCO to characterize the variability of climate simulations from the Intergovernmental Panel on Climate Change (IPCC) Fourth Assessment Report (AR4). Our algorithms accelerate unoptimized data reduction about tenfold. This improvement is generic in that the same algorithms and operators apply to datasets from any geoscience model producing gridded, multi-dimensional datasets of similar rank.
- Publication:
- AGU Fall Meeting Abstracts
- Pub Date:
- December 2006
- Bibcode:
- 2006AGUFMIN53B0827M
- Keywords:
- 0432 Contaminant and organic biogeochemistry (0792);
- 0434 Data sets;
- 0520 Data analysis: algorithms and implementation;
- 1616 Climate variability (1635; 3305; 3309; 4215; 4513)