SCEC Community Modeling Environment (SCEC/CME) - Data and Metadata Management Issues
Abstract
One of the goals of the SCEC Community Modeling Environment is to facilitate the execution of substantial collections of large numerical simulations. Because such simulations are resource-intensive and can generate extremely large outputs, implementing this concept raises a host of data and metadata management challenges. Given the high computational cost of running these simulations, one must balance the cost of repeating them against the burden of archiving the resulting datasets and making them accessible for future uses, such as post-processing or visualization, without the need for re-computation. Further, a carefully selected collection of such datasets might be used as benchmarks for assessing the accuracy and performance of future simulations, developing post-processing software such as visualization tools, and testing data and metadata management strategies. The problem is rapidly compounded if one contemplates computing ensemble averages for simulations of complex nonlinear systems. Defining and organizing a complete set of metadata that fully describes any given simulation is a surprisingly complex task. We approach it from the point of view of developing a community digital library, which provides the means to organize the material as well as standard metadata attributes. Web-based discovery mechanisms are then used to support browsing and retrieval of data. A key component is the selection of appropriate descriptive metadata. We compare existing metadata standards from the digital library community, federal standards, and discipline-specific metadata attributes. The digital library community has developed a standard for organizing metadata, called the Metadata Encoding and Transmission Standard (METS). This schema supports descriptive (provenance), administrative (location), structural (component relationships), and behavioral (display and manipulation applications) metadata.
The organization can be augmented with discipline-specific extension schemata. Candidates include the FGDC spatial data standard, the ISO 19115 schema for geographic data, and the Storage Resource Broker authenticity metadata. Other candidates include various metadata schemata used in observational seismology. We are also considering metadata attributes that are being developed within the SCEC community and are specific to the requirements of that community. A comparison of the metadata attributes will be presented, along with their use in the organization of simulation output from a large-scale anelastic wave prediction simulation. The SDSC Storage Resource Broker (SRB) provides the data handling capabilities needed to manage the terabyte-scale simulation output, supporting the ingestion, organization, description, and preservation of datasets, as well as access to them. The metadata attributes include, in particular, descriptive information about the simulation run, the simulation input parameters, the computational infrastructure, the physical geometry of the problem, and the output structure.
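As a rough illustration of the five attribute groups named above, the following Python sketch collects them into a single record. It is only a sketch under stated assumptions: the class name, the field names, and the sample values are hypothetical placeholders, and they are not taken from METS, from the SRB, or from any SCEC schema.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class SimulationMetadata:
    """Hypothetical record: the five groups follow the abstract,
    but the field names are illustrative and not an actual SCEC schema."""
    run_description: dict = field(default_factory=dict)
    input_parameters: dict = field(default_factory=dict)
    computational_infrastructure: dict = field(default_factory=dict)
    physical_geometry: dict = field(default_factory=dict)
    output_structure: dict = field(default_factory=dict)

    def missing_groups(self):
        """Return the names of attribute groups that are still empty."""
        return [name for name, value in asdict(self).items() if not value]


# Placeholder values only; they do not describe a real simulation run.
record = SimulationMetadata(
    run_description={"title": "placeholder run title"},
    input_parameters={"placeholder_parameter": 0.0},
)

# Serialize the record, then list the groups that still need to be filled in.
print(json.dumps(asdict(record), indent=2))
print(record.missing_groups())
```

Keeping each group as a free-form dictionary is a deliberate simplification: it leaves room for discipline-specific extensions, at the cost of not validating individual attributes.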
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2003
- Bibcode: 2003AGUFMNG11A0177M
- Keywords: 3230 Numerical solutions; 7209 Earthquake dynamics and mechanics; 7212 Earthquake ground motions and engineering; 7260 Theory and modeling