Active Data Archive Product Tracking and Automated SPASE Metadata Generation in Support of the Heliophysics Data Environment
Abstract
Understanding how the Sun interacts with the Earth and other bodies in the solar system is a primary goal of Heliophysics, as outlined in the NASA Science Mission Directorate Science Plan. Heliophysics researchers need access to a vast collection of satellite and ground-based observations, coupled with numerical simulation data, to study complex processes, some of which, as in the case of space weather, pose a danger to physical elements of modern society. The infrastructure of the Heliophysics data environment plays a vital role in furthering the understanding of space physics processes by providing researchers with the means for data discovery and access. The Heliophysics data environment is highly dynamic and involves thousands of data products. Access to data is facilitated via the Heliophysics Virtual Observatories (VxOs), but routine access is possible only if the VxO SPASE metadata repositories contain accurate and up-to-date information. The Heliophysics Data Consortium has the stated goal of providing routine access to all relevant data products. Currently, only a small fraction of the data products relevant to Heliophysics studies have been described and registered in a VxO repository. Moreover, for those products that have been described in SPASE, there is a significant time lag between when new data become available and when the VxO metadata are updated to provide access. Automated tools that actively track data archive products can shorten the response time of VxO data product registration. Such a systematic approach is designed to address data access reliability by embracing the highly dynamic nature of the Heliophysics data environment. For example, the CDAWEB data repository located at the NASA Space Physics Data Facility maintains log files describing the data products served to the community.
These files include two that contain full directory listings, updated daily, and a set of SHA1SUM hash value files, one for each of the more than 30,000 individual directories in the CDAWEB data directory tree. The SHA1SUM files contain a change log of directory content over time and record changes with time stamps and hash values for every individual data file, most of which are stored in Common Data Format (CDF). Using these files, a service can be built to track updates at the granular, file-by-file level whenever new data become available or old data are updated or deleted. It is also possible to automatically detect when a wholly new data product becomes available online, creating a need for a SPASE metadata description and VxO product registration. Once a new product is detected, an automated service can be launched to auto-populate a Numerical Data metadata description by harvesting the global and variable-level metadata contained within self-describing data files such as CDF files. If the resulting description passes SPASE metadata validation, it can be registered in a VxO repository immediately. Ideally, the data providers themselves would review the draft Numerical Data description to ensure that the metadata accurately and fully describe the data product. The full set of CDAWEB CDF data products will be used to demonstrate the utility of such an automated approach to providing routine and timely access to Heliophysics data.
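The file-by-file tracking described in the abstract can be illustrated with a minimal sketch. It assumes each hash listing uses the plain `sha1sum` output format (`<hash>  <path>`, one file per line); the actual CDAWEB SHA1SUM files also carry time stamps and may be laid out differently. The function names (`parse_sha1sum`, `diff_snapshots`, `new_product_dirs`) and the example paths are hypothetical and are not taken from the abstract.

```python
from pathlib import PurePosixPath


def parse_sha1sum(text):
    """Parse sha1sum-style lines ("<hash>  <path>") into a {path: hash} dict.

    Assumed format; blank lines and lines starting with '#' are skipped.
    """
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, _, path = line.partition(" ")
        # sha1sum separates hash and name with "  " (text) or " *" (binary).
        entries[path.lstrip(" *")] = digest.lower()
    return entries


def diff_snapshots(old, new):
    """Compare two {path: hash} snapshots at the file-by-file level."""
    return {
        "added": sorted(p for p in new if p not in old),
        "updated": sorted(p for p in new if p in old and new[p] != old[p]),
        "deleted": sorted(p for p in old if p not in new),
    }


def new_product_dirs(old, new):
    """Directories that appear only in the newer snapshot.

    A directory that was previously absent is treated here as a candidate
    new data product needing a SPASE description (an assumption of this sketch).
    """
    old_dirs = {str(PurePosixPath(p).parent) for p in old}
    new_dirs = {str(PurePosixPath(p).parent) for p in new}
    return sorted(new_dirs - old_dirs)


if __name__ == "__main__":
    # Hypothetical listings from two consecutive days (placeholder hashes).
    yesterday = parse_sha1sum(
        f"{'a' * 40}  mission_a/inst1/file_20130101.cdf\n"
        f"{'b' * 40}  mission_a/inst1/file_20130102.cdf\n"
    )
    today = parse_sha1sum(
        f"{'a' * 40}  mission_a/inst1/file_20130101.cdf\n"
        f"{'c' * 40}  mission_a/inst1/file_20130102.cdf\n"
        f"{'d' * 40}  mission_b/inst2/file_20130101.cdf\n"
    )
    print(diff_snapshots(yesterday, today))
    print(new_product_dirs(yesterday, today))
```

Comparing hash values rather than modification times means a file counts as updated only when its content actually changes. In a fuller pipeline, each directory reported by `new_product_dirs` would be the trigger for drafting a Numerical Data description.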
- Publication:
- AGU Fall Meeting Abstracts
- Pub Date:
- December 2013
- Bibcode:
- 2013AGUFMSH13A1983B
- Keywords:
- 1916 INFORMATICS Data and information discovery;
- 2799 MAGNETOSPHERIC PHYSICS General or miscellaneous;
- 2199 INTERPLANETARY PHYSICS General or miscellaneous