Data Discovery of Big and Diverse Climate Change Datasets - Options, Practices and Challenges
Abstract
Developing data search tools is a common, but often confusing, task for most data-intensive scientific projects. These search interfaces need to be continually improved to handle the ever-increasing diversity and volume of data collections. Several factors determine the type of search tool a project needs to provide to its user community. These include the number of datasets; the amount and consistency of discovery metadata; ancillary information, such as the availability of quality information and provenance; and the availability of similar datasets from other distributed sources. The Environmental Data Science and Systems (EDSS) group within the Environmental Science Division at Oak Ridge National Laboratory has a long history of successfully managing diverse and big observational datasets for various scientific programs through data centers such as DOE's Atmospheric Radiation Measurement (ARM) Program, DOE's Carbon Dioxide Information Analysis Center (CDIAC), USGS's Core Science Analytics and Synthesis (CSAS) metadata Clearinghouse, and NASA's ORNL Distributed Active Archive Center (ORNL DAAC). This talk will showcase some of the recent developments for improving data discovery within these centers.

The DOE ARM program recently developed a data discovery tool that allows users to search and discover over 4,000 observational datasets. These datasets are key to research efforts related to global climate change. The ARM discovery tool features many new functions, such as filtered and faceted search logic, multi-pass data selection, filtering of data based on data quality, graphical views of data quality and availability, direct access to data quality reports, and data plots. The ARM Archive also provides discovery metadata to broader metadata clearinghouses such as ESGF, IASOA, and GOS. In addition to the new interface, ARM is currently working on providing DOI metadata records to publishers such as Thomson Reuters and Elsevier.
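Faceted search, quality-based filtering, and multi-pass selection can be illustrated with a small sketch. It does not reproduce the ARM tool. The `Dataset` record, its field names, the facet list, the quality-flag vocabulary, and the sample catalog are all hypothetical. Values selected within one facet are OR-ed and different facets are AND-ed. A second pass simply reruns the search on the previous result set.

```python
from collections import Counter
from dataclasses import dataclass

# Facets offered to the user; these names are illustrative assumptions.
FACETS = ("site", "instrument", "quality")


@dataclass(frozen=True)
class Dataset:
    """Hypothetical discovery-metadata record (not ARM's actual schema)."""
    name: str
    site: str
    instrument: str
    quality: str  # assumed flag vocabulary, e.g. "good" / "suspect" / "bad"


def faceted_search(records, selected, allowed_quality=None):
    """Filter records by facet selections and an optional quality filter.

    `selected` maps a facet name to a set of accepted values: values within
    one facet are OR-ed, different facets are AND-ed. Returns the matching
    records and, for each facet, a count of values in the matching set.
    """
    hits = [
        r for r in records
        if all(getattr(r, facet) in values for facet, values in selected.items())
        and (allowed_quality is None or r.quality in allowed_quality)
    ]
    # Counts displayed next to each facet value for the current result set.
    counts = {facet: Counter(getattr(r, facet) for r in hits) for facet in FACETS}
    return hits, counts


# Hypothetical catalog used only to demonstrate the calls.
catalog = [
    Dataset("met_siteA", "SiteA", "MET", "good"),
    Dataset("radiometer_siteA", "SiteA", "MWR", "suspect"),
    Dataset("met_siteB", "SiteB", "MET", "good"),
]

# First pass: one instrument type, good-quality data only.
first, counts = faceted_search(catalog, {"instrument": {"MET"}}, allowed_quality={"good"})
# Second pass: narrow the previous result set to a single site.
second, _ = faceted_search(first, {"site": {"SiteB"}})
print([d.name for d in second])  # ['met_siteB']
```

This sketch computes facet counts over the current result set. A production interface often excludes each facet's own selection when counting, so that users can see the alternatives they could still add.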
The ARM program also provides a standards-based online metadata editor (OME) that PIs can use to submit their data to the ARM Data Archive. The USGS CSAS metadata Clearinghouse aggregates metadata records from several USGS projects and other partner organizations. The Clearinghouse allows users to search and discover over 100,000 biological and ecological datasets from a single web portal. The Clearinghouse has also enabled several new data discovery functions, such as enhanced geospatial searches based on land and ocean classifications, metadata completeness rankings, data linkage via digital object identifiers (DOIs), and semantically enhanced keyword searches. The Clearinghouse is currently working on a dashboard that will let data providers view statistics such as the number of their records accessed via the Clearinghouse, the most popular keywords, and metadata quality reports, and that will also offer a DOI creation service. The Clearinghouse also publishes metadata records to broader portals such as NSF DataONE and Data.gov. The author will also present how these capabilities are being reused by recent and upcoming data centers such as DOE's NGEE-Arctic project.
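A metadata completeness ranking can be sketched as the fraction of recommended discovery fields that a record actually fills in. The field list below is a hypothetical assumption, not the Clearinghouse's published criteria. A real ranking might also weight fields or check values against a metadata standard.

```python
# Hypothetical discovery-metadata elements to check; the actual Clearinghouse
# criteria (and any weighting) are not described in this abstract.
RECOMMENDED_FIELDS = (
    "title", "abstract", "keywords", "bounding_box",
    "temporal_extent", "contact", "doi",
)


def completeness(record: dict) -> float:
    """Fraction of recommended fields that are present and non-empty."""
    filled = sum(1 for field in RECOMMENDED_FIELDS if record.get(field))
    return filled / len(RECOMMENDED_FIELDS)


def rank_by_completeness(records: list[dict]) -> list[dict]:
    """Most complete records first; Python's sort keeps ties in input order."""
    return sorted(records, key=completeness, reverse=True)


# Hypothetical records: one fills every field, one fills only a title.
full = {field: f"example {field}" for field in RECOMMENDED_FIELDS}
sparse = {"title": "Sparse record", "keywords": []}  # empty list counts as missing

for rec in rank_by_completeness([sparse, full]):
    print(rec["title"], round(completeness(rec), 2))
```

Because the score is a simple ratio, results from different providers can be compared directly. The trade-off is that every field counts equally, so a missing DOI weighs as much as a missing title.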
- Publication:
- AGU Fall Meeting Abstracts
- Pub Date:
- December 2013
- Bibcode:
- 2013AGUFMIN31C1513P
- Keywords:
- 0525 COMPUTATIONAL GEOPHYSICS Data management;
- 0321 ATMOSPHERIC COMPOSITION AND STRUCTURE Cloud/radiation interaction