Defining Best Practices in a High Performance Petascale Data Repository to Enable Interdisciplinary Data-intensive Research at the Australian National Computational Infrastructure
Abstract
The National Computational Infrastructure (NCI) at the Australian National University has created a multi-petabyte data repository, including local and internationally replicated datasets, and made it available within its national peak HPC facility. This repository has focused particularly on highly used reference datasets from multiple domains within the Earth System, most notably climate, weather, the environment and geophysics. In addition to meeting the increasing need to make data available, the repository has had to support the growth of data-intensive research in HPC environments over recent years. These demands have driven us to make significant improvements in both the quality and organisation of the data, in order to make them ready for high performance data analysis and for use with a range of software and tools.
To manage our repository, we have adopted a pragmatic best-practice approach to community standards so that the data become increasingly FAIR, more usable and of higher fidelity. Our work includes developing processes for managing international data replication, providing local data QC/QA, managing data publications and enabling data reuse across multiple disciplines. This has been an ongoing journey of improvements, as datasets are often not in a form that is readily reusable by different communities, or that meets the community-agreed standards required by software, data services and information systems. In particular, working with the research communities, we strive to ensure the data are suitable for use in digital analysis environments as well as for other access that meets the needs of current priority use-cases. Working in this systematic way, we aim to increase the capability and scale of our data handling processes, which will continue to enable even more innovative data-intensive techniques and interdisciplinary collaborations into the future.
- Publication:
- AGU Fall Meeting Abstracts
- Pub Date:
- December 2019
- Bibcode:
- 2019AGUFMIN14B..04D
- Keywords:
- 1908 Cyberinfrastructure; INFORMATICS
- 1912 Data management, preservation, rescue; INFORMATICS
- 1930 Data and information governance; INFORMATICS
- 1934 International collaboration; INFORMATICS