Improving data reusability for high performance datasets
Abstract
In the next few years, there will be a major increase in computational power at many peak scientific computing centers. Many of these centers are now also trying to work out how to manage the enormous volume of new data that will come online from many science domains, particularly from major instruments and curated model outputs. Because these data are too big or too complex to move, we are now in the post-download era. The data therefore need to be better organised, so that they are more tractable for secondary re-use and analysis at scale and increasingly programmatically accessible for a broader range of use cases.
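One simple example of an automated check that supports programmatic re-use is to confirm that each file carries the descriptive metadata a downstream user or tool needs. The sketch below is a minimal illustration under stated assumptions: the attribute names in `REQUIRED_GLOBAL_ATTRS` and the function `check_global_attributes` are hypothetical, loosely modelled on common CF/ACDD usage, and do not describe NCI's actual quality tests.

```python
# Hypothetical set of global attributes a FAIR-oriented quality test might require.
# Names loosely follow common CF/ACDD usage; this is an assumption, not NCI's test suite.
REQUIRED_GLOBAL_ATTRS = ("title", "summary", "license", "Conventions", "keywords")


def check_global_attributes(attrs, required=REQUIRED_GLOBAL_ATTRS):
    """Return a sorted list of required attributes that are missing or blank."""
    problems = []
    for name in required:
        value = attrs.get(name)
        # Treat absent, None, or whitespace-only string values as failures.
        if value is None or (isinstance(value, str) and not value.strip()):
            problems.append(name)
    return sorted(problems)


if __name__ == "__main__":
    sample = {
        "title": "Example gridded dataset",
        "summary": "",
        "Conventions": "CF-1.6, ACDD-1.3",
    }
    # Reports the attributes that are absent ("keywords", "license") or blank ("summary").
    print(check_global_attributes(sample))
```

If the files are opened with xarray, a dataset's global attributes are already exposed as a dict through `ds.attrs`, so the same check could be run as `check_global_attributes(ds.attrs)`.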
Over the last several years, NCI has focused on improving computational access to several major national reference datasets from across the Earth System. The wider community also makes significant use of the data at NCI through remote-access data services, including server-side data processing that exploits the co-location of curated data and computational processing power. One challenge is to ensure that this data is of sufficient quality to be usable and interoperable across multiple domains and with a range of techniques; this necessitates an increased focus on the "FAIR data" principles: Findable, Accessible, Interoperable and Reusable. Our focus here is on what our quality tests need to cover in order to underpin seamless programmatic access to data in high-performance environments across multiple domains. While this places additional requirements on the suppliers of both the data and the metadata, the result is that the data become even more accessible: for primary use, for secondary use, and for citation through publication processes.
- Publication:
- AGU Fall Meeting Abstracts
- Pub Date:
- December 2019
- Bibcode:
- 2019AGUFMIN21A..10D
- Keywords:
- 1904 Community standards; INFORMATICS
- 1910 Data assimilation, integration and fusion; INFORMATICS
- 1912 Data management, preservation, rescue; INFORMATICS
- 1916 Data and information discovery; INFORMATICS