Using the Principles of F.A.I.R Data to Improve the Measure of Value of Big Data and Big Data Repositories
Abstract
In a data-intensive world, finding the right data can be time-consuming, and once found, the data may involve compromises on quality and often require considerable extra effort to wrangle into shape. This is particularly true as users explore new and innovative ways of working with data from different sources and scientific domains. It is recognised that the effort and specialist knowledge required to transform datasets to meet these requirements go beyond the reasonable remit of a single research project or research community. Instead, Government investments in national collaborations like the Australian National University's National Computational Infrastructure (NCI) provide a sustainable way to bring together and transform disparate data collections from a range of disciplines in ways which enable new and innovative analysis and use. With these goals in mind, the NCI established a Data Quality Strategy (DQS) for managing 10 PB of reference data collections with a particular focus on improving data use and reuse across scientific domains, making the data suitable for use in a high-end computational and data-intensive environment, and supporting programmatic access for a range of applications. Evaluating how effectively we are achieving these goals, and maintaining ongoing funding, requires demonstrating the value and impact of these data collections. Standard approaches to measuring data value involve basic measures of 'data usage' or attempt to track data to 'research outcomes'. While useful, these measures fail to capture the value of the level of curation or quality assurance in making the data available. To fill this gap, NCI has developed a 3-tiered approach to measuring the return on investment which broadens the concept of value to include improvements in access to and use of the data.
Key to this approach was integrating the guiding principles of the Force 11 community's F.A.I.R data into the DQS, because they provide a community-driven, standards-based framework which can be used for metrics. The NCI metrics provide useful information for data users, data custodians, and data repositories and, most importantly, can be used to demonstrate the return on investment in both quantitative and qualitative terms.
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2017
- Bibcode: 2017AGUFMIN42C..08R
- Keywords: 1904 Community standards, INFORMATICS; 1912 Data management, preservation, rescue, INFORMATICS; 1978 Software re-use, INFORMATICS; 6610 Funding, PUBLIC ISSUES