Gravitational clustering of Dark Data
Abstract
For the purposes of this discussion, Dark Data is data that we know is being generated in science but is not currently being adequately curated. This data is not easily reused and is more likely to be lost permanently as time progresses. Some very large research initiatives generate large volumes of data, but this very volume ensures that there will be dedicated staff and resources for managing it, so while this data may become dark, it is less likely to do so than data from smaller, independently managed projects. Although there is a trend toward larger scientific collaborations, small projects greatly outnumber "large" projects in terms of both funds available and number of collaborators. Because of the multiplicity of small projects, their collective data volume qualifies this as a big data problem. One prospect for properly curating this data is to cluster like data or like processes, reducing the operational cost of collecting and managing the data while making it easier to find for prospective reuse. This can be accomplished with disciplinary or subdisciplinary data repositories or by creating federated clusters of like items from multiple institutional repositories.
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2011
- Bibcode: 2011AGUFMIN23C1459H
- Keywords:
  - 0410 BIOGEOSCIENCES / Biodiversity
  - 0434 BIOGEOSCIENCES / Data sets
  - 1912 INFORMATICS / Data management, preservation, rescue