Ingesting, Managing and Distributing Large Atmospheric Data Sets
Abstract
The NASA Atmosphere PEATE at the University of Wisconsin has built a system for ingesting, managing and distributing millions of files and terabytes of data. The system currently ingests roughly 50,000 files per day (~1 TB). Data is integrity-checked, compressed and stored in a replicated Gluster filesystem. Metadata for every file is stored in a database, which allows system performance and data integrity to be monitored. A scripting API lets users search for data from multiple sensors that are collocated in both time and space. We will present numerous challenges encountered during the implementation of this system.
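The ingest path described in the abstract (integrity check, compression, metadata recorded in a database) might look roughly like the sketch below. It is illustrative only and is not the PEATE implementation: the SHA-256 checksum, gzip compression, SQLite catalog, the `files` table layout, and the function names `open_catalog`, `ingest` and `verify` are all assumptions chosen to keep the example self-contained.

```python
import gzip
import hashlib
import shutil
import sqlite3
from pathlib import Path

CHUNK = 1 << 20  # read in 1 MiB pieces so large files never sit fully in memory


def _sha256(stream) -> str:
    """Return the hex SHA-256 digest of everything readable from `stream`."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(CHUNK), b""):
        digest.update(chunk)
    return digest.hexdigest()


def open_catalog(db_path: str) -> sqlite3.Connection:
    """Open (or create) the hypothetical metadata catalog."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " name TEXT PRIMARY KEY,"
        " sha256 TEXT NOT NULL,"
        " raw_bytes INTEGER NOT NULL,"
        " stored_path TEXT NOT NULL)"
    )
    return conn


def ingest(src: Path, store_dir: Path, conn: sqlite3.Connection) -> str:
    """Checksum `src`, store a gzip copy, and record its metadata."""
    with src.open("rb") as fh:
        checksum = _sha256(fh)
    store_dir.mkdir(parents=True, exist_ok=True)
    dest = store_dir / (src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
            (src.name, checksum, src.stat().st_size, str(dest)),
        )
    return checksum


def verify(name: str, conn: sqlite3.Connection) -> bool:
    """Re-hash the stored copy and compare it with the cataloged checksum."""
    row = conn.execute(
        "SELECT sha256, stored_path FROM files WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return False
    expected, stored_path = row
    with gzip.open(stored_path, "rb") as fh:
        return _sha256(fh) == expected
```

Keeping the checksum of the uncompressed bytes in the catalog means a periodic `verify` pass can detect corruption in the stored copy without needing the original file, which is one way the database could support the integrity monitoring the abstract mentions.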
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2012
- Bibcode: 2012AGUFMIN43B1520D
- Keywords: 1910 INFORMATICS / Data assimilation, integration and fusion; 1932 INFORMATICS / High-performance computing; 1946 INFORMATICS / Metadata; 1996 INFORMATICS / Web Services