Proactive Data Containers (PDC): An Object-centric Data Store for Large-scale Computing Systems
Abstract
We will present a novel object-centric data management system, called Proactive Data Containers (PDC), which provides abstractions and storage mechanisms that take advantage of the deep memory and storage hierarchy and enable proactive, automated performance tuning when storing and retrieving data on large-scale supercomputing systems. While cloud computing environments have successfully used object-based storage, such as Amazon S3 and OpenStack Swift, parallel file systems on large-scale supercomputing systems are accessed through I/O libraries based on the slow and restrictive POSIX and MPI (Message Passing Interface) I/O standards. These file systems face fundamental challenges in scalable metadata operations, semantics-based tuning of data movement performance, and asynchronous operation. Exacerbating this situation, storage systems on upcoming exascale supercomputers are being deployed with an unprecedented level of complexity due to a deep memory/storage hierarchy. This hierarchy ranges from several levels of volatile memory to non-volatile memory, traditional hard disks, and tapes. Simple and efficient methods of managing and moving data through this hierarchy are critical for the numerous scientific applications that store and analyze massive amounts of data on supercomputing systems. In the PDC system, scalable metadata management is achieved using the memory available in compute nodes. The metadata objects contain required information, such as data description and ownership, as well as optional provenance and user-defined tags. Data objects are stored and retrieved efficiently using server-initiated, optimized data movement, in which data is stored or cached asynchronously while the applications continue with their computation. We will also discuss automatic data analysis and transformations performed while the data is moving from one location to another.
Time permitting, we will present the PDC concepts of automatic reorganization and placement of data in the memory and storage hierarchy, which bring data closer to where it is analyzed by using the history of previous data accesses and any user-provided hints.
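The metadata objects described in the abstract (required data description and ownership, optional provenance and user-defined tags, held in compute-node memory) might be pictured roughly as follows. This is only an illustrative sketch in Python, not the PDC API: the names `MetadataObject`, `InMemoryMetadataStore`, `put`, and `find_by_tag`, and all field names, are hypothetical, and a plain dictionary stands in for metadata distributed across compute nodes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MetadataObject:
    """Hypothetical metadata record; field names are assumptions, not PDC's."""
    # Required information named in the abstract: data description and ownership.
    name: str
    description: str
    owner: str
    # Optional information named in the abstract: provenance and user-defined tags.
    provenance: Optional[str] = None
    tags: Dict[str, str] = field(default_factory=dict)


class InMemoryMetadataStore:
    """Toy stand-in for metadata kept in memory (single process, no distribution)."""

    def __init__(self) -> None:
        self._objects: Dict[str, MetadataObject] = {}

    def put(self, obj: MetadataObject) -> None:
        # Later records with the same name overwrite earlier ones.
        self._objects[obj.name] = obj

    def find_by_tag(self, key: str, value: str) -> List[str]:
        # Return the names of all objects whose tag `key` equals `value`, sorted.
        return sorted(
            name for name, obj in self._objects.items()
            if obj.tags.get(key) == value
        )
```

In such a sketch, a lookup like `store.find_by_tag("run", "1")` touches only the in-memory records and never the data objects themselves, which is the separation of metadata from data that the abstract relies on.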
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2018
- Bibcode: 2018AGUFMIN34B..09B
- Keywords:
- 1912 Data management, preservation, rescue, INFORMATICS;
- 1916 Data and information discovery, INFORMATICS;
- 1930 Data and information governance, INFORMATICS;
- 1942 Machine learning, INFORMATICS