Use of the Open Source PODPAC Library for Remote, Cloud-Based Data Analysis, Visualization, and Collaboration in a Web Browser
Abstract
The volume and variety of data originating from remote sensing platforms are growing so rapidly that new capabilities for online data visualization and analysis are crucial for future earth science research. Currently, each researcher discovers, downloads, interprets, and transforms large amounts of data independently on local servers. This approach is fragile, hinders reproducibility, and is not sustainable with future volumes of data. Cloud-based analysis of data on remote servers, close to where the data are stored, offers a solution. However, analyzing data in the cloud currently requires earth scientists to possess expert knowledge of vendor-specific cloud services. Moreover, for each data source of interest, scientists still have to manage a variety of data storage formats, projections, and other source-specific details. This presents a significant barrier to multidisciplinary scientific studies using large volumes of multi-source and multi-scale data.
To address these problems, we are developing PODPAC (Pipeline for Observational Data Processing, Analysis, and Collaboration), a cloud-ready, open-source Python library for remote data visualization and analysis. PODPAC aims to provide: (1) preconfigured cloud environments accessed via web browsers to help researchers transition to the cloud, (2) unified data access and automated data wrangling to address source-specific data variety, and (3) a cloud-based platform for collaboration and dissemination of algorithms and new data products. Using data from NASA's Soil Moisture Active Passive (SMAP) program, we will demonstrate seamless cloud-based access and wrangling of remotely sensed soil moisture, as well as sharing of the resulting processes and products. We will show cloud-based data analysis and visualization using PODPAC via a web browser user interface developed in JupyterLab running on Amazon Web Services. To demonstrate seamless data access, we will explore subsets of the SMAP-Sentinel product, which poses unique challenges due to its spatiotemporal irregularity. PODPAC greatly simplifies cloud-based data analysis and visualization, provides a framework for encapsulating additional data sources in a unified, robust, and reproducible manner, and enables earth scientists to readily share and collaborate on new data analyses and products.
- Publication:
- AGU Fall Meeting Abstracts
- Pub Date:
- December 2018
- Bibcode:
- 2018AGUFMIN43A..14U
- Keywords:
- 0399 General or miscellaneous, ATMOSPHERIC COMPOSITION AND STRUCTURE;
- 3399 General or miscellaneous, ATMOSPHERIC PROCESSES;
- 1899 General or miscellaneous, HYDROLOGY;
- 1996 Web Services, INFORMATICS