EDGIer APIs: Scalable, Feature-Rich Empirical Orthogonal Function Analysis of Distributed Geoscientific Data That "Just Works"
Abstract
Empirical Orthogonal Function (EOF) analysis is widely used for geospatial data analysis, particularly for extracting oscillating or propagating patterns. While EOF analysis is fairly well established in geoscience (often under a number of aliases, such as Principal Component Analysis), available packages typically do not conceptualize the full geoscientific EOF workflow as a coherent or scalable problem. Additionally, many flavors of EOF analysis are available, many of which are mutually compatible. Since EOF analysis requires an eigensolver algorithm, there are also complexities involved in choosing and implementing linear algebra packages. Increasingly, there are also issues with making the best use of HPC architectures at various stages. This disorganized ecosystem of EOF techniques currently makes it difficult to perform an EOF analysis on geoscientific data that makes full use of available hardware and "just works" from the scientist's perspective. We present the structure of the geoscientific EOF workflow problem and the algorithm design decisions that this structure imposes.
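To make the eigensolver dependence concrete, the following is a minimal sketch of the core EOF step in C++. It is not EDGI's API: the names `Matrix`, `LeadingEof`, and `leading_eof` are hypothetical, the data layout (rows are time steps, columns are grid points) is an assumption, and plain power iteration stands in for the parallel eigensolvers a production workflow would use. It removes the time mean at each grid point, forms the spatial covariance matrix, and extracts the leading eigenpair.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Leading EOF mode of a space-time field (hypothetical type, for illustration).
struct LeadingEof {
    std::vector<double> pattern;  // unit-length spatial pattern, one value per grid point
    double eigenvalue;            // variance carried by this mode
    double variance_fraction;     // eigenvalue divided by the total variance (trace)
};

// data[t][s] is the value at time step t and grid point s (assumed layout).
LeadingEof leading_eof(Matrix data, int iterations = 200) {
    assert(data.size() > 1 && !data[0].empty());
    const std::size_t nt = data.size();
    const std::size_t ns = data[0].size();

    // Remove the time mean at each grid point to obtain anomalies.
    for (std::size_t s = 0; s < ns; ++s) {
        double mean = 0.0;
        for (std::size_t t = 0; t < nt; ++t) mean += data[t][s];
        mean /= static_cast<double>(nt);
        for (std::size_t t = 0; t < nt; ++t) data[t][s] -= mean;
    }

    // Spatial covariance matrix C = X^T X / (nt - 1).
    Matrix cov(ns, std::vector<double>(ns, 0.0));
    double trace = 0.0;
    for (std::size_t i = 0; i < ns; ++i) {
        for (std::size_t j = 0; j < ns; ++j) {
            double sum = 0.0;
            for (std::size_t t = 0; t < nt; ++t) sum += data[t][i] * data[t][j];
            cov[i][j] = sum / static_cast<double>(nt - 1);
        }
        trace += cov[i][i];
    }

    // Power iteration. The start vector is arbitrary; the method fails only if
    // it happens to be exactly orthogonal to the leading eigenvector.
    std::vector<double> v(ns);
    for (std::size_t i = 0; i < ns; ++i) v[i] = 1.0 + 0.1 * static_cast<double>(i);
    std::vector<double> w(ns);
    for (int it = 0; it < iterations; ++it) {
        double norm2 = 0.0;
        for (std::size_t i = 0; i < ns; ++i) {
            w[i] = 0.0;
            for (std::size_t j = 0; j < ns; ++j) w[i] += cov[i][j] * v[j];
            norm2 += w[i] * w[i];
        }
        if (norm2 == 0.0) break;  // field has no variance along v
        const double norm = std::sqrt(norm2);
        for (std::size_t i = 0; i < ns; ++i) v[i] = w[i] / norm;
    }

    // Normalize v (needed if the loop exited early) and take the Rayleigh
    // quotient v^T C v as the eigenvalue estimate.
    double vnorm2 = 0.0;
    for (std::size_t i = 0; i < ns; ++i) vnorm2 += v[i] * v[i];
    const double vnorm = std::sqrt(vnorm2);
    for (std::size_t i = 0; i < ns; ++i) v[i] /= vnorm;
    double lambda = 0.0;
    for (std::size_t i = 0; i < ns; ++i)
        for (std::size_t j = 0; j < ns; ++j) lambda += v[i] * cov[i][j] * v[j];

    return {v, lambda, trace > 0.0 ? lambda / trace : 0.0};
}
```

Calling `leading_eof` on a small time-by-space matrix returns the dominant spatial pattern and the share of total variance it explains. At realistic grid sizes, the dense covariance matrix and the serial iteration shown here are the parts a scalable workflow would need to replace.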
We originally sought to develop a package that provided an interface between spectra calculation over a bandwidth and EOFs and that could also effectively leverage different computational architectures and parallel linear algebra libraries, named the "EOF/DFT General Interface" (or EDGI). However, we soon realized that the EDGI approach could be expanded to include other relevant features such as distributed I/O, accommodation of gappy data, and parallel eigensolvers. Emergent architectural properties of combining scalable I/O with spectra and EOF calculation are presented. We also demonstrate different flavors of EOF analysis currently implemented in our EDGI package (written in C++) and explore its scalability in several scenarios.
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2018
- Bibcode: 2018AGUFMIN41D0861D
- Keywords:
  - 1855 Remote sensing; HYDROLOGY
  - 1908 Cyberinfrastructure; INFORMATICS
  - 1914 Data mining; INFORMATICS
  - 1942 Machine learning; INFORMATICS