Novel Random Projection-based Dimensionality Reduction Methods as a Foundation for Scalable and Efficient Big Data Analytics in Geoscience Applications
Abstract
Hyperspectral data has played a major role in recent advances in the geosciences. It has enabled Big Data acquisition and the analysis of geophysical phenomena such as tornadoes, and has shortened response times to natural disasters. The spectral information in hyperspectral data has supported the study of geoscience applications including volcanic eruptions in Hawaii, the effects of global warming, and deforestation. However, the large number of spectral bands in such data also imposes significant challenges, such as the curse of dimensionality and high storage and computation costs. Consequently, dimensionality reduction (DR) and feature selection/extraction methods are often used to mitigate these issues and facilitate effective data analysis by projecting high-dimensional data to a lower-dimensional space while ensuring that vital information in the data is preserved.
Traditional transform-based DR methods such as singular value decomposition (SVD) focus on learning the underlying data structure. However, this inherent data-learning process makes them computationally expensive and yields a data-dependent representation. In many cases, traditional data analysis methods prove inefficient and ill-suited for Big Data processing. Modern Big Data research therefore demands more innovative analytics that can efficiently process large volumes of data and produce application-oriented solutions. As a viable alternative, random projection (RP)-based methods are attractive techniques for DR and Big Data analysis. In contrast to transform-based DR methods, RPs project data to lower dimensions using a random projection matrix (e.g. a Gaussian (GM) or Hadamard (HM) matrix), thereby eliminating the data-learning step while preserving crucial information. RP methods consequently offer several advantages, such as a data-independent representation, computational efficiency, and simplicity of implementation. Recent multidisciplinary Big Data research has confirmed the potential of RPs for efficient data analytics. This work further explores RP-based DR methods as a foundation for developing new scalable and efficient Big Data analytics that carry the computational benefits of RP over into the data-learning of transform-based DR methods, in order to promote Big Data applications in the geosciences.
- Publication:
- AGU Fall Meeting Abstracts
- Pub Date:
- December 2018
- Bibcode:
- 2018AGUFMIN41D0870C
- Keywords:
- 1855 Remote sensing; HYDROLOGY
- 1908 Cyberinfrastructure; INFORMATICS
- 1914 Data mining; INFORMATICS
- 1942 Machine learning; INFORMATICS
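
The random projection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `gaussian_random_projection`, the 1/sqrt(k) scaling, the synthetic 500 x 200 array standing in for hyperspectral pixels, and the target dimension k = 50 are all assumptions chosen for illustration. It only shows that a Gaussian matrix drawn without looking at the data can reduce the number of bands while roughly preserving distances between pixels.

```python
import numpy as np


def gaussian_random_projection(X, k, seed=0):
    """Project the rows of X (n_samples x d) onto k dimensions.

    Hypothetical helper for illustration; the projection matrix is drawn
    independently of X, so no data-learning step is involved.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Entries ~ N(0, 1/k), so squared lengths are preserved in expectation.
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))
    return X @ R


# Synthetic stand-in for hyperspectral pixels: 500 pixels, 200 spectral bands.
rng = np.random.default_rng(1)
X = rng.random((500, 200))

Y = gaussian_random_projection(X, k=50)

# Compare the distance between two pixels before and after projection.
dist_original = np.linalg.norm(X[0] - X[1])
dist_projected = np.linalg.norm(Y[0] - Y[1])
ratio = dist_projected / dist_original
print(Y.shape, ratio)
```

A ratio close to 1 indicates that the pairwise distance survived the projection; a smaller k lowers the cost further but loosens this approximation.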