A comparison of dimensionality reduction techniques in hydrogeochemistry
Abstract
Principal Component Analysis (PCA) and Exploratory Factor Analysis (EFA) are statistical techniques used to characterize the variance in a multivariate data set with the fewest dimensions. Both tools have been widely used in the social sciences, ecology, and, more relevant here, low-temperature geochemistry to reduce complicated data sets while retaining a thorough understanding of system behavior. Here, we apply PCA and EFA to an extensive hydrogeochemical data set from a well-characterized system (the Yellowstone hydrothermal system) and evaluate how effectively each reduces it. Although often thought to be identical, the methods differ fundamentally: the PCA transformation assumes all variance is common, or shared, whereas the EFA model partitions total variance into common and unique portions. Attributing all covariance of a group of geochemical parameters to a single construct assumes that no other external or independent process alters any of the variables of interest. Such a condition is highly unlikely in natural systems. For this reason, we hypothesize that EFA is the more efficient of the two methods at characterizing the latent structure of the data set. We assess the two reductions by applying cluster analysis to the reduced data sets. The results of this study provide a framework for moving forward with multivariate analysis of geochemical data.
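The workflow described in the abstract (reduce with PCA and with a factor model, then cluster the reduced scores) can be illustrated with a minimal sketch. This is not the authors' code or data. It assumes scikit-learn, uses a synthetic six-variable table in place of the Yellowstone measurements, and uses scikit-learn's maximum-likelihood `FactorAnalysis` as a rough stand-in for EFA. The number of factors, the number of clusters, and the choice of k-means are all illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for a hydrogeochemical table (NOT the Yellowstone data):
# 300 samples, 6 variables driven by 2 latent processes plus
# variable-specific (unique) noise of unequal size.
n_samples, n_latent = 300, 2
latent = rng.normal(size=(n_samples, n_latent))
loadings = np.array([
    [0.9, 0.0],
    [0.8, 0.1],
    [0.7, 0.0],
    [0.0, 0.9],
    [0.1, 0.8],
    [0.0, 0.7],
])
unique_sd = np.array([0.2, 0.4, 0.8, 0.2, 0.4, 0.8])
X = latent @ loadings.T + rng.normal(size=(n_samples, 6)) * unique_sd

# Standardize so that variables on different scales contribute comparably.
X_std = StandardScaler().fit_transform(X)

# PCA: components are chosen to capture total variance, with no
# separate term for variable-specific variance.
pca = PCA(n_components=n_latent).fit(X_std)
pca_scores = pca.transform(X_std)

# Factor model: covariance is modeled as common (loadings) plus a
# per-variable unique variance, estimated in noise_variance_.
fa = FactorAnalysis(n_components=n_latent, random_state=0).fit(X_std)
fa_scores = fa.transform(X_std)
unique_variance = fa.noise_variance_

# Cluster each reduced data set (k=3 is an arbitrary illustrative choice).
pca_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(pca_scores)
fa_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(fa_scores)

print("PCA explained variance ratio:", pca.explained_variance_ratio_)
print("Estimated unique variance per variable:", unique_variance)
```

In this sketch, `unique_variance` is the quantity PCA has no counterpart for: one estimated unique-variance term per measured variable. Comparing `pca_labels` with `fa_labels`, for example with a cluster-agreement score, is one possible way to examine how the two reductions differ on a given data set.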
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2019
- Bibcode: 2019AGUFM.V11C..13G
- Keywords: 1099 General or miscellaneous, GEOCHEMISTRY; 1916 Data and information discovery, INFORMATICS; 1942 Machine learning, INFORMATICS; 3699 General or miscellaneous, MINERALOGY AND PETROLOGY