Using modified self-organizing maps to explore hydrochemical and biological datasets
Abstract
We present a clustering methodology that distinguishes management zones in an aquifer contaminated with landfill leachate, using only microbiological data as input rather than traditional physicochemical information. The self-organizing map (SOM), an artificial neural network (ANN), is commonly used for clustering in much the same way as K-means. The method outperforms many traditional clustering methods on noisy datasets (e.g. high dispersion, outliers, non-uniform cluster densities), and is appropriate for combining the multiple correlated and auto-correlated variables associated with most hydrochemical research. We applied an SOM to a set of genome-based microbial community profiles created using terminal restriction fragment length polymorphism (T-RFLP) of the 16S rRNA gene, sampled from groundwater monitoring wells in the aquifer. We modified the existing algorithm to allow weighting of the input variables according to their relative importance, and added a post-processing radial basis function to estimate group membership between spatially auto-correlated measurement locations. We statistically tested the SOM output clusters using a nonparametric MANOVA to identify an optimal number of clusters. Using expert knowledge to guide data preprocessing and to weight the input variables, the SOM methodology distinguished between tiers of contamination in this multi-contaminant environment. Results showed a composite delineation representative of overall groundwater contamination at the landfill based only on microbiological information. With a small number of clusters, the SOM distinguished between background and leachate-contaminated sampling locations, whereas with a larger number of clusters it grouped locations along a gradient of groundwater contamination. The landfill leachate application demonstrates that microbial community data can complement standard chemical analyses for the purpose of delineating spatial zones of groundwater contamination.
We attribute the success of this research to close communication between the computational and biological scientists, which ensured that the essential nature of the dataset was preserved throughout the computational transformations and that the methodology was optimized for the application.
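The abstract does not say how the input-variable weights enter the SOM. One plausible reading, offered here only as a minimal sketch, is to apply them in the distance used to find each sample's best-matching unit. The function names `train_weighted_som` and `assign_nodes`, the grid size, and the linear decay schedules are all hypothetical choices for illustration, not the authors' implementation. The radial basis function step and the nonparametric MANOVA are not shown.

```python
import numpy as np


def train_weighted_som(data, weights, grid=(3, 3), n_iter=500,
                       lr0=0.5, sigma0=1.5, seed=0):
    """Train a small SOM whose best-matching-unit search uses a
    weighted squared Euclidean distance (assumed weighting scheme)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    rows, cols = grid
    # Position of every map node on the 2-D grid, shape (rows*cols, 2).
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)],
                      dtype=float)
    # Initialise codebook vectors from randomly chosen samples.
    codebook = data[rng.integers(0, len(data), size=rows * cols)].copy()
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # learning rate decays to 0
        sigma = sigma0 * (1.0 - frac) + 1e-3     # neighbourhood shrinks
        x = data[rng.integers(0, len(data))]     # one random sample
        # Weighted distance: important variables count more in the match.
        d2 = ((codebook - x) ** 2 * weights).sum(axis=1)
        bmu = int(np.argmin(d2))
        # Gaussian neighbourhood around the best-matching unit on the grid.
        grid_d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        codebook += lr * h[:, None] * (x - codebook)
    return codebook


def assign_nodes(data, codebook, weights):
    """Return, for each sample, the index of its best-matching node."""
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    diff = data[:, None, :] - codebook[None, :, :]
    d2 = (diff ** 2 * weights).sum(axis=2)
    return d2.argmin(axis=1)
```

In this sketch, setting a variable's weight to zero removes it from the matching step entirely, while a larger weight lets that variable dominate which node a sample is assigned to. Samples could then be grouped by clustering the trained codebook vectors into the desired number of clusters.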
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2010
- Bibcode: 2010AGUFM.H11E0865P
- Keywords: 0465 BIOGEOSCIENCES / Microbiology: ecology, physiology and genomics; 1831 HYDROLOGY / Groundwater quality; 1849 HYDROLOGY / Numerical approximations and analysis