Groundwater arsenic risk assessment in Cambodia via machine learning methods
Abstract
Arsenic (As) poses well-documented health risks in Cambodia, where an estimated 2.25 million people are chronically exposed to As through drinking contaminated well water. The distribution of groundwater As in Cambodia is spatially heterogeneous, as evidenced by As (n = 39,117) and hydrochemical (n = 6,887) data from mostly shallow (< 50 m) wells. The hydrochemical data coverage is concentrated in southeast Cambodia, representing only about 25% of the area. Using these datasets together with geological and geomorphological parameters (n = 31), machine learning methods (BRT: boosted regression trees and RF: random forest) were applied to construct risk assessment models and to determine the factors controlling the spatial variability of As.
Groundwater As concentrations were binary coded at three thresholds: 5 μg/L, 10 μg/L and 50 μg/L. Spatial-only-parameter models were used to compare the national and regional models, and all-parameter models were established to evaluate the controlling factors. Both BRT and RF performed well, with accuracy, specificity, sensitivity and ROC of around 90%, 70%, 90% and 0.9, respectively, for the all-parameter and spatial-only-parameter models in the southeast region, and 80%, 80%, 70% and 0.8 for the spatial-only-parameter models for the entire nation. The distance to the Mekong River was the most important parameter in all models. Turbidity and Fe were the second and third most important parameters for the all-parameter southeast models. For the spatial-only-parameter national models, gravity and precipitation ranked second and third. The differences in the ranking of important parameters between the national and southeast models were attributed in part to the uneven distribution of the dataset. To compensate for the uneven distribution, a semi-supervised method was further established to test whether generated data with a uniform distribution improve performance. Finally, the influence of data distribution on regression models and strategies for dealing with the issue are discussed.
- Publication:
- AGU Fall Meeting Abstracts
- Pub Date:
- December 2019
- Bibcode:
- 2019AGUFMGH23B1225Z
- Keywords:
- 0240 Public health;
- GEOHEALTH;
- 1831 Groundwater quality;
- HYDROLOGY;
- 1847 Modeling;
- HYDROLOGY;
- 1880 Water management;
- HYDROLOGY
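
The binary coding and the performance measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the sample concentrations and the predicted labels are hypothetical, and treating a well as positive when its concentration is strictly greater than the threshold is an assumption, since the abstract does not state how values equal to a threshold were coded. ROC is omitted because it requires predicted probabilities rather than labels.

```python
# Thresholds from the abstract, in micrograms per litre.
THRESHOLDS_UG_L = (5.0, 10.0, 50.0)


def binary_code(concentrations, threshold):
    """Return 1 for each concentration above the threshold, else 0.

    Assumption: strict inequality (> threshold); the abstract does not
    say whether values equal to the threshold count as exceedances.
    """
    return [1 if c > threshold else 0 for c in concentrations]


def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity and specificity from binary labels.

    Sensitivity or specificity is None when the corresponding class
    is absent from y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else None,
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


# Hypothetical well concentrations (ug/L), for illustration only.
sample = [1.0, 7.5, 12.0, 60.0]
labels_by_threshold = {t: binary_code(sample, t) for t in THRESHOLDS_UG_L}

# Hypothetical model predictions compared with the 10 ug/L labels.
metrics = classification_metrics(labels_by_threshold[10.0], [0, 0, 1, 0])
```

Each threshold yields its own set of labels, so in a setup like this one classifier would be trained and scored per threshold.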