GPU Accelerated Clustering for Arbitrary Shapes in Geoscience Data
Abstract
Clustering algorithms have become a vital component of intelligent systems for geoscience, helping scientists discover and track phenomena of various kinds. Here, we outline advances in Density-Based Spatial Clustering of Applications with Noise (DBSCAN), which detects clusters of arbitrary shape such as those common in geospatial data. In particular, we propose a hybrid CPU-GPU implementation of DBSCAN and highlight new GPU optimization approaches that allow clusters to be detected in parallel while optimizing data transport during CPU-GPU interactions. We employ an efficient batching scheme between the host and the GPU so that limited GPU memory is not prohibitive when processing large and/or dense datasets. To minimize data transfer overhead, we estimate the total workload size and employ an execution strategy that generates optimized batches that will not overflow the GPU buffer. This work is demonstrated on space weather Total Electron Content (TEC) datasets containing over 5 million measurements from instruments worldwide, and it allows scientists to spot spatially coherent phenomena with ease. Our approach is up to 30 times faster than a sequential implementation and therefore accelerates discoveries in large datasets. We acknowledge support from NSF ACI-1442997.
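The batching idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `estimate_neighbor_counts` and `plan_batches`, the 2D uniform grid estimate, and the `buffer_capacity` parameter (measured in neighbor entries) are all assumptions made for illustration. The sketch bounds each point's eps-neighborhood size from grid cell counts, then greedily groups points into batches whose estimated result size fits within a fixed buffer.

```python
from collections import Counter
from math import floor


def estimate_neighbor_counts(points, eps):
    """Upper-bound each point's eps-neighborhood size (hypothetical helper).

    Points are binned into a uniform grid with cell width eps. Any neighbor
    within eps of a point must lie in the 3x3 block of cells around it, so
    the block's total count is an upper bound (it includes the point itself).
    """
    def cell(p):
        return (floor(p[0] / eps), floor(p[1] / eps))

    counts = Counter(cell(p) for p in points)
    estimates = []
    for p in points:
        cx, cy = cell(p)
        estimates.append(
            sum(counts.get((cx + dx, cy + dy), 0)
                for dx in (-1, 0, 1)
                for dy in (-1, 0, 1))
        )
    return estimates


def plan_batches(estimates, buffer_capacity):
    """Greedily group consecutive point indices into batches (assumed scheme).

    Each batch's summed estimate stays at or below buffer_capacity, so the
    neighbor lists produced for one batch cannot overflow the result buffer.
    """
    batches, current, used = [], [], 0
    for idx, est in enumerate(estimates):
        if est > buffer_capacity:
            raise ValueError("a single point's estimate exceeds the buffer")
        if used + est > buffer_capacity:
            batches.append(current)
            current, used = [], 0
        current.append(idx)
        used += est
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (5.0, 5.0)]
    est = estimate_neighbor_counts(pts, eps=0.5)
    print(est)                                    # [3, 3, 3, 1]
    print(plan_batches(est, buffer_capacity=6))   # [[0, 1], [2, 3]]
```

Because the estimate is an upper bound, the planned batches are conservative: dense regions yield smaller batches and sparse regions larger ones. In a real hybrid pipeline, each batch would correspond to one kernel launch and one device-to-host transfer of neighbor lists.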
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2016
- Bibcode: 2016AGUFMIN13C1671P
- Keywords:
  - 9820 Techniques applicable in three or more fields (General or Miscellaneous)
  - 1920 Emerging informatics technologies (Informatics)
  - 1968 Scientific reasoning/inference (Informatics)
  - 1976 Software tools and services (Informatics)