Cluster Detection Capabilities of the Average Nearest Neighbor Ratio and Ripley's K Function on Areal Data: an Empirical Assessment
Abstract
Spatial clustering detection methods are widely used in many fields, including epidemiology, ecology, biology, physics, and sociology. In these fields, areal data are often of interest; such data may result from spatial aggregation (e.g., the number of disease cases in a county) or may be inherent attributes of the areal unit as a whole (e.g., the habitat suitability of a conserved land parcel). This study assesses the performance of two spatial clustering detection methods on areal data: the average nearest neighbor (ANN) ratio and Ripley's K function. These methods are designed for point process data, but their ease of implementation in GIS software (e.g., ESRI ArcGIS) and the lack of analogous methods for areal data have contributed to their use for areal data. Despite the popularity of applying these methods to areal data, little research has explored their properties in the areal data context. In this paper we conduct a simulation study to evaluate the performance of each method for areal data under various areal structures and types of spatial dependence. The simulations show that the traditional approach to hypothesis testing with the ANN ratio or Ripley's K function yields inflated empirical type I error rates when applied to areal data. We demonstrate that this issue can be remedied for both approaches by using Monte Carlo methods that acknowledge the areal nature of the data to estimate the distribution of the test statistic under the null hypothesis. While such an approach is not currently implemented in ArcGIS, it can easily be carried out in R using code provided by the authors.
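To make the contrast between the traditional test and an areal-aware Monte Carlo test concrete, here is a minimal sketch in Python. It is not the authors' R code. It assumes that each areal unit is represented by its centroid and that the null hypothesis is "the flagged units are a uniformly random subset of all units" (a random-labelling null). The ANN ratio uses the Clark-Evans style expected distance 0.5 / sqrt(n / A) that GIS tools such as ArcGIS report. The function names `mean_nn_distance`, `ann_ratio`, and `monte_carlo_ann_test` are hypothetical and chosen for illustration only.

```python
import math
import random


def mean_nn_distance(points):
    """Mean distance from each point to its nearest neighbour (brute force, O(n^2))."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        best = math.inf
        for j, (xj, yj) in enumerate(points):
            if i != j:
                best = min(best, math.hypot(xi - xj, yi - yj))
        total += best
    return total / len(points)


def ann_ratio(points, area):
    """ANN ratio: observed mean NN distance divided by 0.5 / sqrt(n / area).

    Values below 1 suggest clustering and values above 1 suggest dispersion,
    relative to complete spatial randomness over a region of the given area.
    """
    n = len(points)
    expected = 0.5 / math.sqrt(n / area)
    return mean_nn_distance(points) / expected


def monte_carlo_ann_test(all_centroids, selected_idx, area, n_sim=999, seed=0):
    """Hypothetical random-labelling Monte Carlo test for clustering.

    Assumption: under the null, the selected areal units are a uniformly random
    subset of all units. Instead of drawing points anywhere in the region (CSR),
    we redraw subsets of the same size from the fixed set of unit centroids, so
    the reference distribution respects the areal structure.
    Returns the observed ratio and a one-sided p-value (small ratio = clustered).
    """
    rng = random.Random(seed)
    observed = ann_ratio([all_centroids[i] for i in selected_idx], area)
    k = len(selected_idx)
    at_least_as_clustered = 0
    for _ in range(n_sim):
        sample = rng.sample(all_centroids, k)
        if ann_ratio(sample, area) <= observed:
            at_least_as_clustered += 1
    return observed, (at_least_as_clustered + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Toy lattice: 10 x 10 unit cells, centroids at cell centres, total area 100.
    centroids = [(c + 0.5, r + 0.5) for r in range(10) for c in range(10)]
    # Flag a compact 3 x 3 block of cells in one corner.
    block = [r * 10 + c for r in range(3) for c in range(3)]
    ratio, p_value = monte_carlo_ann_test(centroids, block, area=100.0, n_sim=999)
    print(f"ANN ratio = {ratio:.3f}, Monte Carlo p-value = {p_value:.4f}")
```

The design choice is that the Monte Carlo reference set is drawn from the areal units themselves rather than from continuous space, which is the general idea the abstract describes. The exact null model and simulation scheme used in the paper may differ. The same resampling loop could wrap an estimate of Ripley's K in place of `ann_ratio`.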
- Publication:
- arXiv e-prints
- Pub Date:
- April 2022
- DOI:
- 10.48550/arXiv.2204.10882
- arXiv:
- arXiv:2204.10882
- Bibcode:
- 2022arXiv220410882V
- Keywords:
- Statistics - Applications