Detecting Randomness: the Sensitivity of Statistical Tests to Deviations from a Constant Rate Poisson Process
Abstract
Detecting trends in the rate of sporadic events is a problem for earthquakes and other natural hazards such as storms, floods, or landslides. I use synthetic events to evaluate the tests used to address this problem in seismology and consider their application to other hazards. Recent papers have analyzed the record of magnitude ≥7 earthquakes since 1900 and concluded that the events are consistent with a constant rate Poisson process plus localized aftershocks (Michael, GRL, 2011; Shearer and Stark, PNAS, 2012; Daub et al., GRL, 2012; Parsons and Geist, BSSA, 2012). Each paper removed localized aftershocks and then used a different suite of statistical tests of the null hypothesis that the remaining data could be drawn from a constant rate Poisson process. The methods include Kolmogorov-Smirnov (KS) tests comparing event times or inter-event times with the predictions of a Poisson process, the autocorrelation function of inter-event times, and two tests on the number of events in time bins: the Poisson dispersion test and the multinomial chi-square test. The range of statistical tests gives us confidence in the conclusions, which are robust with respect to the choice of tests and parameters. But which tests are optimal, and how sensitive are they to deviations from the null hypothesis? The latter point was raised by Dimer (arXiv, 2012), who suggested that, without considering Type II errors, these papers cannot place limits on the degree of clustering and rate changes that could be present in the global seismogenic process. I produce synthetic sets of events that deviate from a constant rate Poisson process using a variety of statistical simulation methods, including Gamma-distributed inter-event times and random walks. The sets of synthetic events are then examined with the statistical tests described above. Preliminary results suggest that, for data sets of 100 to 1000 events, a data set that does not reject the Poisson null hypothesis could have a variability that is 30% to 10% greater, respectively, than the inherent variability of a Poisson process. For example, if there are 1000 events in a century-long data set, then a Poisson process predicts 100±10 (1 s.d.) events in a decade, but given the limits of the data there could be 100±11 events. If instead there are 100 events in a century, then the prediction of 10±3 events in a decade widens to 10±4. For a smaller data set of 20 events per century, the increase in possible variability is 300%, and the decadal forecast of 2±1.4 events becomes 2±4. Thus, the existing statistical tests can place useful limits on forecasts for moderate-sized data sets but not for very small ones. Will we obtain similar results if we apply these tests to atmospheric events, which undergo annual and other cycles? Binning the data could minimize the effect of these cycles on these methods. However, based on these synthetic data sets, binning reduces the sensitivity of the tests, especially for larger data sets. In seismology, the effects of localized clustering can be addressed either by removing the aftershocks or by transforming the event times so that the known clustering appears Poisson in transformed time and then performing the tests in that transformed space. If similar approaches are appropriate for other natural hazards, then we can use the same methods across different fields.
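Below is a minimal, hypothetical sketch (not the author's actual code) of the simulate-and-test procedure the abstract describes: synthetic catalogs are drawn from a Gamma renewal process (shape = 1 recovers a constant rate Poisson process; shape < 1 gives clustered, over-dispersed sequences), and each catalog is checked with a KS test on inter-event times and a Poisson dispersion test on binned counts. The function names, the choice of 10 bins, the catalog sizes, and the shape values are illustrative assumptions, not taken from the cited papers.

```python
# Sketch of a synthetic-catalog experiment, assuming NumPy and SciPy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def synthetic_catalog(n_events, shape, mean_rate=1.0, rng=rng):
    """Event times from a Gamma renewal process with the given mean rate.

    shape = 1 is a constant rate Poisson process; shape < 1 is clustered.
    """
    scale = 1.0 / (shape * mean_rate)      # keeps the mean inter-event time fixed
    inter_event = rng.gamma(shape, scale, size=n_events)
    return np.cumsum(inter_event)

def poisson_tests(times, n_bins=10):
    """p-values for a KS test on inter-event times and a Poisson dispersion test."""
    dt = np.diff(times)
    # KS test: under the Poisson null, inter-event times are exponential.
    # (Estimating the scale from the data makes this slightly anti-conservative.)
    ks_p = stats.kstest(dt, 'expon', args=(0, dt.mean())).pvalue
    # Dispersion test: (n-1)*var/mean of bin counts ~ chi-square(n-1) under the null;
    # the upper tail flags over-dispersion, i.e. clustering.
    counts, _ = np.histogram(times, bins=n_bins, range=(0.0, times.max()))
    disp = (n_bins - 1) * counts.var(ddof=1) / counts.mean()
    disp_p = stats.chi2.sf(disp, df=n_bins - 1)
    return ks_p, disp_p

for shape in (1.0, 0.7, 0.4):   # 1.0 = Poisson; smaller = stronger clustering
    times = synthetic_catalog(200, shape)
    ks_p, disp_p = poisson_tests(times)
    print(f"shape={shape:.1f}  KS p={ks_p:.3f}  dispersion p={disp_p:.3f}")
```

In a full study of the kind described above, this loop would be repeated over many synthetic catalogs for each catalog size and degree of clustering, so that the rejection rate estimates the power (one minus the Type II error rate) of each test against that deviation from the Poisson null.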
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2012
- Bibcode: 2012AGUFMNG23B1562M
- Keywords:
  - 3245 MATHEMATICAL GEOPHYSICS / Probabilistic forecasting
  - 3275 MATHEMATICAL GEOPHYSICS / Uncertainty quantification
  - 7223 SEISMOLOGY / Earthquake interaction, forecasting, and prediction
  - 4315 NATURAL HAZARDS / Monitoring, forecasting, prediction