Statistical Detection of Potentially Fabricated Data
Abstract
Scientific fraud is an increasingly vexing problem. Many current programs for fraud detection focus on image manipulation, while techniques that detect anomalous patterns in the underlying numerical data receive much less attention, even though they are often easy to apply. We employed three such techniques in a case study covering data sets from several hundred experiments. We compared patterns in the data sets of one research teaching specialist (RTS) with those of 9 other members of the same laboratory and of 3 outside laboratories. We applied two conventional statistical tests, together with a newly developed test for anomalous patterns in the triplicate data commonly produced in such research, to various data sets reported by the RTS. These tests repeatedly rejected (often at p-levels well below 0.001) the null hypotheses that the anomalous patterns in his data occurred by chance. This analysis emphasizes the importance of access to the raw data underlying publications, reports and grant applications for evaluating the correctness of their conclusions, as well as the utility of methods for detecting anomalous, especially fabricated, numerical results.
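The newly developed test looks at how often the mean of a triplicate appears as one of its three values. A minimal sketch, not the authors' code, assuming that "the mean" means the mean rounded to the nearest integer and that the per-triplicate null probability `p_null` is supplied by the caller (for example, from a model of the counting process). The names `contains_rounded_mean`, `count_mean_containing` and `binomial_upper_tail` are hypothetical. If triplicates are independent, the number of mean-containing triplicates among `n` of them is Binomial(`n`, `p_null`) under the null, so an unusually high count yields a small upper-tail p-value.

```python
import math
from typing import Iterable, Tuple


def contains_rounded_mean(triplicate: Tuple[int, int, int]) -> bool:
    """True if one of the three counts equals their mean rounded to the nearest integer."""
    a, b, c = triplicate
    total = a + b + c
    # The mean of three integers has fractional part 0, 1/3 or 2/3, so .5 ties
    # cannot occur; (total + 1) // 3 is exact integer rounding of total / 3.
    rounded_mean = (total + 1) // 3
    return rounded_mean in (a, b, c)


def count_mean_containing(triplicates: Iterable[Tuple[int, int, int]]) -> Tuple[int, int]:
    """Return (number of triplicates containing their rounded mean, number of triplicates)."""
    hits = 0
    n = 0
    for t in triplicates:
        n += 1
        if contains_rounded_mean(t):
            hits += 1
    return hits, n


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly from the pmf."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Usage with made-up illustrative triplicates and an assumed p_null of 0.4.
data = [(10, 12, 14), (5, 9, 20), (7, 7, 8), (100, 101, 102)]
hits, n = count_mean_containing(data)
print(hits, n, binomial_upper_tail(hits, n, 0.4))
```

In practice `p_null` matters a great deal: a triplicate whose values cluster tightly contains its mean more often by chance, so the null probability should come from a model of the measurement process rather than a guessed constant.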
- Publication: arXiv e-prints
- Pub Date: November 2013
- DOI: 10.48550/arXiv.1311.5517
- arXiv: arXiv:1311.5517
- Bibcode: 2013arXiv1311.5517P
- Keywords: Quantitative Biology - Quantitative Methods; Statistics - Applications
- E-Print: 31 pages of text including 2 figures, 3 tables and an Appendix containing the mathematical derivation of a model for detecting and quantifying the probability that the average of 3 counts occurs as one of those counts; 166 pages of raw data that were used in the analyses.
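The E-Print note mentions an Appendix model for the probability that the average of 3 counts occurs as one of those counts. The sketch below is not that derivation. It is an illustrative numerical stand-in that assumes each count in a triplicate is an independent Poisson draw with the same mean `lam`, and that "the average" means the mean rounded to the nearest integer. The function names `poisson_pmf_table` and `prob_mean_in_triplicate` are hypothetical.

```python
import math


def poisson_pmf_table(lam: float, max_count: int) -> list[float]:
    """Poisson(lam) probabilities for counts 0..max_count, built recursively
    (p[k] = p[k-1] * lam / k) to avoid large factorials.
    Assumes a moderate lam (roughly below 700) so exp(-lam) does not underflow."""
    pmf = [math.exp(-lam)]
    for k in range(1, max_count + 1):
        pmf.append(pmf[-1] * lam / k)
    return pmf


def prob_mean_in_triplicate(lam: float) -> float:
    """Probability that three independent Poisson(lam) counts include their own
    rounded mean as one of the three values (assumed null model).

    Counts above max_count are ignored, so the result can be very slightly low;
    the cutoff lam + 8*sqrt(lam) + 10 keeps that truncation error negligible."""
    max_count = int(lam + 8 * math.sqrt(lam) + 10)
    pmf = poisson_pmf_table(lam, max_count)
    total_prob = 0.0
    for a in range(max_count + 1):
        for b in range(max_count + 1):
            w_ab = pmf[a] * pmf[b]
            if w_ab == 0.0:
                continue  # skip combinations with no probability mass
            for c in range(max_count + 1):
                s = a + b + c
                m = (s + 1) // 3  # exact integer rounding of s / 3 (no .5 ties)
                if m in (a, b, c):
                    total_prob += w_ab * pmf[c]
    return total_prob


# Usage: the per-triplicate null probability for an assumed mean count of 20.
print(f"lam=20: {prob_mean_in_triplicate(20.0):.4f}")
```

A value computed this way could serve as the per-triplicate null probability in a binomial test on the number of mean-containing triplicates. The brute-force triple loop is chosen for transparency over speed; it grows roughly with the cube of `lam + 8*sqrt(lam)`.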