Artificial bias typically neglected in comparisons of uncertain atmospheric data
Abstract
Researchers in atmospheric sciences typically neglect biases caused by regression dilution and regression to the mean (RTM) when comparing uncertain data. Regression dilution occurs when ordinary least squares regression is applied to a predictor with random uncertainty, which biases the slope towards zero. RTM, on the other hand, occurs when an extreme observation is followed by a less extreme one. Both biases originate from random uncertainties in the reference data, which are typically neither taken into account nor discussed in atmospheric sciences. This is crucial, since essentially all atmospheric data have some level of uncertainty. We use synthetic observations of aerosol optical thickness and UV index that mimic real atmospheric data to demonstrate how the biases arise from random uncertainties in measurements, model output, or satellite retrieval products. Further, we provide examples of typical data comparison methods that tend to accentuate the biases. The results show that data uncertainties can significantly bias data comparisons due to regression dilution and RTM, a fact that is known in statistics but disregarded in atmospheric sciences. We therefore argue that these biases are often regarded as, for instance, measurement or modeling errors, while they are in fact artificial. It is essential that the atmospheric and geoscience communities become aware of these features and consider them in their research.
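The two effects described in the abstract can be illustrated with a minimal sketch. It is not the authors' experiment: the function names (`ols_slope`, `simulate_aot`, `top_bin_difference`), the Gaussian distributions, and all numerical values are assumptions chosen only for illustration. A "true" aerosol optical thickness is observed twice with independent random errors of equal size, once as the reference and once as the data set under evaluation, so neither series has any systematic error.

```python
import random


def ols_slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_aot(n=20000, mean=0.3, sd_true=0.15, sd_err=0.10, seed=0):
    """Synthetic AOT: one true series plus two unbiased noisy copies.

    Assumption: Gaussian truth and Gaussian errors (real AOT is
    non-negative and skewed); the values are illustrative only.
    """
    rng = random.Random(seed)
    truth = [rng.gauss(mean, sd_true) for _ in range(n)]
    reference = [t + rng.gauss(0.0, sd_err) for t in truth]
    candidate = [t + rng.gauss(0.0, sd_err) for t in truth]
    return truth, reference, candidate


def top_bin_difference(reference, candidate, fraction=0.05):
    """Mean (candidate - reference) where the reference is most extreme."""
    pairs = sorted(zip(reference, candidate))  # sorted by reference value
    k = max(1, int(len(pairs) * fraction))
    top = pairs[-k:]
    return sum(c - r for r, c in top) / k


if __name__ == "__main__":
    truth, reference, candidate = simulate_aot()
    # Regression dilution: the slope shrinks once the predictor is noisy.
    print("slope vs. error-free truth:", round(ols_slope(truth, candidate), 3))
    print("slope vs. noisy reference: ", round(ols_slope(reference, candidate), 3))
    # RTM: an apparent low bias at high reference values, with no real bias.
    print("mean difference, top 5% of reference:",
          round(top_bin_difference(reference, candidate), 3))
```

Under these assumptions the slope against the noisy reference is expected to shrink by the factor sd_true² / (sd_true² + sd_err²), roughly 0.69 for the values above, and the candidate appears to underestimate the highest reference values even though both series are unbiased by construction.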
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2016
- Bibcode: 2016AGUFM.A53C0307A
- Keywords:
- 0305 Aerosols and particles; ATMOSPHERIC COMPOSITION AND STRUCTURE
- 0322 Constituent sources and sinks; ATMOSPHERIC COMPOSITION AND STRUCTURE
- 0345 Pollution: urban and regional; ATMOSPHERIC COMPOSITION AND STRUCTURE
- 3311 Clouds and aerosols; ATMOSPHERIC PROCESSES