An Open and Transparent Databank of Global Land Surface Temperature
Abstract
The International Surface Temperature Initiative (ISTI) is an effort to create an end-to-end process for land surface air temperature analyses. The foundation of this process is a global land surface databank. The databank builds upon the groundbreaking work of scientists who constructed global land surface datasets in the 1980s and 1990s. A primary aim of the databank is to improve data provenance, version control, and temporal and spatial coverage, and to provide better methods for bringing dozens of data sources together into an integrated dataset. The databank consists of multiple stages, with each successive stage providing a higher level of processing, quality, and integration. Currently, more than 50 sources of data have been added to the databank. An automated algorithm has been developed that merges these sources into one complete dataset by removing duplicate station records, identifying two or more station records that can be merged into a single record, and incorporating new and unique stations. The program runs iteratively through all the sources, which are ordered based upon criteria established by the ISTI. The most preferred source, known as the target, is compared against each candidate source in turn, and station comparisons are calculated to identify candidates that are acceptable for merging. The process is probabilistic in approach, and the final fate of a candidate station is based upon metadata matching and data equivalence criteria. If there is not enough information, the station is withheld for further investigation. The algorithm has been validated using a pseudo-source of stations with a known time-of-observation bias, and correct matches have been made nearly 95% of the time. The final product, endorsed and recommended by ISTI, contains over 30,000 stations; however, slight changes in the algorithm can perturb results.
Subjective decisions, such as the ordering of the sources or the choice of metadata and data matching thresholds, can yield a different outcome. To address this structural uncertainty, multi-member ensembles of the merge program have been produced. All data and code are provided openly and without charge, making them easy for anyone in the international community to access and use. We strongly encourage the use of these data and welcome feedback from interested parties on any relevant aspect of the databank effort.
Figure: Location of stations in the recommended merge product, colorized by period of record.
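The merge decision and the ensemble idea described in the abstract can be illustrated with a minimal Python sketch. This is not the ISTI merge program: the `Station` class, the scoring functions, the distance scale, and every threshold value below are hypothetical choices made only for illustration. The sketch assigns a candidate station one of three fates (merge, unique, or withhold) from a metadata score and a data equivalence score, then reruns the decision under different metadata thresholds to show how a subjective setting can change the outcome.

```python
from dataclasses import dataclass
from math import asin, cos, exp, radians, sin, sqrt


@dataclass
class Station:
    name: str
    lat: float   # degrees north
    lon: float   # degrees east
    temps: dict  # (year, month) -> monthly mean temperature, deg C


def distance_km(a, b):
    """Great-circle distance between two stations (haversine formula)."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def metadata_score(a, b, scale_km=10.0):
    """Hypothetical metadata similarity in (0, 1]; decays with distance."""
    return exp(-distance_km(a, b) / scale_km)


def data_score(a, b, tol=0.1):
    """Fraction of overlapping months that agree within tol (deg C).

    Returns None when the two records share no months.
    """
    common = a.temps.keys() & b.temps.keys()
    if not common:
        return None
    agree = sum(abs(a.temps[k] - b.temps[k]) <= tol for k in common)
    return agree / len(common)


def decide(target, candidate, meta_merge=0.5, data_merge=0.9, meta_unique=0.05):
    """Return 'merge', 'unique', or 'withhold' (all thresholds are assumed)."""
    m = metadata_score(target, candidate)
    d = data_score(target, candidate)
    if m >= meta_merge and d is not None and d >= data_merge:
        return "merge"      # metadata and data both indicate the same station
    if m <= meta_unique:
        return "unique"     # clearly a different location: add as a new station
    return "withhold"       # not enough information to decide either way


def ensemble(target, candidate, meta_thresholds=(0.3, 0.5, 0.7)):
    """Rerun the decision once per metadata threshold (one ensemble member each)."""
    return [decide(target, candidate, meta_merge=t) for t in meta_thresholds]


if __name__ == "__main__":
    target = Station("TARGET", 40.0, -105.0, {(2000, m): 10.0 + m for m in range(1, 13)})
    nearby = Station("NEARBY", 40.045, -105.0, dict(target.temps))
    print(decide(target, nearby))
    print(ensemble(target, nearby))
```

A real merge would draw on more metadata than location, such as station name and elevation, and would use a statistical test of data equivalence rather than a fixed tolerance. The sketch only shows how the decision depends on the chosen thresholds.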
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2013
- Bibcode: 2013AGUFMGC41A0982R
- Keywords:
  - 1616 GLOBAL CHANGE Climate variability
  - 1912 INFORMATICS Data management, preservation, rescue
  - 1914 INFORMATICS Data mining
  - 1984 INFORMATICS Statistical methods: Descriptive