Understanding the Limitations of in situ Data to Better Evaluate and Improve Land Cover Products
Abstract
In situ data are considered the gold standard for evaluating and improving image products. However, the inherent limitations of such data for this purpose are poorly understood. This has two important impacts. First, image-derived products are compared to a data standard that is not as good as assumed. Second, in situ data are not optimized for the improvement of image products. The work reported here focuses on land cover products, specifically maps covering approximately 40% of Australia that show two classes: forest and non-forest. Such maps were available for nine years between 2000 and 2011. Also available were the probability of each pixel being forest, p(Forest), in a given year, and 7680 points for which human interpreters had determined, using high-resolution imagery, whether each was forest or non-forest. Though not collected using ground-based sampling, the latter were considered to be in situ data, which is considered appropriate given the use of relatively coarse image classes. Analysis indicated that pixels having a p(Forest) value between 0.20 and 0.80 are most likely to be erroneously classified. This suggests that to improve image products, in situ data should be collected in areas that a priori have been identified as being difficult to classify. Yet the author knows of no situation in which in situ data were collected according to such a sampling scheme, and it is difficult to imagine such a scheme being implemented. An additional general issue that was apparent is the location of an in situ point relative to the pixel it is meant to represent. Australia defines forest as being at least 2 m tall and having a 20% crown closure. A point located on a single row of mature trees that lines an agricultural field is therefore forest if one uses a smaller pixel, but not if one uses a larger pixel. Moreover, if the point is located in a corner of the pixel, then the point should not be classified as forest.
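The pixel-size dependence of the tree-row example can be illustrated with a small arithmetic sketch. Everything numeric here except the 20% crown closure threshold is an assumption made for illustration: the 10 m crown width of the row, the 25 m and 250 m pixel sizes, a row that crosses the pixel from edge to edge with a fully closed crown, and trees that already meet the 2 m height criterion. The function names are hypothetical, and the threshold is read as "at least 20%".

```python
FOREST_CROWN_CLOSURE = 0.20  # threshold quoted in the abstract, read as "at least 20%"


def crown_cover_fraction(row_width_m, pixel_size_m):
    """Fraction of a square pixel covered by a straight tree row.

    Assumes the row crosses the whole pixel and its crown is fully closed.
    """
    covered_width = min(row_width_m, pixel_size_m)
    return (covered_width * pixel_size_m) / (pixel_size_m ** 2)


def is_forest(row_width_m, pixel_size_m):
    """Apply the crown closure threshold (height criterion assumed to be met)."""
    return crown_cover_fraction(row_width_m, pixel_size_m) >= FOREST_CROWN_CLOSURE


if __name__ == "__main__":
    row_width_m = 10.0  # hypothetical crown width of the tree row
    for pixel_size_m in (25.0, 250.0):  # hypothetical pixel sizes
        cover = crown_cover_fraction(row_width_m, pixel_size_m)
        label = "forest" if is_forest(row_width_m, pixel_size_m) else "non-forest"
        print(f"{pixel_size_m:.0f} m pixel: cover = {cover:.2f}, class = {label}")
```

Under these assumptions the same row covers 40% of the smaller pixel and 4% of the larger one, so the class assigned to a point on the row changes with the area over which the definition is applied.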
Arguing that a disagreement between a pixel's and a point's classification indicates erroneous image classification ignores these factors and the reality that forest can only be defined based on some minimum area. Note that the sample unit size/area chosen for the in situ data, though objectively applied, was chosen based on subjective consideration of multiple factors. Hence in situ data must be considered an additional representation of reality rather than "truth." Field-based in situ data collected to describe woody vegetation suffer similar sampling and representativeness problems. First, sampling schemes generally place samples in areas where woody vegetation is known to exist. However, this means that one is not sampling errors of omission, i.e., places identified as non-forest that actually are covered by woody vegetation. Second, woody vegetation is sampled either using a fixed-area sample whose size is not optimized relative to the imagery being processed, or by selecting trees around a point using an angle gauge, which produces a sample unit whose area varies with tree size. These concepts will be presented to raise awareness of issues associated with in situ data, leading to more realistic evaluation of image products and more appropriate data collection for the improvement of image products.
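The kind of analysis behind the p(Forest) finding can be sketched as a disagreement rate computed per probability bin. This is a minimal sketch, not the author's method: the abstract does not say how the mapped class was derived from p(Forest), so the 0.5 threshold is an assumption, as are the function name, the half-open bin convention, and the sample records, which are invented solely to show the mechanics.

```python
from collections import defaultdict


def disagreement_by_bin(records, threshold=0.5, edges=(0.0, 0.2, 0.8, 1.0)):
    """Share of points whose mapped class disagrees with the reference, per bin.

    records: iterable of (p_forest, reference_is_forest) pairs.
    threshold: ASSUMED rule for turning p(Forest) into a mapped class.
    edges: bin edges; bins are half-open except the last, which includes 1.0.
    """
    counts = defaultdict(lambda: [0, 0])  # label -> [disagreements, total]
    for p, reference_is_forest in records:
        mapped_is_forest = p >= threshold
        for lo, hi in zip(edges[:-1], edges[1:]):
            is_last_bin = hi == edges[-1]
            if lo <= p < hi or (is_last_bin and p == hi):
                label = f"{lo:.2f}-{hi:.2f}"
                counts[label][1] += 1
                counts[label][0] += int(mapped_is_forest != reference_is_forest)
                break
    return {label: d / n for label, (d, n) in counts.items()}


# Invented records for illustration only: (p(Forest), interpreter says forest?)
sample_records = [
    (0.05, False), (0.10, False),
    (0.35, True), (0.45, False), (0.60, False), (0.75, True),
    (0.90, True), (0.95, True),
]

if __name__ == "__main__":
    for label, rate in sorted(disagreement_by_bin(sample_records).items()):
        print(f"p(Forest) {label}: disagreement rate {rate:.2f}")
```

A table of this form, built from real reference points, is one way to identify a priori the probability ranges in which additional in situ samples would be most informative.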
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2012
- Bibcode: 2012AGUFMIN53B1732L
- Keywords: 0480 BIOGEOSCIENCES / Remote sensing; 0540 COMPUTATIONAL GEOPHYSICS / Image processing; 0550 COMPUTATIONAL GEOPHYSICS / Model verification and validation