Combining Human and Machine Learning to Map Cropland in the 21st Century's Major Agricultural Frontier
Abstract
In the coming decades, large areas of new cropland will be created to meet the world's rapidly growing food demands. Much of this new cropland will be in sub-Saharan Africa, where food needs will increase most and the area of remaining potential farmland is greatest. If we are to understand the impacts of global change, it is critical to accurately identify Africa's existing croplands and how they are changing. Yet the continent's smallholder-dominated agricultural systems are unusually challenging for remote sensing analyses, making accurate area estimates difficult to obtain, let alone important details related to field size and geometry. Fortunately, the rapidly growing archives of moderate- to high-resolution satellite imagery hosted on open servers now offer an unprecedented opportunity to improve land-cover maps. We present a system that integrates two critical components needed to capitalize on this opportunity: (1) human image interpretation and (2) machine learning (ML). Human judgment is needed to accurately delineate training sites within noisy imagery and a highly variable cover type, while ML provides the ability to scale and to interpret large feature spaces that defy human comprehension. Because large amounts of training data are needed (a major impediment for analysts), we use a crowdsourcing platform that connects Amazon's Mechanical Turk service to satellite imagery hosted on open image servers. Workers map visible fields at pre-assigned sites and are paid according to their mapping accuracy. Initial tests show high overall map accuracy and mapping rates exceeding 1800 km²/hour. The ML classifier uses random forests and randomized quasi-exhaustive feature selection, and is highly effective in classifying diverse agricultural types in southern Africa (AUC > 0.9). We connect the ML and crowdsourcing components to form an interactive learning framework.
The ML algorithm performs an initial classification using a first batch of crowdsourced maps, then applies thresholds on the posterior probabilities to separate sub-images classified with high confidence from those classified with low confidence. Workers are then directed to collect new training data in low-confidence sub-images, after which classification is repeated and reassessed. The entire process is iterated until the maximum possible accuracy is reached.
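The interactive loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system's implementation: the thresholds (0.2 and 0.8), the batch size, the round cap, and all function names are hypothetical, and a toy one-feature logistic scorer stands in for the random forest so that the example runs with the standard library alone. The stopping rule here (no unlabeled low-confidence sub-images remain, or a round limit is hit) is also an assumption, since the abstract only says the process iterates until accuracy stops improving.

```python
import math


def split_by_confidence(posteriors, low=0.2, high=0.8):
    """Partition sub-image ids by posterior cropland probability.

    A sub-image is 'confident' if its posterior is <= low or >= high,
    otherwise 'uncertain'. The threshold values are assumptions.
    """
    confident, uncertain = [], []
    for sub_id, p in posteriors.items():
        (confident if p <= low or p >= high else uncertain).append(sub_id)
    return confident, uncertain


def fit_toy_classifier(features, labels):
    """Stand-in for the random forest (assumption): a logistic curve
    centred midway between the mean feature value of each class."""
    crop = [features[i] for i, y in labels.items() if y == 1]
    other = [features[i] for i, y in labels.items() if y == 0]
    if not crop or not other:
        raise ValueError("training labels must contain both classes")
    crop_mean = sum(crop) / len(crop)
    other_mean = sum(other) / len(other)
    gap = crop_mean - other_mean
    mid = (crop_mean + other_mean) / 2.0
    if gap == 0:
        return lambda x: 0.5

    def posterior(x):
        # Clamp the exponent so math.exp cannot overflow.
        z = max(-50.0, min(50.0, 8.0 * (x - mid) / gap))
        return 1.0 / (1.0 + math.exp(-z))

    return posterior


def interactive_learning(features, ask_workers, initial_ids,
                         batch_size=5, max_rounds=10):
    """Classify, flag low-confidence sub-images, request labels, repeat.

    features    : dict of sub-image id -> single numeric feature
    ask_workers : callable returning a 0/1 label for a sub-image id
                  (hypothetical stand-in for the crowdsourcing platform)
    """
    labels = {i: ask_workers(i) for i in initial_ids}
    for _ in range(max_rounds):
        predict = fit_toy_classifier(features, labels)
        posteriors = {i: predict(x) for i, x in features.items()}
        _, uncertain = split_by_confidence(posteriors)
        todo = [i for i in uncertain if i not in labels]
        if not todo:
            break  # nothing left to clarify
        # Send the most ambiguous sub-images (posterior nearest 0.5) first.
        todo.sort(key=lambda i: abs(posteriors[i] - 0.5))
        for i in todo[:batch_size]:
            labels[i] = ask_workers(i)
    # Refit once more so the returned model reflects every collected label.
    return fit_toy_classifier(features, labels), labels


if __name__ == "__main__":
    # Synthetic data: 200 sub-images, cropland where the feature exceeds 0.55.
    features = {i: i / 199 for i in range(200)}
    truth = lambda i: 1 if features[i] > 0.55 else 0
    model, labels = interactive_learning(features, truth, [0, 199])
    print(len(labels), "sub-images labeled by workers")
```

In a fuller version, `fit_toy_classifier` would be replaced by a random forest that returns per-class probabilities, and `ask_workers` by a call that posts the sub-image to the crowdsourcing platform and waits for the returned field boundaries.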
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2016
- Bibcode: 2016AGUFMIN54A..07E
- Keywords:
- 0845 Instructional tools (EDUCATION);
- 1926 Geospatial (INFORMATICS);
- 1928 GIS science (INFORMATICS);
- 1992 Virtual globes (INFORMATICS)