Satlas: Creating Accurate, Global Geospatial Data Products with Large-scale Pre-training
Abstract
Several collections of remote sensing images provide frequent global coverage; for example, ESA's Sentinel-2 constellation captures Earth's land surface roughly once a week. Machine learning (ML) models can be used to extract up-to-date geospatial data from these images, such as identifying locations with recent forest loss or detecting newly constructed offshore platforms. Ideally, these data products would be highly accurate, global, and frequently updated. However, while there are many existing sources of ML-generated geospatial data, few satisfy all three requirements.
We present Satlas, a hub of high-accuracy geospatial data products generated by ML models using Sentinel-2 imagery. Satlas currently includes four geospatial products: wind turbines, solar farms, tree cover, and offshore platforms. The data is computed globally and updated monthly as new Sentinel-2 imagery becomes available. To achieve high accuracy, we first pretrain a Swin Transformer backbone on SatlasPretrain, a large-scale remote sensing dataset. SatlasPretrain combines 302 million labels from several sources, including new annotations by crowdworkers and labels derived from existing datasets such as OpenStreetMap and WorldCover. We find that pre-training on SatlasPretrain improves average performance on seven downstream tasks by 18% over ImageNet pre-training. We then collect high-quality task-specific training labels, with label sets ranging in size from 2K solar farms to 14K offshore platforms. We fine-tune the pretrained backbone on each of the task-specific label sets to derive high-accuracy models. We develop a software system that automatically retrieves monthly Sentinel-2 imagery and applies the models to the imagery to update the geospatial data products. We believe the data products we have released have many use cases in planetary and environmental monitoring, and present two active use cases here. Skylight, a tool for surfacing potential illegal fishing events, is using the offshore wind turbines and platforms we identify to improve vessel trajectory classification. We are also collaborating with Amazon Conservation, a nonprofit working to conserve the biodiversity of the Amazon rainforest, on using our tree cover data to automatically classify the causes of detected forest loss events.
- Publication:
- AGU Fall Meeting Abstracts
- Pub Date:
- December 2023
- Bibcode:
- 2023AGUFMGC21F0984B