ISOSCELES: Data Driven Sampling for Efficient Convolutional Neural Network Training
Abstract
Convolutional neural networks (CNNs) provide state-of-the-art performance in many computer vision tasks, including those related to remote sensing image analysis. While it is well known that training samples are essential to obtaining satisfactory results with CNNs, we still need to better understand how to select samples so as to optimize the feature extractors and minimize the training effort. This is particularly important when dealing with large and heterogeneous remote sensing images, where data multimodality is often observed. To address these issues, we have developed ISOSCELES (Iterative Self-Organizing Scene-Level Sampling), an algorithm that uses affinity propagation to automate the selection and generation of highly representative training images. This framework can be customized to exploit a variety of image spectral and texture features without prior knowledge of the underlying data distributions. Compared to random sampling, the distribution of the training samples is principally data driven, reducing the chance of oversampling information-poor areas or undersampling information-rich ones. Compared to manual sample selection by an analyst, ISOSCELES exploits descriptive features, spectral or textural, and eliminates human bias in sample selection. Coupled with unsupervised scene selection, ISOSCELES can be used to quickly obtain a training set that reflects both between-scene variability, such as differences in viewing angle and time of day, and within-scene variability at the level of individual samples. We demonstrate the effectiveness of the proposed sample selection on a country-scale building extraction task, where we efficiently fine-tune a pre-trained CNN model with the samples selected by ISOSCELES.
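The abstract states only that affinity propagation is applied to spectral or texture features to pick representative training images; it gives no implementation details. The following is a minimal sketch of that general idea, not the authors' ISOSCELES code. The helper names (`tile_features`, `affinity_propagation`, `select_training_tiles`), the per-band mean and standard deviation descriptor, the median preference, the damping value, and the synthetic tiles are all assumptions made for illustration.

```python
import random
import statistics


def tile_features(tile):
    """Toy spectral descriptor (an assumption): per-band mean and std.

    `tile` is a list of bands, each band a flat list of pixel values.
    """
    feats = []
    for band in tile:
        feats.append(statistics.fmean(band))
        feats.append(statistics.pstdev(band))
    return feats


def affinity_propagation(points, damping=0.9, iterations=200):
    """Plain affinity propagation; returns exemplar indices (needs >= 2 points)."""
    n = len(points)
    # Similarity: negative squared Euclidean distance between feature vectors.
    S = [[-sum((a - b) ** 2 for a, b in zip(points[i], points[k]))
          for k in range(n)] for i in range(n)]
    # Preference (self-similarity): median of the off-diagonal similarities.
    pref = statistics.median(
        S[i][k] for i in range(n) for k in range(n) if i != k)
    for i in range(n):
        S[i][i] = pref
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        # Responsibility: r(i,k) = s(i,k) - max_{k' != k} (a(i,k') + s(i,k')).
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            first = scores[best]
            second = max(s for k, s in enumerate(scores) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else first)
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # Availability: sum of positive responsibilities sent to candidate k.
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            pos[k] = R[k][k]  # self-responsibility is not clipped
            total = sum(pos)
            for i in range(n):
                new = total - pos[i]
                if i != k:
                    new = min(0.0, new)
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    evidence = [R[k][k] + A[k][k] for k in range(n)]
    exemplars = [k for k in range(n) if evidence[k] > 0]
    # Fallback (an assumption): keep the strongest candidate if none qualifies.
    return exemplars or [max(range(n), key=evidence.__getitem__)]


def select_training_tiles(features):
    """Return indices of tiles chosen as exemplars, i.e. candidate samples."""
    return affinity_propagation(features)


# Synthetic stand-in for image tiles: three modes, ten 3-band tiles each.
rng = random.Random(0)
tiles = []
for center in (0.2, 0.5, 0.8):
    for _ in range(10):
        tiles.append([[rng.gauss(center, 0.02) for _ in range(16)]
                      for _ in range(3)])

features = [tile_features(t) for t in tiles]
selected = select_training_tiles(features)
print(f"{len(selected)} of {len(tiles)} tiles selected as exemplars:", selected)
```

Affinity propagation does not need the number of clusters in advance, which fits the abstract's claim that no prior knowledge of the data distribution is required; the preference value controls how many exemplars emerge. In practice a library implementation such as scikit-learn's `AffinityPropagation` would likely replace the hand-written loop.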
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2018
- Bibcode: 2018AGUFM.H34B..02S
- Keywords:
- 0434 Data sets; BIOGEOSCIENCES
- 1855 Remote sensing; HYDROLOGY
- 1926 Geospatial; INFORMATICS
- 1942 Machine learning; INFORMATICS