Mixed Land Use Detection via Multi-modal Learning
Abstract
Land use classification based on visual interpretation of aerial images has been extensively studied for decades. However, it is difficult to use such overhead imagery to determine land use accurately in complicated urban areas (e.g., a building with multiple functions). In contrast, images taken at ground level, such as street view images, are more fine-grained and informative for mixed land use detection. Meanwhile, given that land use categories are often used to describe urban images, mixed land use detection can be framed as a Natural Language for Visual Reasoning (NLVR) problem, aiming to classify an image into land use labels based on its context. With this in mind, this study investigates vision-language multi-modal learning as a novel solution to mixed land use detection. Building on the success of the contrastive language-image pre-training (CLIP) model, we develop Mixed Land Use Detection (MLUD) models, which not only learn visual features from street view images but also integrate land use labels to generate textual features, which in turn are fused with the visual features for determining land uses. Our experiments demonstrate that simply using the street view image itself together with appropriate geo-contextualized prompts is effective for mixed land use detection across 56 cities worldwide. Both the Zero-shot MLUD model (i.e., geo-contextualized prompt tuning on CLIP) and the Linear-probing MLUD model (i.e., CLIP's image encoder linearly trained on street view images) achieve competitive accuracies on a challenging six-class mixed land use detection task. Based on the detection results, we use the Diversity Index (DI) to measure the degree of land use mix for each street view image. These results indicate the great potential of combining natural language and visual representations for mixed land use detection in the future.
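The zero-shot pipeline described above can be illustrated with a minimal sketch: encode a street view image with CLIP, score it against text prompts built from land use labels, and summarize the resulting class distribution with a diversity measure. This sketch assumes the open-source `clip` package (ViT-B/32 weights); the six class names, the prompt template, and the Simpson-style Diversity Index formula are illustrative assumptions, not the authors' exact geo-contextualized prompts or DI definition.

```python
# Sketch: zero-shot land-use scoring with CLIP plus a Diversity Index
# over the predicted class probabilities. Assumes the open-source
# `clip` package (pip install git+https://github.com/openai/CLIP).
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical six land-use classes; the study's label set may differ.
LAND_USE_CLASSES = [
    "residential", "commercial", "industrial",
    "institutional", "recreational", "transportation",
]

# Simple geo-contextualized prompt template (illustrative placeholder).
prompts = [f"a street view photo of a {c} area in the city" for c in LAND_USE_CLASSES]
text_tokens = clip.tokenize(prompts).to(device)

def land_use_probs(image_path: str) -> torch.Tensor:
    """Return a probability distribution over land-use classes for one image."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(text_tokens)
        image_features /= image_features.norm(dim=-1, keepdim=True)
        text_features /= text_features.norm(dim=-1, keepdim=True)
        logits = 100.0 * image_features @ text_features.T  # CLIP logit scale
    return logits.softmax(dim=-1).squeeze(0).cpu()

def diversity_index(probs: torch.Tensor) -> float:
    """Simpson-style Diversity Index: 1 - sum(p_i^2); higher = more mixed use."""
    return float(1.0 - (probs ** 2).sum())

# Usage:
# p = land_use_probs("street_view.jpg")
# print(dict(zip(LAND_USE_CLASSES, p.tolist())), diversity_index(p))
```

The linear-probing variant would instead freeze `model.encode_image` and train a logistic-regression head on the extracted image features against labeled street view images.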
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2022
- Bibcode: 2022AGUFMIN12B0260W