Mask Approximation Net: Merging Feature Extraction and Distribution Learning for Remote Sensing Change Captioning
Abstract
Remote sensing image change description, as a novel multimodal task in the field of remote sensing processing, not only enables the detection of changes in surface conditions but also provides detailed descriptions of these changes, thereby enhancing human interpretability and interactivity. However, previous methods mainly employed Convolutional Neural Network (CNN) architectures to extract bitemporal image features. This approach often leads to an overemphasis on designing specific network architectures and limits the captured feature distributions to the current dataset, resulting in poor generalizability and robustness when applied to other datasets or real-world scenarios. To address these limitations, this paper proposes a novel approach for remote sensing image change detection and description that integrates diffusion models, aiming to shift the focus from conventional feature learning paradigms to data distribution learning. The proposed method primarily includes a simple multi-scale change detection module, whose output features are subsequently refined using a diffusion model. Additionally, we introduce a frequency-guided complex filter module to handle high-frequency noise during the diffusion process, which helps to maintain model performance. Finally, we validate the effectiveness of our proposed method on several remote sensing change detection description datasets, demonstrating its superior performance. The code available at MaskApproxNet.
- Publication:
-
arXiv e-prints
- Pub Date:
- December 2024
- DOI:
- arXiv:
- arXiv:2412.19179
- Bibcode:
- 2024arXiv241219179S
- Keywords:
-
- Computer Science - Computer Vision and Pattern Recognition;
- Computer Science - Artificial Intelligence;
- Computer Science - Machine Learning