A Novel Self-Supervised Cross-Modal Image Retrieval Method In Remote Sensing
Abstract
Due to the availability of multi-modal remote sensing (RS) image archives, one of the most important research topics is the development of cross-modal RS image retrieval (CM-RSIR) methods that search semantically similar images across different modalities. Existing CM-RSIR methods require the availability of a high quality and quantity of annotated training images. The collection of a sufficient number of reliable labeled images is time consuming, complex and costly in operational scenarios, and can significantly affect the final accuracy of CM-RSIR. In this paper, we introduce a novel self-supervised CM-RSIR method that aims to: i) model mutual-information between different modalities in a self-supervised manner; ii) retain the distributions of modal-specific feature spaces similar to each other; and iii) define the most similar images within each modality without requiring any annotated training image. To this end, we propose a novel objective including three loss functions that simultaneously: i) maximize mutual information of different modalities for inter-modal similarity preservation; ii) minimize the angular distance of multi-modal image tuples for the elimination of inter-modal discrepancies; and iii) increase cosine similarity of the most similar images within each modality for the characterization of intra-modal similarities. Experimental results show the effectiveness of the proposed method compared to state-of-the-art methods. The code of the proposed method is publicly available at https://git.tu-berlin.de/rsim/SS-CM-RSIR.
- Publication:
-
arXiv e-prints
- Pub Date:
- February 2022
- DOI:
- 10.48550/arXiv.2202.11429
- arXiv:
- arXiv:2202.11429
- Bibcode:
- 2022arXiv220211429S
- Keywords:
-
- Computer Science - Computer Vision and Pattern Recognition
- E-Print:
- Accepted at IEEE International Conference on Image Processing (ICIP) 2022. Our code is available at https://git.tu-berlin.de/rsim/SS-CM-RSIR