Fusing Transformers in a Tuning Fork Structure for Hyperspectral Image Classification Across Disjoint Samples
Abstract
The 3D Swin Transformer (3D-ST), known for its hierarchical attention and window-based processing, excels at capturing intricate spatial relationships within images. The Spatial-Spectral Transformer (SST), meanwhile, specializes in modeling long-range dependencies through self-attention. This paper therefore introduces an attentional fusion of the two transformers to improve the classification performance of Hyperspectral Images (HSIs). What sets the approach apart is its emphasis on integrating the attentional mechanisms of both architectures. This integration refines the modeling of spatial and spectral information and contributes to more accurate classification results. Experiments on benchmark HSI datasets underscore the importance of using disjoint training, validation, and test samples. The results demonstrate the effectiveness of the fusion approach, which outperforms traditional methods and the individual transformers. Using disjoint samples enhances the robustness and reliability of the proposed methodology, emphasizing its potential for advancing hyperspectral image classification.
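The abstract does not spell out how the disjoint samples are drawn, so the following is only a minimal sketch of one way to build non-overlapping training, validation, and test index sets from a ground-truth label map. The function name `disjoint_split`, the split fractions, and the convention that label 0 marks unlabeled background are all assumptions made for illustration, not details from the paper. The sketch guarantees only that no pixel index appears in more than one set; it does not prevent spatial patches extracted around those pixels from overlapping.

```python
import numpy as np

def disjoint_split(labels, train_frac=0.2, val_frac=0.1, seed=0):
    """Split labeled pixel indices into non-overlapping train/val/test sets.

    The split is done per class so each set keeps roughly the class balance.
    Returns three 1-D arrays of indices into the flattened label map.
    """
    rng = np.random.default_rng(seed)
    flat = labels.ravel()
    train, val, test = [], [], []
    for cls in np.unique(flat):
        if cls == 0:  # assumption: 0 marks unlabeled background
            continue
        # Shuffle this class's pixel indices, then cut into three slices.
        idx = rng.permutation(np.flatnonzero(flat == cls))
        n_train = int(round(train_frac * idx.size))
        n_val = int(round(val_frac * idx.size))
        train.append(idx[:n_train])
        val.append(idx[n_train:n_train + n_val])
        test.append(idx[n_train + n_val:])
    return tuple(np.concatenate(parts) for parts in (train, val, test))

# Toy 20x20 label map with classes 1..3 and background 0 (synthetic data).
gt = np.random.default_rng(1).integers(0, 4, size=(20, 20))
train_idx, val_idx, test_idx = disjoint_split(gt)
```

Because each class's shuffled indices are cut into consecutive slices, every labeled pixel lands in exactly one set. A patch-based pipeline would need an additional check that patches centered on test pixels do not contain training pixels.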
- Publication: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
- Pub Date: 2024
- DOI: 10.1109/JSTARS.2024.3465831
- arXiv: arXiv:2405.01095
- Bibcode: 2024IJSTA..1718167A
- Keywords: Computer Science - Computer Vision and Pattern Recognition; Electrical Engineering and Systems Science - Image and Video Processing