Narrowing the semantic gaps in U-Net with learnable skip connections: The case of medical image segmentation
Abstract
Most state-of-the-art methods for medical image segmentation adopt the encoder-decoder architecture. However, this U-shaped framework still has limitations in capturing the non-local multi-scale information with a simple skip connection. To solve the problem, we firstly explore the potential weakness of skip connections in U-Net on multiple segmentation tasks, and find that i) not all skip connections are useful, each skip connection has different contribution; ii) the optimal combinations of skip connections are different, relying on the specific datasets. Based on our findings, we propose a new segmentation framework, named UDTransNet, to solve three semantic gaps in U-Net. Specifically, we propose a Dual Attention Transformer (DAT) module for capturing the channel- and spatial-wise relationships to better fuse the encoder features, and a Decoder-guided Recalibration Attention (DRA) module for effectively connecting the DAT tokens and the decoder features to eliminate the inconsistency. Hence, both modules establish a learnable connection to solve the semantic gaps between the encoder and the decoder, which leads to a high-performance segmentation model for medical images. Comprehensive experimental results indicate that our UDTransNet produces higher evaluation scores and finer segmentation results with relatively fewer parameters over the state-of-the-art segmentation methods on different public datasets. Code: https://github.com/McGregorWwww/UDTransNet.
- Publication:
-
arXiv e-prints
- Pub Date:
- December 2023
- DOI:
- 10.48550/arXiv.2312.15182
- arXiv:
- arXiv:2312.15182
- Bibcode:
- 2023arXiv231215182W
- Keywords:
-
- Electrical Engineering and Systems Science - Image and Video Processing;
- Computer Science - Computer Vision and Pattern Recognition;
- Computer Science - Machine Learning