Decoder Denoising Pretraining for Semantic Segmentation

doi:10.48550/arXiv.2205.11423

Semantic segmentation labels are expensive and time consuming to acquire. Hence, pretraining is commonly used to improve the label-efficiency of segmentation models. Typically, the encoder of a segmentation model is pretrained as a classifier and the decoder is randomly initialized. Here, we argue that random initialization of the decoder can be suboptimal, especially when few labeled examples are available. We propose a decoder pretraining approach based on denoising, which can be combined with supervised pretraining of the encoder. We find that decoder denoising pretraining on the ImageNet dataset strongly outperforms encoder-only supervised pretraining. Despite its simplicity, decoder denoising pretraining achieves state-of-the-art results on label-efficient semantic segmentation and offers considerable gains on the Cityscapes, Pascal Context, and ADE20K datasets.
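
The denoising objective mentioned in the abstract can be illustrated with a toy sketch. The abstract does not give the noise model, the prediction target, or the architecture, so everything below is an assumption made for illustration: additive Gaussian noise, a clean-image reconstruction target, a mean squared error loss, and a hypothetical `identity_model` standing in for an encoder-decoder network. It is not the authors' implementation.

```python
import random

def add_gaussian_noise(image, sigma, rng):
    """Return a noisy copy of a 2D image (list of rows of floats).

    Assumption: additive Gaussian noise with standard deviation sigma.
    """
    return [[px + rng.gauss(0.0, sigma) for px in row] for row in image]

def mse(pred, target):
    """Mean squared error between two 2D images of equal shape."""
    total = 0.0
    count = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count

def denoising_loss(model, image, sigma, rng):
    """Corrupt the image, then score the model's reconstruction of the clean image."""
    noisy = add_gaussian_noise(image, sigma, rng)
    return mse(model(noisy), image)

def identity_model(noisy):
    # Hypothetical stand-in for an encoder-decoder network; it does no denoising.
    return noisy

rng = random.Random(0)
image = [[0.0, 0.5], [1.0, 0.25]]

# With no noise, the identity model reconstructs the image exactly.
loss_clean = denoising_loss(identity_model, image, sigma=0.0, rng=rng)
# With noise, the identity model is penalized; a trained denoiser would aim lower.
loss_noisy = denoising_loss(identity_model, image, sigma=0.5, rng=rng)
```

In a pretraining setup of this kind, the loss would be minimized over unlabeled images before fine-tuning on segmentation labels; which parameters are updated at that stage is a design choice that this sketch does not model.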

Publication: arXiv e-prints

Pub Date: May 2022

DOI: 10.48550/arXiv.2205.11423

arXiv: arXiv:2205.11423

Bibcode: 2022arXiv220511423B

Keywords: Computer Science - Computer Vision and Pattern Recognition; I.4.6; I.5.4; I.2.10

