For many practical computer vision applications, the learned models usually have high performance on the datasets used for training but suffer from significant performance degradation when deployed in new environments, where there are usually style differences between the training images and the testing images. An effective domain generalizable model is expected to be able to learn feature representations that are both generalizable and discriminative. In this paper, we design a novel Style Normalization and Restitution module (SNR) to simultaneously ensure both high generalization and discrimination capability of the networks. In the SNR module, particularly, we filter out the style variations (e.g, illumination, color contrast) by performing Instance Normalization (IN) to obtain style normalized features, where the discrepancy among different samples and domains is reduced. However, such a process is task-ignorant and inevitably removes some task-relevant discriminative information, which could hurt the performance. To remedy this, we propose to distill task-relevant discriminative features from the residual (i.e, the difference between the original feature and the style normalized feature) and add them back to the network to ensure high discrimination. Moreover, for better disentanglement, we enforce a dual causality loss constraint in the restitution step to encourage the better separation of task-relevant and task-irrelevant features. We validate the effectiveness of our SNR on different computer vision tasks, including classification, semantic segmentation, and object detection. Experiments demonstrate that our SNR module is capable of improving the performance of networks for domain generalization (DG) and unsupervised domain adaptation (UDA) on many tasks. Code are available at https://github.com/microsoft/SNR.
- Pub Date:
- January 2021
- Computer Science - Computer Vision and Pattern Recognition
- Published on IEEE Transactions on Multimedia. This paper extended SNR (CVPR'20) for domain generalization and adaptation to various computer vision tasks, e.g., image classification, object detection, semantic segmentation