Context-Based Semantic-Aware Alignment for Semi-Supervised Multi-Label Learning

doi:10.48550/arXiv.2412.18842

Because extensive, precisely annotated multi-label data are scarce in the real world, semi-supervised multi-label learning (SSMLL) has gradually gained attention. The abundant knowledge embedded in vision-language models (VLMs) pre-trained on large-scale image-text pairs could alleviate the challenge of limited labeled data under the SSMLL setting. Although existing methods based on fine-tuning VLMs have achieved advances in weakly-supervised multi-label learning, they fail to fully leverage the information from labeled data to enhance the learning of unlabeled data. In this paper, we propose a context-based semantic-aware alignment method to solve the SSMLL problem by leveraging the knowledge of VLMs. To address the challenge of handling multiple semantics within an image, we introduce a novel framework design to extract label-specific image features. This design allows us to achieve a more compact alignment between text features and label-specific image features, which leads the model to generate high-quality pseudo-labels. To equip the model with a comprehensive understanding of the image, we design a semi-supervised context identification auxiliary task that enhances the feature representation by capturing co-occurrence information. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness of our proposed method.
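The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of aligning text features with label-specific image features and thresholding the result into pseudo-labels. Everything in it is an assumption made for illustration and is not taken from the paper: the attention-style pooling of patch features per label, the function names `label_specific_features` and `pseudo_labels`, and the `temperature`, `scale`, and `threshold` values are all hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def label_specific_features(patches, text, temperature=0.07):
    """Pool patch features separately for each label (hypothetical design).

    patches: (P, D) image patch features, e.g. from a VLM image encoder.
    text:    (C, D) label text features, e.g. from a VLM text encoder.
    Returns (features, attention) with shapes (C, D) and (C, P).
    """
    sims = l2_normalize(text) @ l2_normalize(patches).T  # (C, P) cosine similarities
    attention = softmax(sims / temperature, axis=-1)     # each label attends over patches
    return attention @ patches, attention


def pseudo_labels(patches, text, threshold=0.7, scale=10.0):
    """Score each label by cosine similarity between its text feature and its
    label-specific image feature, then threshold into binary pseudo-labels.
    `threshold` and `scale` are illustrative values, not from the paper."""
    features, _ = label_specific_features(patches, text)
    cos = np.sum(l2_normalize(features) * l2_normalize(text), axis=-1)  # (C,)
    probs = 1.0 / (1.0 + np.exp(-scale * cos))
    return probs, probs >= threshold


# Toy example: two labels, three identical patches that match label 0 only.
text = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0]])
patches = np.tile(np.array([1.0, 0.0, 0.0, 0.0]), (3, 1))
probs, mask = pseudo_labels(patches, text)
```

In this toy case the patches point in the same direction as the first text feature and are orthogonal to the second, so only the first label clears the threshold. A real system would obtain `patches` and `text` from a pre-trained VLM and would typically learn the pooling and thresholds rather than fix them by hand.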

Publication:

arXiv e-prints

Pub Date:

December 2024

DOI:

10.48550/arXiv.2412.18842

arXiv:

arXiv:2412.18842

Bibcode:

2024arXiv241218842F

Keywords:

Computer Science - Computer Vision and Pattern Recognition;
Computer Science - Machine Learning
