Retaining and Enhancing Pre-trained Knowledge in Vision-Language Models with Prompt Ensembling

doi:10.48550/arXiv.2412.07077

The advancement of vision-language models, particularly the Contrastive Language-Image Pre-training (CLIP) model, has revolutionized the field of machine learning by enabling robust zero-shot learning capabilities. These capabilities allow models to understand and respond to previously unseen data without task-specific training. However, adapting CLIP to integrate specialized knowledge from various domains while retaining its zero-shot capabilities remains a significant challenge. To address this, we introduce a novel prompt ensemble learning approach called Group-wise Prompt Ensemble (GPE). This method aims to enhance CLIP's zero-shot capabilities by incorporating new domain knowledge while improving its adaptability and robustness against data distribution shifts. Our approach hinges on three main strategies: prompt grouping with masked attention to optimize CLIP's adaptability while safeguarding its zero-shot capabilities; the incorporation of auxiliary prompts for the seamless integration of new domain insights without disrupting the original model's representation; and an ensemble learning strategy that effectively merges original and new knowledge. Through rigorous experimentation, including more challenging cross-dataset transfer evaluations, our GPE method redefines the benchmarks for the adaptability and efficiency of vision-language models, surpassing existing models across various scenarios.
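The abstract names masked attention over prompt groups and an ensemble of original and new knowledge, but it does not give the details. The following is only a minimal sketch of how those two ideas could fit together, not the authors' implementation. The masking rule (original tokens never attend to learnable prompts, and prompt groups do not attend to each other) and the uniform averaging of scores are assumptions made for illustration. The names `group_mask` and `ensemble_scores` are hypothetical.

```python
from typing import List


def group_mask(n_orig: int, group_sizes: List[int]) -> List[List[bool]]:
    """Build an attention mask where mask[i][j] is True if token i may attend to token j.

    Assumed rule (not taken from the paper): original tokens see only original
    tokens, and each prompt group sees itself plus the original tokens.
    """
    # Label every position: 0 for original tokens, 1..G for the prompt groups.
    labels = [0] * n_orig
    for g, size in enumerate(group_sizes, start=1):
        labels += [g] * size

    n = len(labels)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            same_group = labels[i] == labels[j]
            prompt_reads_original = labels[i] != 0 and labels[j] == 0
            mask[i][j] = same_group or prompt_reads_original
    return mask


def ensemble_scores(per_branch: List[List[float]]) -> List[float]:
    """Average per-class scores across branches (uniform weights are an assumption)."""
    n_branches = len(per_branch)
    n_classes = len(per_branch[0])
    return [sum(b[c] for b in per_branch) / n_branches for c in range(n_classes)]


# Two original tokens followed by two prompt groups of one token each.
mask = group_mask(2, [1, 1])
# One score list per branch, e.g. the frozen model and one prompt group.
scores = ensemble_scores([[1.0, 4.0], [3.0, 0.0]])
```

Under this assumed rule the original tokens compute exactly what they would without any prompts, which is one way the pre-trained representation could be left undisturbed while the prompt groups learn domain-specific features.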

Publication: arXiv e-prints

Pub Date: December 2024

DOI: 10.48550/arXiv.2412.07077

arXiv: arXiv:2412.07077

Bibcode: 2024arXiv241207077K

Keywords: Computer Science - Computer Vision and Pattern Recognition

E-Print: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025
