Continual Distillation Learning: An Empirical Study of Knowledge Distillation in Prompt-based Continual Learning

Knowledge Distillation (KD) uses a teacher model to improve a student model. Traditionally, KD is studied offline, where the training dataset is available before learning. In this work, we introduce the problem of Continual Distillation Learning (CDL), which considers KD in the Continual Learning (CL) setup: a teacher model and a student model both learn a sequence of tasks, and the teacher's knowledge is distilled to the student online to improve the student. The CDL problem is worth studying because, for prompt-based continual learning methods, a larger vision transformer (ViT) yields better continual learning performance; distilling knowledge from a large ViT to a small ViT can therefore improve the inference efficiency of prompt-based CL models. To this end, we conducted experiments on the CDL problem with three prompt-based CL models (L2P, DualPrompt, and CODA-Prompt), using logit distillation, feature distillation, and prompt distillation to transfer knowledge from the teacher model to the student model. The findings of this study can serve as baselines for future CDL work.
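
To make the logit distillation mentioned above concrete, here is a minimal sketch. It assumes the common temperature-scaled KL-divergence formulation of KD; the abstract does not specify the exact loss used in the paper, so the function names, the temperature value, and the T^2 scaling are illustrative assumptions rather than the authors' implementation.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def logit_distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: this is the generic KD loss, not necessarily the paper's.
    The T^2 factor is the usual convention that keeps gradient magnitudes
    comparable across temperatures.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (target)
    log_q = log_softmax(student_logits, temperature)  # student (prediction)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return (temperature ** 2) * kl


# Hypothetical logits for one image over three classes.
teacher = [2.0, 1.0, 0.1]
student = [0.5, 1.5, 0.2]
loss = logit_distillation_loss(teacher, student)
```

In a continual setting, a term like this would presumably be added to the student's classification loss on each task as it arrives; feature and prompt distillation would instead compare intermediate representations or prompt components.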

Publication: arXiv e-prints

Pub Date: July 2024

DOI: 10.48550/arXiv.2407.13911

arXiv: arXiv:2407.13911

Bibcode: 2024arXiv240713911Z

Keywords: Computer Science - Computer Vision and Pattern Recognition; Computer Science - Machine Learning
