Continual Distillation Learning: An Empirical Study of Knowledge Distillation in Prompt-based Continual Learning
Abstract
Knowledge Distillation (KD) focuses on using a teacher model to improve a student model. Traditionally, KD is studied in an offline fashion, where a training dataset is available before learning. In this work, we introduce the problem of Continual Distillation Learning (CDL) that considers KD in the Continual Learning (CL) setup. A teacher model and a student model need to learn a sequence of tasks, and the knowledge of the teacher model will be distilled to the student to improve the student model in an online fashion. The CDL problem is valuable to study since for prompt-based continual learning methods, using a larger vision transformer (ViT) leads to better performance in continual learning. Distilling the knowledge from a large ViT to a small ViT can improve inference efficiency for promptbased CL models. To this end, we conducted experiments to study the CDL problem with three prompt-based CL models, i.e., L2P, DualPrompt and CODA-Prompt, where we utilized logit distillation, feature distillation and prompt distillation for knowledge distillation from a teacher model to a student model. Our findings of this study can serve as baselines for future CDL work.
- Publication:
-
arXiv e-prints
- Pub Date:
- July 2024
- DOI:
- arXiv:
- arXiv:2407.13911
- Bibcode:
- 2024arXiv240713911Z
- Keywords:
-
- Computer Science - Computer Vision and Pattern Recognition;
- Computer Science - Machine Learning