Multitask Emotion Recognition with Incomplete Labels
Abstract
We train a unified model to perform three tasks: facial action unit detection, expression classification, and valence-arousal estimation. We address two main challenges in learning the three tasks. First, most existing datasets are highly imbalanced. Second, most existing datasets do not contain labels for all three tasks. To tackle the first challenge, we apply data balancing techniques to the experimental datasets. To tackle the second challenge, we propose an algorithm that lets the multitask model learn from missing (incomplete) labels. This algorithm has two steps. First, we train a teacher model to perform all three tasks, where each instance is trained with the ground truth label of its corresponding task. Second, we treat the outputs of the teacher model as soft labels and use them, together with the ground truth, to train the student model. We find that most of the student models outperform their teacher model on all three tasks. Finally, we use model ensembling to further boost performance on the three tasks.
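The two-step procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the ground truth and soft-label terms are combined, so the additive form, the `distill_weight` parameter, the task keys (`"au"`, `"expr"`, `"va"`), and all function names here are assumptions. The per-instance loss values are taken as given, since the abstract does not specify the loss functions.

```python
import numpy as np

def masked_mean(values, mask):
    """Mean of `values` over entries where `mask` is True; 0.0 if none are."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 0.0
    return float(np.asarray(values, dtype=float)[mask].mean())

def teacher_loss(sup_losses, has_label):
    """Step 1 (assumed form): each instance contributes only to the
    loss of the task(s) it has a ground truth label for."""
    return sum(masked_mean(sup_losses[t], has_label[t]) for t in sup_losses)

def student_loss(sup_losses, distill_losses, has_label, distill_weight=1.0):
    """Step 2 (assumed form): ground truth loss where labels exist, plus a
    soft-label loss against the teacher's outputs on every instance."""
    supervised = teacher_loss(sup_losses, has_label)
    distilled = sum(float(np.mean(distill_losses[t])) for t in distill_losses)
    return supervised + distill_weight * distilled

# Toy batch of 4 instances, each labelled for exactly one task.
has_label = {
    "au":   np.array([True, False, False, True]),
    "expr": np.array([False, True, False, False]),
    "va":   np.array([False, False, True, False]),
}
# Hypothetical per-instance losses; entries for unlabelled instances are ignored.
sup = {
    "au":   np.array([1.0, 9.0, 9.0, 3.0]),
    "expr": np.array([9.0, 2.0, 9.0, 9.0]),
    "va":   np.array([9.0, 9.0, 0.5, 9.0]),
}
# Hypothetical per-instance losses against the teacher's soft labels.
dis = {task: np.full(4, 0.25) for task in sup}

print(teacher_loss(sup, has_label))
print(student_loss(sup, dis, has_label))
```

The point of the mask is that the placeholder losses for unlabelled instances never reach the teacher objective, while the student still receives a training signal for all three tasks on every instance through the teacher's soft labels.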
- Publication: arXiv e-prints
- Pub Date: February 2020
- arXiv: arXiv:2002.03557
- Bibcode: 2020arXiv200203557D
- Keywords: Computer Science - Computer Vision and Pattern Recognition; Computer Science - Multimedia; Electrical Engineering and Systems Science - Audio and Speech Processing
- E-Print: Accepted by FG2020