Unsupervised Deep Representations for Learning Audience Facial Behaviors
Abstract
In this paper, we present an unsupervised learning approach for analyzing facial behavior based on a deep generative model combined with a convolutional neural network (CNN). We jointly train a variational auto-encoder (VAE) and a generative adversarial network (GAN) to learn a powerful latent representation from footage of audiences viewing feature-length movies. We show that the learned latent representation successfully encodes meaningful signatures of behaviors related to audience engagement (smiling and laughing) and disengagement (yawning). Our results provide a proof of concept for a more general methodology for annotating hard-to-label multimedia data featuring sparse examples of signals of interest.
- Publication: arXiv e-prints
- Pub Date: May 2018
- DOI: 10.48550/arXiv.1805.04136
- arXiv: arXiv:1805.04136
- Bibcode: 2018arXiv180504136S
- Keywords: Computer Science - Computer Vision and Pattern Recognition