Unsupervised Audio-Visual Subspace Alignment for High-Stakes Deception Detection
Abstract
Automated systems that detect deception in high-stakes situations can enhance societal well-being across medical, social work, and legal domains. Existing models for detecting high-stakes deception in videos have been supervised, but labeled datasets to train models can rarely be collected for most real-world applications. To address this problem, we propose the first multimodal unsupervised transfer learning approach that detects real-world, high-stakes deception in videos without using high-stakes labels. Our subspace-alignment (SA) approach adapts audio-visual representations of deception in lab-controlled low-stakes scenarios to detect deception in real-world, high-stakes situations. Our best unsupervised SA models outperform models without SA, outperform human ability, and perform comparably to a number of existing supervised models. Our research demonstrates the potential for introducing subspace-based transfer learning to model high-stakes deception and other social behaviors in real-world contexts with a scarcity of labeled behavioral data.
- Publication:
-
arXiv e-prints
- Pub Date:
- February 2021
- DOI:
- 10.48550/arXiv.2102.03673
- arXiv:
- arXiv:2102.03673
- Bibcode:
- 2021arXiv210203673M
- Keywords:
-
- Computer Science - Computer Vision and Pattern Recognition;
- Computer Science - Human-Computer Interaction;
- Computer Science - Machine Learning
- E-Print:
- Accepted at ICASSP 2021 \c{opyright} 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of copyrighted components of this work