Learning in Audio-visual Context: A Review, Analysis, and New Perspective

doi:10.48550/arXiv.2208.09579

Sight and hearing are two senses that play a vital role in human communication and scene understanding. To mimic human perception, audio-visual learning, which aims to develop computational approaches that learn from both audio and visual modalities, has been a flourishing field in recent years. A comprehensive survey that systematically organizes and analyzes studies in the audio-visual field is therefore needed. Starting from an analysis of the cognitive foundations of audio-visual perception, we introduce several key findings that have inspired our computational studies. Then, we systematically review recent audio-visual learning studies and divide them into three categories: audio-visual boosting, cross-modal perception, and audio-visual collaboration. Through our analysis, we find that the consistency of audio-visual data across the semantic, spatial, and temporal dimensions supports the above studies. To revisit the current development of the audio-visual learning field from a broader view, we further propose a new perspective on audio-visual scene understanding, then discuss and analyze feasible future directions for the audio-visual learning area. Overall, this survey reviews the current audio-visual learning field and offers an outlook on it from different aspects. We hope it can provide researchers with a better understanding of this area. A website with a continuously updated version of the survey is available at https://gewu-lab.github.io/audio-visual-learning/.

Publication:

arXiv e-prints

Pub Date:

August 2022

DOI:

10.48550/arXiv.2208.09579

arXiv:

arXiv:2208.09579

Bibcode:

2022arXiv220809579W

Keywords:

Computer Science - Computer Vision and Pattern Recognition;
Computer Science - Artificial Intelligence;
I.2.10;
I.4.8;
I.5

E-Print:

https://gewu-lab.github.io/audio-visual-learning/

NASA/ADS
