Kiki or Bouba? Sound Symbolism in Vision-and-Language Models

doi:10.48550/arXiv.2310.16781

Although the mapping between sound and meaning in human language is assumed to be largely arbitrary, research in cognitive science has shown that there are non-trivial correlations between particular sounds and meanings across languages and demographic groups, a phenomenon known as sound symbolism. Among the many dimensions of meaning, sound symbolism is particularly salient and well-demonstrated with regard to cross-modal associations between language and the visual domain. In this work, we address the question of whether sound symbolism is reflected in vision-and-language models such as CLIP and Stable Diffusion. Using zero-shot knowledge probing to investigate the inherent knowledge of these models, we find strong evidence that they do show this pattern, paralleling the well-known kiki-bouba effect in psycholinguistics. Our work provides a novel method for demonstrating sound symbolism and understanding its nature using computational tools. Our code will be made publicly available.
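The abstract only names zero-shot knowledge probing. The sketch below illustrates one generic way such a probe can work with an embedding model, assuming pseudowords (e.g. "kiki", "bouba") and sets of "sharp" and "round" anchor concepts have already been embedded by a CLIP-style encoder. The toy vectors, the mean-anchor `sharpness_score` and the `pairwise_accuracy` metric are hypothetical illustrations, not the authors' method, code or data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def sharpness_score(word_vec, sharp_anchors, round_anchors):
    """Hypothetical probe score: positive means the word embedding sits closer
    to the mean 'sharp' anchor, negative means closer to the mean 'round' anchor."""
    return cosine(word_vec, mean_vector(sharp_anchors)) - cosine(
        word_vec, mean_vector(round_anchors)
    )


def pairwise_accuracy(sharp_scores, round_scores):
    """Fraction of (sharp-sounding, round-sounding) pairs in which the
    sharp-sounding word scores higher; ties count as half a correct pair."""
    total = 0.0
    for s in sharp_scores:
        for r in round_scores:
            total += 1.0 if s > r else 0.5 if s == r else 0.0
    return total / (len(sharp_scores) * len(round_scores))


# Hypothetical 3-d toy embeddings. A real probe would obtain these from a
# CLIP-style text or image encoder; these numbers are made up for illustration.
SHARP_ANCHORS = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
ROUND_ANCHORS = [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]
PSEUDOWORDS = {"kiki": [0.8, 0.2, 0.1], "bouba": [0.2, 0.8, 0.1]}

scores = {
    word: sharpness_score(vec, SHARP_ANCHORS, ROUND_ANCHORS)
    for word, vec in PSEUDOWORDS.items()
}
print(scores)
print(pairwise_accuracy([scores["kiki"]], [scores["bouba"]]))
```

With many pseudowords, a pairwise accuracy well above 0.5 would indicate that the assumed embedding space orders sharp-sounding forms toward the "sharp" anchors; 0.5 is the chance level of this particular metric.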

Publication: arXiv e-prints

Pub Date: October 2023

DOI: 10.48550/arXiv.2310.16781

arXiv: arXiv:2310.16781

Bibcode: 2023arXiv231016781A

Keywords: Computer Science - Computer Vision and Pattern Recognition; Computer Science - Computation and Language; Computer Science - Machine Learning

E-Print: Accepted to NeurIPS 2023 (spotlight). Project webpage: https://kiki-bouba.github.io/
