Kiki or Bouba? Sound Symbolism in Vision-and-Language Models
Abstract
Although the mapping between sound and meaning in human language is assumed to be largely arbitrary, research in cognitive science has shown that there are non-trivial correlations between particular sounds and meanings across languages and demographic groups, a phenomenon known as sound symbolism. Among the many dimensions of meaning, sound symbolism is particularly salient and well-demonstrated with regards to cross-modal associations between language and the visual domain. In this work, we address the question of whether sound symbolism is reflected in vision-and-language models such as CLIP and Stable Diffusion. Using zero-shot knowledge probing to investigate the inherent knowledge of these models, we find strong evidence that they do show this pattern, paralleling the well-known kiki-bouba effect in psycholinguistics. Our work provides a novel method for demonstrating sound symbolism and understanding its nature using computational tools. Our code will be made publicly available.
- Publication:
-
arXiv e-prints
- Pub Date:
- October 2023
- DOI:
- 10.48550/arXiv.2310.16781
- arXiv:
- arXiv:2310.16781
- Bibcode:
- 2023arXiv231016781A
- Keywords:
-
- Computer Science - Computer Vision and Pattern Recognition;
- Computer Science - Computation and Language;
- Computer Science - Machine Learning
- E-Print:
- Accepted to NeurIPS 2023 (spotlight). Project webpage: https://kiki-bouba.github.io/