Caption Enriched Samples for Improving Hateful Memes Detection

doi:10.48550/arXiv.2109.10649

Caption Enriched Samples for Improving Hateful Memes Detection

The recently introduced hateful meme challenge demonstrates the difficulty of determining whether a meme is hateful or not. Specifically, both unimodal language models and multimodal vision-language models cannot reach the human level of performance. Motivated by the need to model the contrast between the image content and the overlayed text, we suggest applying an off-the-shelf image captioning tool in order to capture the first. We demonstrate that the incorporation of such automatic captions during fine-tuning improves the results for various unimodal and multimodal models. Moreover, in the unimodal case, continuing the pre-training of language models on augmented and original caption pairs, is highly beneficial to the classification accuracy.

Publication:

arXiv e-prints

Pub Date:

September 2021

DOI:

10.48550/arXiv.2109.10649

arXiv:

arXiv:2109.10649

Bibcode:

2021arXiv210910649B

Keywords:

Computer Science - Computer Vision and Pattern Recognition;
Computer Science - Artificial Intelligence

E-Print:

EMNLP 2021

NASA/ADS

Caption Enriched Samples for Improving Hateful Memes Detection

Abstract