Transfer Learning with Joint Fine-Tuning for Multimodal Sentiment Analysis
Abstract
Most existing methods focus on sentiment analysis of textual data. However, the recent massive use of images and videos on social platforms has motivated sentiment analysis from other modalities. Current studies show that exploring other modalities (e.g., images) increases sentiment analysis performance. State-of-the-art multimodal models, such as CLIP and VisualBERT, are pre-trained on datasets of text paired with images. Although the results obtained by these models are promising, both pre-training them and fine-tuning them for sentiment analysis are computationally expensive. This paper introduces a transfer learning approach using joint fine-tuning for sentiment analysis. Our proposal achieves competitive results with a more straightforward fine-tuning strategy that leverages different pre-trained unimodal models and efficiently combines them in a multimodal space. Moreover, it allows any pre-trained model for texts and images to be incorporated during the joint fine-tuning stage, which is especially interesting for sentiment classification in low-resource scenarios.
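The idea of jointly fine-tuning two unimodal models through a shared multimodal space can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how the modalities are fused, so concatenation is assumed here, and small linear maps stand in for the pre-trained text and image encoders. All names (`forward`, `joint_step`, `params`) and all dimensions are hypothetical.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def outer(a, b):
    """Outer product of vectors a and b as a list of rows."""
    return [[ai * bj for bj in b] for ai in a]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def init(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def forward(params, x_text, x_image):
    # Stand-ins for pre-trained unimodal encoders (assumption: linear maps).
    h_text = matvec(params["text"], x_text)
    h_image = matvec(params["image"], x_image)
    z = h_text + h_image  # fusion by concatenation (assumption)
    probs = softmax(matvec(params["head"], z))
    return z, probs

def joint_step(params, x_text, x_image, label, lr=0.1):
    """One SGD step that updates both encoders and the classifier together."""
    z, probs = forward(params, x_text, x_image)
    loss = -math.log(probs[label])  # cross-entropy for one example
    d_logits = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    # Backpropagate through the classifier head into the fused vector.
    d_z = [sum(params["head"][k][j] * d_logits[k] for k in range(len(d_logits)))
           for j in range(len(z))]
    n_text = len(params["text"])  # size of the text part of z
    grads = {
        "head": outer(d_logits, z),
        "text": outer(d_z[:n_text], x_text),
        "image": outer(d_z[n_text:], x_image),
    }
    for name, grad in grads.items():
        for row, grad_row in zip(params[name], grad):
            for j, g in enumerate(grad_row):
                row[j] -= lr * g
    return loss

rng = random.Random(0)
params = {
    "text": init(4, 6, rng),   # 6-dim text features -> 4-dim embedding
    "image": init(4, 8, rng),  # 8-dim image features -> 4-dim embedding
    "head": init(3, 8, rng),   # fused 8-dim vector -> 3 sentiment classes
}
x_text = [rng.random() for _ in range(6)]
x_image = [rng.random() for _ in range(8)]
losses = [joint_step(params, x_text, x_image, label=2) for _ in range(20)]
```

Because the loss gradient flows through the shared classifier into both encoders, either stand-in could in principle be swapped for a different pre-trained model without changing the training step, which is the flexibility the abstract describes.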
- Publication: arXiv e-prints
- Pub Date: October 2022
- DOI: 10.48550/arXiv.2210.05790
- arXiv: arXiv:2210.05790
- Bibcode: 2022arXiv221005790L
- Keywords: Computer Science - Machine Learning; Computer Science - Computation and Language; Computer Science - Computer Vision and Pattern Recognition
- E-Print: Talk: https://icml.cc/Conferences/2022/ScheduleMultitrack?event=13483#collapse20429