2AFC Prompting of Large Multimodal Models for Image Quality Assessment
Abstract
While abundant research has been conducted on improving high-level visual understanding and reasoning capabilities of large multimodal models~(LMMs), their visual quality assessment~(IQA) ability has been relatively under-explored. Here we take initial steps towards this goal by employing the two-alternative forced choice~(2AFC) prompting, as 2AFC is widely regarded as the most reliable way of collecting human opinions of visual quality. Subsequently, the global quality score of each image estimated by a particular LMM can be efficiently aggregated using the maximum a posterior estimation. Meanwhile, we introduce three evaluation criteria: consistency, accuracy, and correlation, to provide comprehensive quantifications and deeper insights into the IQA capability of five LMMs. Extensive experiments show that existing LMMs exhibit remarkable IQA ability on coarse-grained quality comparison, but there is room for improvement on fine-grained quality discrimination. The proposed dataset sheds light on the future development of IQA models based on LMMs. The codes will be made publicly available at https://github.com/h4nwei/2AFC-LMMs.
- Publication:
-
arXiv e-prints
- Pub Date:
- February 2024
- DOI:
- 10.48550/arXiv.2402.01162
- arXiv:
- arXiv:2402.01162
- Bibcode:
- 2024arXiv240201162Z
- Keywords:
-
- Computer Science - Computer Vision and Pattern Recognition;
- Computer Science - Artificial Intelligence