"Did my figure do justice to the answer?" : Towards Multimodal Short Answer Grading with Feedback (MMSAF)
Abstract
Personalized feedback plays a vital role in a student's learning process. While existing systems are adept at providing feedback over MCQ-based evaluation, this work focuses more on subjective and open-ended questions, which is similar to the problem of Automatic Short Answer Grading (ASAG) with feedback. Additionally, we introduce the Multimodal Short Answer grading with Feedback (MMSAF) problem over the traditional ASAG feedback problem to address the scenario where the student answer and reference answer might contain images. Moreover, we introduce the MMSAF dataset with 2197 data points along with an automated framework for generating such data sets. Our evaluations on existing LLMs over this dataset achieved an overall accuracy of 55\% on Level of Correctness labels, 75\% on Image Relevance labels and a score of 4.27 out of 5 in correctness level of LLM generated feedback as rated by experts. As per experts, Pixtral achieved a rating of above 4 out of all metrics, indicating that it is more aligned to human judgement, and that it is the best solution for assisting students.
- Publication:
-
arXiv e-prints
- Pub Date:
- December 2024
- DOI:
- arXiv:
- arXiv:2412.19755
- Bibcode:
- 2024arXiv241219755S
- Keywords:
-
- Computer Science - Artificial Intelligence