e-SNLI-VE: Corrected Visual-Textual Entailment with Natural Language Explanations
Abstract
The recently proposed SNLI-VE corpus for recognising visual-textual entailment is a large, real-world dataset for fine-grained multimodal reasoning. However, the automatic way in which SNLI-VE has been assembled (via combining parts of two related datasets) gives rise to a large number of errors in the labels of this corpus. In this paper, we first present a data collection effort to correct the class with the highest error rate in SNLI-VE. Secondly, we re-evaluate an existing model on the corrected corpus, which we call SNLI-VE-2.0, and provide a quantitative comparison with its performance on the non-corrected corpus. Thirdly, we introduce e-SNLI-VE, which appends human-written natural language explanations to SNLI-VE-2.0. Finally, we train models that learn from these explanations at training time, and output such explanations at testing time.
- Publication:
-
arXiv e-prints
- Pub Date:
- April 2020
- DOI:
- 10.48550/arXiv.2004.03744
- arXiv:
- arXiv:2004.03744
- Bibcode:
- 2020arXiv200403744D
- Keywords:
-
- Computer Science - Computation and Language;
- Computer Science - Artificial Intelligence;
- Computer Science - Computer Vision and Pattern Recognition
- E-Print:
- IEEE CVPR Workshop on Fair, Data Efficient and Trusted Computer Vision, 2020