Deep Reason: A Strong Baseline for Real-World Visual Reasoning
Abstract
This paper presents a strong baseline for real-world visual reasoning (GQA), which achieves 60.93% in GQA 2019 challenge and won the sixth place. GQA is a large dataset with 22M questions involving spatial understanding and multi-step inference. To help further research in this area, we identified three crucial parts that improve the performance, namely: multi-source features, fine-grained encoder, and score-weighted ensemble. We provide a series of analysis on their impact on performance.
- Publication:
-
arXiv e-prints
- Pub Date:
- May 2019
- DOI:
- 10.48550/arXiv.1905.10226
- arXiv:
- arXiv:1905.10226
- Bibcode:
- 2019arXiv190510226W
- Keywords:
-
- Computer Science - Computer Vision and Pattern Recognition
- E-Print:
- CVPR 2019 Visual Question Answering and Dialog Workshop