ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots
Abstract
We present a new task and dataset, ScreenQA, for screen content understanding via question answering. Existing screen datasets focus either on structure and component-level understanding or on much higher-level composite tasks such as navigation and task completion. We attempt to bridge the gap between these two by annotating 86K question-answer pairs over the RICO dataset, in the hope of benchmarking screen reading comprehension capacity.
- Publication:
- arXiv e-prints
- Pub Date:
- September 2022
- DOI:
- 10.48550/arXiv.2209.08199
- arXiv:
- arXiv:2209.08199
- Bibcode:
- 2022arXiv220908199H
- Keywords:
- Computer Science - Computation and Language;
- Computer Science - Computer Vision and Pattern Recognition;
- Computer Science - Human-Computer Interaction