Visual Understanding and Narration: A Deeper Understanding and Explanation of Visual Scenes
Abstract
We describe the task of Visual Understanding and Narration, in which a robot (or agent) generates text for the images that it collects when navigating its environment, by answering open-ended questions, such as 'what happens, or might have happened, here?'
- Publication:
- arXiv e-prints
- Pub Date:
- May 2019
- DOI:
- 10.48550/arXiv.1906.00038
- arXiv:
- arXiv:1906.00038
- Bibcode:
- 2019arXiv190600038L
- Keywords:
- Computer Science - Computation and Language;
- Computer Science - Computer Vision and Pattern Recognition
- E-Print:
- 2-page extended abstract, presented at the Workshop on Shortcomings in Vision and Language (SiVL), 2019, at the North American Chapter of the Association for Computational Linguistics (NAACL)