RefineCap: Concept-Aware Refinement for Image Captioning
Abstract
Automatically translating images into text involves both image scene understanding and language modeling. In this paper, we propose a novel model, termed RefineCap, that refines the output vocabulary of the language decoder using decoder-guided visual semantics and implicitly learns the mapping between visual tag words and images. The proposed Visual-Concept Refinement method allows the generator to attend to semantic details in the image, thereby generating more semantically descriptive captions. Our model outperforms previous visual-concept-based models on the MS-COCO dataset.
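The abstract does not spell out how the refinement is computed, so the following is only an illustrative sketch of one way a decoder state could steer vocabulary scores toward detected visual concepts. It is not the authors' implementation. The function name `refine_logits`, the dot-product attention, the additive boost, and the `strength` parameter are all assumptions introduced here.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def refine_logits(decoder_state, vocab_logits, concept_embeddings,
                  concept_to_vocab, strength=1.0):
    """Hypothetical refinement step (an assumption, not the paper's method).

    decoder_state:      current decoder hidden state, list of floats
    vocab_logits:       raw decoder scores, one per vocabulary word
    concept_embeddings: concept word -> embedding (same size as the state)
    concept_to_vocab:   concept word -> index into vocab_logits
    """
    concepts = list(concept_embeddings)
    if not concepts:
        return list(vocab_logits), {}
    # "Decoder-guided": the decoder state scores each detected concept.
    scores = [dot(decoder_state, concept_embeddings[c]) for c in concepts]
    weights = softmax(scores)
    # Refine the output vocabulary: boost words tied to attended concepts.
    refined = list(vocab_logits)
    for concept, weight in zip(concepts, weights):
        refined[concept_to_vocab[concept]] += strength * weight
    return refined, dict(zip(concepts, weights))


# Toy example with a four-word vocabulary and two detected concepts.
vocab = ["a", "dog", "frisbee", "runs"]
logits = [1.0, 0.5, 0.5, 0.2]
state = [1.0, 0.0]
concept_emb = {"dog": [2.0, 0.0], "frisbee": [0.0, 2.0]}
concept_idx = {"dog": 1, "frisbee": 2}

refined, weights = refine_logits(state, logits, concept_emb, concept_idx)
```

In this toy setup the decoder state aligns with the "dog" embedding, so "dog" receives the larger boost while the scores for non-concept words are left unchanged.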
- Publication: arXiv e-prints
- Pub Date: September 2021
- DOI: 10.48550/arXiv.2109.03529
- arXiv: arXiv:2109.03529
- Bibcode: 2021arXiv210903529C
- Keywords: Computer Science - Computation and Language
- E-Print: Accepted at ViGIL @NAACL 2021