Automated Image Captioning with CNNs and Transformers

doi:10.48550/arXiv.2412.10511

Automated Image Captioning with CNNs and Transformers

This project aims to create an automated image captioning system that generates natural language descriptions for input images by integrating techniques from computer vision and natural language processing. We employ various different techniques, ranging from CNN-RNN to the more advanced transformer-based techniques. Training is carried out on image datasets paired with descriptive captions, and model performance will be evaluated using established metrics such as BLEU, METEOR, and CIDEr. The project will also involve experimentation with advanced attention mechanisms, comparisons of different architectural choices, and hyperparameter optimization to refine captioning accuracy and overall system effectiveness.

Publication:

arXiv e-prints

Pub Date:

December 2024

DOI:

10.48550/arXiv.2412.10511

arXiv:

arXiv:2412.10511

Bibcode:

2024arXiv241210511C

Keywords:

Computer Science - Computer Vision and Pattern Recognition;
Computer Science - Artificial Intelligence

ADS

Automated Image Captioning with CNNs and Transformers

Abstract