Learning Sparse Mixture of Experts for Visual Question Answering

doi:10.48550/arXiv.1909.09192

There has been rapid progress in the task of Visual Question Answering (VQA), driven by improved model architectures. Unfortunately, these models are usually computationally intensive because of their sheer size, which poses a serious challenge for deployment. We aim to tackle this issue for the specific task of VQA. A Convolutional Neural Network (CNN) is an integral part of the visual processing pipeline of a VQA model (assuming the CNN is trained along with the entire VQA model). In this project, we propose an efficient and modular neural architecture for the VQA task with a focus on the CNN module. Our experiments demonstrate that a sparsely activated CNN-based VQA model achieves performance comparable to that of a standard CNN-based VQA model.

Publication:

arXiv e-prints

Pub Date:

September 2019

DOI:

10.48550/arXiv.1909.09192

arXiv:

arXiv:1909.09192

Bibcode:

2019arXiv190909192P

Keywords:

Computer Science - Machine Learning;
Computer Science - Computation and Language;
Computer Science - Computer Vision and Pattern Recognition;
Statistics - Machine Learning

E-Print:

Accepted at the Visual Question Answering and Dialog Workshop, CVPR 2019

NASA/ADS

Abstract