Map-based Modular Approach for Zero-shot Embodied Question Answering

doi:10.48550/arXiv.2405.16559

Embodied Question Answering (EQA) serves as a benchmark task to evaluate the capability of robots to navigate within novel environments and identify objects in response to human queries. However, existing EQA methods often rely on simulated environments and operate with limited vocabularies. This paper presents a map-based modular approach to EQA, enabling real-world robots to explore and map unknown environments. By leveraging foundation models, our method facilitates answering a diverse range of questions using natural language. We conducted extensive experiments in both virtual and real-world settings, demonstrating the robustness of our approach in navigating and comprehending queries within unknown environments.

Publication:

arXiv e-prints

Pub Date:

May 2024

DOI:

10.48550/arXiv.2405.16559

arXiv:

arXiv:2405.16559

Bibcode:

2024arXiv240516559S

Keywords:

Computer Science - Robotics;
Computer Science - Computer Vision and Pattern Recognition

E-Print:

IROS 2024
