Map-based Modular Approach for Zero-shot Embodied Question Answering
Abstract
Embodied Question Answering (EQA) serves as a benchmark task to evaluate the capability of robots to navigate within novel environments and identify objects in response to human queries. However, existing EQA methods often rely on simulated environments and operate with limited vocabularies. This paper presents a map-based modular approach to EQA, enabling real-world robots to explore and map unknown environments. By leveraging foundation models, our method facilitates answering a diverse range of questions using natural language. We conducted extensive experiments in both virtual and real-world settings, demonstrating the robustness of our approach in navigating and comprehending queries within unknown environments.
- Publication: arXiv e-prints
- Pub Date: May 2024
- DOI: 10.48550/arXiv.2405.16559
- arXiv: arXiv:2405.16559
- Bibcode: 2024arXiv240516559S
- Keywords: Computer Science - Robotics; Computer Science - Computer Vision and Pattern Recognition
- E-Print: IROS 2024