MUVO: A Multimodal World Model with Spatial Representations for Autonomous Driving
Abstract
Learning unsupervised world models for autonomous driving has the potential to dramatically improve the reasoning capabilities of today's systems. However, most existing work neglects the physical attributes of the world and focuses on sensor data alone. To address this challenge, we propose MUVO, a MUltimodal World Model with spatial VOxel representations. We use raw camera and lidar data to learn a sensor-agnostic geometric representation of the world. We demonstrate multimodal future predictions and show that our spatial representation improves the prediction quality of both camera images and lidar point clouds.
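The abstract does not say how raw sensor points relate to a voxel grid. As a generic illustration only, and not MUVO's method (which learns its representation from camera and lidar data together), the sketch below bins lidar points into a binary occupancy grid. The names `voxelize`, `voxel_size`, `grid_min`, and `grid_shape` are hypothetical parameters chosen for this example.

```python
import math
from typing import Iterable, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(
    points: Iterable[Point],
    voxel_size: float,
    grid_min: Point,
    grid_shape: Tuple[int, int, int],
) -> Set[Voxel]:
    """Return the set of occupied voxel indices (hypothetical helper).

    Assumption: a regular axis-aligned grid with cubic voxels of edge
    `voxel_size`, whose corner is at `grid_min`. Points outside the grid
    are dropped.
    """
    occupied: Set[Voxel] = set()
    for point in points:
        # math.floor (not int) so negative offsets map to index -1 and get dropped.
        idx = tuple(
            math.floor((c - o) / voxel_size) for c, o in zip(point, grid_min)
        )
        if all(0 <= i < n for i, n in zip(idx, grid_shape)):
            occupied.add(idx)  # several points may fall into the same voxel
    return occupied


if __name__ == "__main__":
    pts = [(0.1, 0.2, 0.3), (0.15, 0.25, 0.35), (1.6, 0.0, 0.0), (10.0, 0.0, 0.0)]
    print(sorted(voxelize(pts, 0.5, (0.0, 0.0, 0.0), (4, 4, 4))))
```

With 0.5 m voxels on a 4×4×4 grid, the first two points share voxel `(0, 0, 0)`, the third lands in `(3, 0, 0)`, and the fourth lies outside the grid and is discarded. Because the occupied set records only geometry, the same grid could in principle be filled from any sensor that yields 3D points, which is one informal way to read "sensor-agnostic".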
- Publication: arXiv e-prints
- Pub Date: November 2023
- arXiv: arXiv:2311.11762
- Bibcode: 2023arXiv231111762B
- Keywords: Computer Science - Machine Learning; Computer Science - Robotics
- E-Print: Daniel Bogdoll and Yitian Yang contributed equally