MUVO: A Multimodal World Model with Spatial Representations for Autonomous Driving

Experiments with MUVO, a MUltimodal World Model with Geometric VOxel representations, evaluate sensor fusion strategies and assess the benefits of 3D occupancy prediction in autonomous driving systems.

Year: 2023
Venue: arXiv 2023
Authors: 4

Abstract & full text: arxiv.org/abs/2311.11762v3

Abstract

Learning unsupervised world models for autonomous driving has the potential to improve the reasoning capabilities of today's systems dramatically. However, most work neglects the physical attributes of the world and focuses on sensor data alone. We propose MUVO, a MUltimodal World Model with spatial VOxel representations, to address this challenge. We utilize raw camera and lidar data to learn a sensor-agnostic geometric representation of the world. We demonstrate multimodal future predictions and show that our spatial representation improves the prediction quality of both camera images and lidar point clouds.
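
To make the idea of a spatial voxel representation concrete, the following is a minimal sketch of how a lidar point cloud could be discretized into a sparse 3D occupancy set. It is not taken from the paper: the function name `voxelize`, its parameters, and the grid layout are hypothetical choices made for illustration, and MUVO's actual representation and preprocessing may differ.

```python
import math


def voxelize(points, grid_min, voxel_size, grid_shape):
    """Map 3D points to the set of occupied voxel indices.

    Hypothetical helper for illustration only (not MUVO's code).
    points:     iterable of (x, y, z) coordinates, e.g. in metres
    grid_min:   (x, y, z) of the grid's minimum corner
    voxel_size: edge length of one cubic voxel
    grid_shape: number of voxels along each axis
    """
    occupied = set()
    for point in points:
        # Integer voxel index along each axis for this point.
        index = tuple(
            math.floor((coord - low) / voxel_size)
            for coord, low in zip(point, grid_min)
        )
        # Discard points that fall outside the grid bounds.
        if all(0 <= i < n for i, n in zip(index, grid_shape)):
            occupied.add(index)
    return occupied


cloud = [(0.1, 0.1, 0.1), (0.9, 0.2, 0.3), (1.5, 0.5, 0.5), (10.0, 10.0, 10.0)]
grid = voxelize(cloud, grid_min=(0.0, 0.0, 0.0), voxel_size=1.0, grid_shape=(2, 2, 2))
```

In this toy input the first two points share a voxel and the last point lies outside the grid, so two voxels end up occupied. A grid of this kind does not depend on which sensor produced the points, which is one way to read the abstract's "sensor-agnostic geometric representation".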
