Research | 2026-05-11

Think Before You Drive: World Model-Inspired Multimodal Grounding for Autonomous Vehicles

arXiv:2512.03454v4
Announce Type: replace-cross

Abstract: Interpreting natural-language commands to localize target objects is critical for autonomous driving (AD). Existing visual grounding (VG) methods for autonomous vehicles (AVs) typically struggle with ambiguous, context-dependent...

Read Original Article on Arxiv CS.AI

arxiv, papers, multimodal