Research | 2026-05-11
Think Before You Drive: World Model-Inspired Multimodal Grounding for Autonomous Vehicles
Source: arXiv cs.AI
arXiv:2512.03454v4 (announce type: replace-cross)
Abstract: Interpreting natural-language commands to localize target objects is critical for autonomous driving (AD). Existing visual grounding (VG) methods for autonomous vehicles (AVs) typically struggle with ambiguous, context-dependent...
Tags: arxiv, papers, multimodal