Research | 2026-04-24
Fake or Real, Can Robots Tell? Evaluating VLM Robustness to Domain Shift in Single-View Robotic Scene Understanding
Source: arXiv cs.AI
arXiv:2506.19579v3 (announce type: replace-cross)
Abstract: Robotic scene understanding increasingly relies on Vision-Language Models (VLMs) to generate natural language descriptions of the environment. In this work, we systematically evaluate single-view object captioning for tabletop scenes...