Research · 2026-05-01
Lost in Space? Vision-Language Models Struggle with Relative Camera Pose Estimation
Source: Arxiv CS.AI
arXiv:2601.22228v2 (Announce Type: replace-cross)

Abstract: We study whether vision-language models (VLMs) can solve relative camera pose estimation (RCPE) from image pairs, a direct test of multi-view spatial reasoning. We cast RCPE as a discrete verbal classification task and introduce...
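The excerpt does not say which verbal classes the authors use, so the following is only a minimal, hypothetical sketch of what "casting RCPE as discrete verbal classification" can look like. It assumes world-to-camera extrinsics (x_cam = R·x_world + t), OpenCV-style camera axes (x right, y down, z forward), and illustrative label names and a 10° turn threshold. None of these come from the paper. It computes the relative pose R_rel = R2·R1ᵀ, t_rel = t2 − R_rel·t1, then maps camera 2's position and viewing direction, expressed in camera 1's frame, to words.

```python
import math
from typing import List, Tuple

Mat3 = List[List[float]]
Vec3 = List[float]


def transpose(m: Mat3) -> Mat3:
    return [[m[j][i] for j in range(3)] for i in range(3)]


def matmul(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(m: Mat3, v: Vec3) -> Vec3:
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def rot_y(deg: float) -> Mat3:
    """Rotation about the camera y axis (pointing down in the assumed convention)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def relative_pose(R1: Mat3, t1: Vec3, R2: Mat3, t2: Vec3) -> Tuple[Mat3, Vec3]:
    """Given world-to-camera extrinsics, return (R_rel, t_rel) with
    x_cam2 = R_rel @ x_cam1 + t_rel."""
    R_rel = matmul(R2, transpose(R1))
    Rt1 = matvec(R_rel, t1)
    t_rel = [t2[i] - Rt1[i] for i in range(3)]
    return R_rel, t_rel


def verbal_translation(R_rel: Mat3, t_rel: Vec3) -> str:
    """Hypothetical label: dominant direction of camera 2's centre as seen from camera 1."""
    # Camera 2's centre in camera 1's frame is c = -R_rel^T @ t_rel.
    c = [-x for x in matvec(transpose(R_rel), t_rel)]
    if all(abs(x) < 1e-12 for x in c):
        return "same position"
    axis = max(range(3), key=lambda i: abs(c[i]))
    neg, pos = [("left", "right"), ("up", "down"), ("backward", "forward")][axis]
    return pos if c[axis] > 0 else neg


def verbal_rotation(R_rel: Mat3, threshold_deg: float = 10.0) -> str:
    """Hypothetical label: horizontal turn of camera 2's optical axis relative to camera 1."""
    # Camera 2's optical axis in camera 1's frame is R_rel^T @ e_z, i.e. the third row of R_rel.
    dx, _, dz = R_rel[2]
    yaw = math.degrees(math.atan2(dx, dz))
    if yaw > threshold_deg:
        return "turned right"
    if yaw < -threshold_deg:
        return "turned left"
    return "no significant turn"
```

For example, with camera 1 at the identity pose, a second camera whose world-to-camera rotation is `rot_y(-30.0)` has been yawed 30° to the right, so `verbal_rotation` labels it "turned right". A real benchmark would need to fix its own label set, thresholds, and coordinate convention; the point here is only that a continuous pose can be reduced to a small set of class words.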
arxiv · papers · vision