BeClaude
Research 2026-05-12

CrossVL: Complexity-Aware Feature Routing and Paired Curriculum for Cross-View Vision-Language Detection

Source: arXiv cs.AI

arXiv:2605.09802v1 (announce type: cross)

Abstract: Vision-language models (VLMs) enable text-guided object detection but degrade severely under cross-view scenarios where ground and aerial viewpoints differ in altitude, scale, and spatial layout. These geometric changes introduce systematic...

arxiv, papers, vision