BeClaude
Research · 2026-04-28

Beyond Cross-Modal Alignment: Measuring and Leveraging Modality Gap in Vision-Language Models

Source: arXiv cs.AI

arXiv:2502.14888v4 (announce type: replace-cross)

Abstract: The success of vision-language models is primarily attributed to effective alignment across modalities such as vision and language. However, modality gaps persist in existing alignment algorithms and appear necessary for human perception as...

arxiv, papers, rag, vision