Research | 2026-04-22
Geometry-Aware CLIP Retrieval via Local Cross-Modal Alignment and Steering
Source: arXiv cs.AI
arXiv:2604.16487v2
Announce Type: replace-cross
Abstract: CLIP retrieval is typically framed as a pointwise similarity problem in a shared embedding space. While CLIP achieves strong global cross-modal alignment, many retrieval failures arise from local geometric inconsistencies: nearby items are...
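To make the "pointwise similarity" framing in the abstract concrete, here is a minimal sketch of that baseline only: each gallery item is scored against the query independently by cosine similarity, with no use of neighborhood structure. It is not the paper's method. The function names (`l2_normalize`, `cosine_similarity`, `retrieve_top_k`) and the 3-dimensional toy vectors are hypothetical stand-ins for real CLIP text and image embeddings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def retrieve_top_k(query_emb, gallery_embs, k=1):
    """Rank gallery items by similarity to the query.

    Each item is scored on its own (pointwise); the ranking ignores how
    gallery items relate to one another.
    """
    scored = [
        (cosine_similarity(query_emb, g), idx)
        for idx, g in enumerate(gallery_embs)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy vectors with made-up values, standing in for CLIP embeddings.
text_query = [1.0, 0.0, 0.0]
image_gallery = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.9, 0.1, 0.0],  # nearly aligned with the query
    [0.5, 0.5, 0.5],  # partially aligned
]

top2 = retrieve_top_k(text_query, image_gallery, k=2)
print(top2)
```

Because every score depends only on the query and one gallery item, this baseline cannot react to the local geometric inconsistencies the abstract mentions; any correction for those would have to be added on top of it.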