BeClaude
Research2026-04-22

GAIR: Location-Aware Self-Supervised Contrastive Pre-Training with Geo-Aligned Implicit Representations

Source: Arxiv CS.AI

arXiv:2503.16683v2 Announce Type: replace-cross Abstract: Vision Transformer (ViT) has been widely used in computer vision tasks with excellent results by providing representations for a whole image or image patches. However, ViT lacks detailed localized image representations at arbitrary positions...

arxivpapers