Research | 2026-04-22
GAIR: Location-Aware Self-Supervised Contrastive Pre-Training with Geo-Aligned Implicit Representations
Source: arXiv cs.AI
arXiv:2503.16683v2 | Announce Type: replace-cross
Abstract: Vision Transformer (ViT) has been widely used in computer vision tasks with excellent results by providing representations for a whole image or image patches. However, ViT lacks detailed localized image representations at arbitrary positions...
arxivpapers