Research 2026-05-06
Omni-NegCLIP: Enhancing CLIP with Front-Layer Contrastive Fine-Tuning for Comprehensive Negation Understanding
Source: arXiv cs.AI
arXiv:2603.29258v2. Abstract: Vision-Language Models (VLMs) have demonstrated strong capabilities across a wide range of multimodal tasks. However, recent studies have shown that VLMs such as CLIP struggle to understand negation expressions, which are common...
arxiv, papers, fine-tuning