Research | 2026-04-27
CGC: Compositional Grounded Contrast for Fine-Grained Multi-Image Understanding
Source: arXiv cs.AI
arXiv:2604.22498v1 Announce Type: cross
Abstract: Although Multimodal Large Language Models (MLLMs) have advanced rapidly, they still face notable challenges in fine-grained multi-image understanding, often exhibiting spatial hallucination, attention leakage, and failures in object constancy. In...