Research2026-05-01
COHERENCE: Benchmarking Fine-Grained Image-Text Alignment in Interleaved Multimodal Contexts
Source: Arxiv CS.AI
arXiv:2604.27389v1 Announce Type: cross Abstract: In recent years, Multimodal Large Language Models (MLLMs) have achieved remarkable progress on a wide range of multimodal benchmarks. Despite these advances, most existing benchmarks mainly focus on single-image or multi-image comprehension. In...
arxivpaperscoherebenchmarkmultimodal