Research | 2026-04-20

Mind's Eye: A Benchmark of Visual Abstraction, Transformation and Composition for Multimodal LLMs

arXiv:2604.16054v1 (announce type: cross)

Abstract: Multimodal large language models (MLLMs) have achieved impressive progress on vision language benchmarks, yet their capacity for visual cognitive and visuospatial reasoning remains less understood. We introduce "Mind's Eye", a multiple-choice...

Source: arXiv cs.AI
