Research | 2026-05-12
Do multimodal models imagine electric sheep?
Source: arXiv cs.AI
arXiv:2605.09693v1 | Announce Type: cross
Abstract: Yes. We find that large multimodal models develop mental imagery when solving spatial puzzles, and they do imagine sheep when solving sheep puzzles. We fine-tune a Qwen3.5 VLM to solve twelve diverse visual reasoning tasks -- including tangram,...
Tags: arxiv, papers, multimodal