Research | 2026-04-17
Caption First, VQA Second: Knowledge Density, Not Task Format, Drives Multimodal Scaling
Source: arXiv cs.AI
arXiv:2604.13054v1 (announce type: cross)
Abstract: Multimodal large language models (MLLMs) have achieved rapid progress, yet their scaling behavior remains less clearly characterized and often less predictable than that of text-only LLMs. Increasing model size and task diversity often yields...
arxiv, papers, multimodal