Research 2026-04-28

Can Multimodal Large Language Models Truly Understand Small Objects?

arXiv:2604.22884v1 (announce type: cross). Abstract: Multimodal Large Language Models (MLLMs) have shown promising potential in diverse understanding tasks, e.g., image and video analysis and math and physics olympiads. However, they remain unexplored for Small Object Understanding (SOU) tasks....

Read the original article on arXiv (cs.AI)

arxiv papers multimodal