Research 2026-05-06
LinMU: Multimodal Understanding Made Linear
Source: arXiv cs.AI
arXiv:2601.01322v2
Announce Type: replace-cross
Abstract: Modern Vision-Language Models (VLMs) achieve impressive performance but are limited by the quadratic complexity of self-attention, which prevents their deployment on edge devices and makes their understanding of high-resolution images and...
arxiv, papers, multimodal