BeClaude
Research · 2026-05-12

Evading Visual Aphasia: Contrastive Adaptive Semantic Token Pruning for Vision-Language Models

Source: arXiv cs.AI

arXiv:2605.09429v1 | Announce Type: cross

Abstract: Are low-attention visual tokens truly redundant in vision-language reasoning? Existing pruning methods often assume so, ranking visual tokens by shallow text-to-image attention and discarding low-scoring patches to accelerate LVLM inference. We show...
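The baseline the abstract critiques, ranking visual tokens by text-to-image attention and dropping the low scorers, can be sketched as follows. This is a hypothetical illustration, not the paper's method: the function name, the mean-over-text-tokens scoring, and the keep-ratio parameter are all assumptions for the sake of the example.

```python
import numpy as np

def prune_visual_tokens(attn, visual_tokens, keep_ratio=0.5):
    """Sketch of attention-based pruning (the baseline approach, not this paper's):
    score each visual token by its mean text-to-image attention, keep the top fraction.

    attn:          (num_text_tokens, num_visual_tokens) attention weights
    visual_tokens: (num_visual_tokens, dim) patch embeddings
    """
    scores = attn.mean(axis=0)                  # per-visual-token importance score
    k = max(1, int(len(scores) * keep_ratio))   # how many tokens survive pruning
    keep = np.sort(np.argsort(scores)[-k:])     # top-k indices, restored to original order
    return visual_tokens[keep], keep

# Toy usage: 4 text tokens attending over 8 visual patches of dimension 16.
rng = np.random.default_rng(0)
attn = rng.random((4, 8))
tokens = rng.random((8, 16))
pruned, kept = prune_visual_tokens(attn, tokens, keep_ratio=0.5)
print(pruned.shape, kept)
```

The abstract's opening question targets exactly the assumption this sketch encodes: that a low attention score is sufficient evidence a patch is redundant.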

Tags: arxiv, papers, vision