Research · 2026-05-14
CLIP Tricks You: Training-free Token Pruning for Efficient Pixel Grounding in Large Vision-Language Models
Source: arXiv cs.AI
arXiv:2605.13178v1 Announce Type: cross Abstract: In large vision-language models, visual tokens typically constitute the majority of input tokens, leading to substantial computational overhead. To address this, recent studies have explored pruning redundant or less informative visual tokens for...