Research · 2026-04-20
Dispatch-Aware Ragged Attention for Pruned Vision Transformers
Source: arXiv cs.AI
arXiv:2604.15408v1 · Announce Type: cross

Abstract: Token pruning methods for Vision Transformers (ViTs) promise quadratic reductions in attention FLOPs by dropping uninformative patches. Yet when pruned sequences are executed with state-of-the-art variable-length attention APIs, including...
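The abstract's two core objects, a per-image keep mask produced by pruning and the ragged batch it yields, map onto the packed layout that variable-length attention APIs typically consume: one flat token tensor plus cumulative-length offsets. The sketch below is a minimal illustration under that assumption; the names `pack_pruned_tokens` and `ragged_self_attention` are hypothetical, and the per-segment loop is a plain-PyTorch reference stand-in for a fused variable-length kernel, not the paper's method.

```python
import torch
import torch.nn.functional as F

def pack_pruned_tokens(tokens, keep_mask):
    """tokens: (B, N, D); keep_mask: (B, N) bool from some pruning score.
    Returns the kept tokens packed as (total_kept, D) plus cu_seqlens,
    the (B+1,) cumulative offsets marking each image's segment.
    (Real varlen kernels usually expect int32 offsets.)"""
    lengths = keep_mask.sum(dim=1)            # kept tokens per image (ragged)
    packed = tokens[keep_mask]                # (total_kept, D) flat layout
    cu_seqlens = torch.zeros(tokens.shape[0] + 1, dtype=torch.long)
    cu_seqlens[1:] = lengths.cumsum(0)
    return packed, cu_seqlens

def ragged_self_attention(packed, cu_seqlens, num_heads):
    """Reference loop: attend within each image's variable-length segment.
    A fused variable-length kernel would cover all segments in one launch."""
    D = packed.shape[-1]
    out = torch.empty_like(packed)
    for i in range(cu_seqlens.numel() - 1):
        s, e = cu_seqlens[i].item(), cu_seqlens[i + 1].item()
        x = packed[s:e].view(1, e - s, num_heads, D // num_heads).transpose(1, 2)
        y = F.scaled_dot_product_attention(x, x, x)   # self-attention per image
        out[s:e] = y.transpose(1, 2).reshape(e - s, D)
    return out

B, N, D = 4, 196, 64
tokens = torch.randn(B, N, D)
keep_mask = torch.rand(B, N) > 0.5                    # stand-in pruning decision
packed, cu_seqlens = pack_pruned_tokens(tokens, keep_mask)
out = ragged_self_attention(packed, cu_seqlens, num_heads=4)
print(cu_seqlens.tolist(), out.shape)
```

Because each image keeps a different number of patches, padding back to a rectangular batch would forfeit the FLOP savings pruning promises; the packed-plus-offsets layout is what lets a variable-length kernel skip the dropped positions entirely.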