Research 2026-05-07
Cascade Token Selection for Transformer Attention Acceleration
Source: arXiv cs.AI
arXiv:2605.03110v1 (announce type: cross)

Abstract: A method is presented for reducing the cost of representative token selection in transformer attention layers by exploiting the coherence of the representative set across depth. Activation Decorrelation Attention (ADA) selects $r \ll T$...
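The abstract is truncated, so the details of ADA's cascade selection are not available here. As a minimal sketch of the general idea it describes, the following hypothetical NumPy example restricts attention to $r \ll T$ representative tokens and reuses the selected index set across several layers (exploiting its coherence in depth); the norm-based scoring rule, function names, and layer structure are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def select_representatives(x, r):
    # Hypothetical scoring rule: keep the r tokens with the largest
    # activation norm (a stand-in for the paper's selection criterion).
    scores = np.linalg.norm(x, axis=-1)        # (T,)
    return np.sort(np.argsort(-scores)[:r])    # indices of r representatives

def sparse_attention(q, k, v, idx):
    # Attend over only the r selected key/value tokens: cost O(T*r), not O(T^2).
    logits = q @ k[idx].T / np.sqrt(q.shape[-1])  # (T, r)
    return softmax(logits) @ v[idx]               # (T, d)

rng = np.random.default_rng(0)
T, d, r, L = 64, 16, 8, 4
x = rng.normal(size=(T, d))

# Select once, then reuse the same representative set across L layers,
# amortizing the selection cost over depth.
idx = select_representatives(x, r)
for _ in range(L):
    x = x + sparse_attention(x, x, x, idx)  # residual self-attention layer
```

Each layer here costs O(T·r·d) instead of the dense O(T²·d), and the selection step runs once rather than once per layer.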