Research · 2026-04-30
Why Attend to Everything? Focus is the Key
Source: arXiv cs.AI
arXiv:2604.03260v2 Announce Type: replace-cross
Abstract: Standard attention scales quadratically with sequence length. Efficient attention methods reduce this O(n^2) cost, but when retrofitted into pretrained models, they often degrade perplexity, downstream accuracy, or both. We introduce Focus,...
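For context on the quadratic cost the abstract refers to, here is a minimal sketch (not from the paper) of standard scaled dot-product attention in NumPy: the score matrix Q K^T has shape (n, n), so both compute and memory grow as O(n^2) in the sequence length n. All names and shapes are illustrative assumptions.

```python
import numpy as np

def standard_attention(Q, K, V):
    """Naive scaled dot-product attention.

    Q, K, V: (n, d) arrays. The score matrix Q @ K.T is (n, n),
    so time and memory grow quadratically with sequence length n,
    which is the O(n^2) cost efficient-attention methods target.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (n, n) -- quadratic in n
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                             # (n, d)

# Toy usage: n = 1024 tokens, d = 64 dims -> a 1024 x 1024 score matrix.
n, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = standard_attention(Q, K, V)
print(out.shape)  # (1024, 64)
```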