Research · 2026-05-14
Higher-order Linear Attention
Source: arXiv cs.AI
arXiv:2510.27258v2 · Announce Type: replace-cross

Abstract: The quadratic cost of scaled dot-product attention is a central obstacle to scaling autoregressive language models to long contexts. Linear-time attention and State Space Models (SSMs) provide scalable alternatives but are typically...
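The abstract contrasts the quadratic cost of scaled dot-product attention with linear-time alternatives. The sketch below is a minimal NumPy illustration of that contrast only; the feature map `phi`, the function names, and the non-causal formulation are illustrative assumptions, not the paper's higher-order linear attention mechanism.

```python
# Minimal sketch (assumptions, not the paper's method): quadratic softmax
# attention vs. a generic kernelized linear-attention reordering.
import numpy as np

def softmax_attention(Q, K, V):
    # Standard scaled dot-product attention: the (T, T) score matrix makes
    # time and memory quadratic in sequence length T.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (T, T)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # (T, d_v)

def phi(x):
    # Positive feature map (an assumption); linear-attention variants use
    # different kernels, e.g. elu(x) + 1 or random features.
    return np.maximum(x, 0.0) + 1e-6

def linear_attention(Q, K, V):
    # Reordering as phi(Q) @ (phi(K)^T V) avoids the (T, T) matrix, giving
    # cost linear in T for a fixed feature dimension. Causal masking via
    # prefix sums is omitted here for brevity.
    Qf, Kf = phi(Q), phi(K)       # (T, d)
    kv = Kf.T @ V                 # (d, d_v), summed over all time steps
    z = Kf.sum(axis=0)            # (d,) normalizer
    return (Qf @ kv) / (Qf @ z)[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d, dv = 8, 4, 4
    Q = rng.normal(size=(T, d))
    K = rng.normal(size=(T, d))
    V = rng.normal(size=(T, dv))
    print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```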