Research 2026-05-11

Adaptive Memory Decay for Log-Linear Attention

Source: arXiv cs.AI

arXiv:2605.06946v1 (announce type: cross)

Abstract: Sequence models face a fundamental tradeoff between memory capacity and computational efficiency. Transformers achieve expressive context modeling at quadratic cost, while linear attention and state-space models run in linear time by compressing...
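The abstract is truncated before it describes the method, so the following is only an illustrative sketch of the general idea the title points at: a linear-attention recurrence whose fixed-size state is multiplied by a per-step decay gate, so memory retention can vary over time instead of growing a quadratic KV cache. The function name, tensor shapes, and the constant 0.9 decay in the usage example are assumptions for illustration, not the paper's actual adaptive-decay or log-linear mechanism.

```python
import numpy as np

def decayed_linear_attention(q, k, v, decay):
    """Linear attention with a per-step decay gate (illustrative, not the paper's method).

    q, k: (T, d) queries/keys; v: (T, d_v) values; decay: (T,) entries in (0, 1).
    """
    T, d = q.shape
    d_v = v.shape[1]
    S = np.zeros((d, d_v))       # fixed-size recurrent state replacing the full KV cache
    out = np.zeros((T, d_v))
    for t in range(T):
        # decay[t] controls how much previously stored context survives this step
        S = decay[t] * S + np.outer(k[t], v[t])
        out[t] = q[t] @ S
    return out

# Toy usage: 8 timesteps, 4-dim queries/keys/values, a constant 0.9 decay (assumed).
rng = np.random.default_rng(0)
T, d = 8, 4
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
y = decayed_linear_attention(q, k, v, decay=np.full(T, 0.9))
print(y.shape)  # (8, 4)
```

Making the decay input-dependent (e.g. predicted from the current token) rather than constant is what would make the memory retention adaptive; how the paper combines that with a log-linear attention structure is not recoverable from the truncated abstract.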

arxivpapers