Research · 2026-05-06
Short window attention enables long-term memorization
Source: arXiv cs.AI
arXiv:2509.24552v3 (Announce Type: replace-cross)

Abstract: Recent work shows that hybrid architectures combining local sliding-window attention layers with global attention layers outperform either architecture used on its own. However, the impact of the window length and the interplay...
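The abstract contrasts local sliding-window attention with global attention. As a point of reference, here is a minimal sketch (not the paper's implementation) of the masking that distinguishes the two layer types; the window size, tensor shapes, and function names are all illustrative assumptions, using PyTorch's scaled_dot_product_attention.

```python
# Minimal sketch of local (sliding-window) vs. global (full causal) attention
# masks. Window size W and all sizes below are hypothetical choices, not the
# paper's settings.
import torch
import torch.nn.functional as F


def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask: position i may attend to j iff i - window < j <= i."""
    idx = torch.arange(seq_len)
    rel = idx[:, None] - idx[None, :]   # i - j
    return (rel >= 0) & (rel < window)  # causal, with limited lookback


def causal_mask(seq_len: int) -> torch.Tensor:
    """Boolean mask for a global layer: full causal attention."""
    idx = torch.arange(seq_len)
    return idx[:, None] >= idx[None, :]


def attend(q, k, v, mask):
    # True entries in attn_mask mark positions that participate in attention.
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)


if __name__ == "__main__":
    B, H, T, D, W = 2, 4, 16, 8, 4  # batch, heads, tokens, head dim, window
    q, k, v = (torch.randn(B, H, T, D) for _ in range(3))
    local_out = attend(q, k, v, sliding_window_mask(T, W))
    global_out = attend(q, k, v, causal_mask(T))
    print(local_out.shape, global_out.shape)  # both: (2, 4, 16, 8)
```

In a hybrid stack of the kind the abstract describes, some layers would use the sliding-window mask and others the full causal mask; how to set the window length and interleave the two layer types is precisely what the paper investigates.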