Research · 2026-05-01
Understanding and Improving Length Generalization in Hierarchical Sparse Attention Models
Source: Arxiv CS.AI
arXiv:2510.17196v3 Announce Type: replace-cross Abstract: Effectively processing long contexts is a critical challenge for language models. While standard Transformers are limited by quadratic complexity and poor length extrapolation, alternative architectures like sliding window attention and...
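Since the abstract contrasts the quadratic cost of full attention with sliding window attention, a minimal sketch of a causal sliding-window mask may help make that contrast concrete. This is an illustration under assumptions, not the paper's method; the function names and the window size are hypothetical, and a naive dense implementation like this still materializes the full score matrix (real kernels compute only the banded entries to reach roughly O(n·w) cost).

```python
# Illustrative sketch of causal sliding-window attention (not the paper's method).
# Names (sliding_window_mask, attention, window) are assumptions for this example.
import numpy as np

def sliding_window_mask(seq_len: int, window_size: int) -> np.ndarray:
    """Boolean mask: token i may attend to tokens j with i - window_size < j <= i."""
    idx = np.arange(seq_len)
    causal = idx[None, :] <= idx[:, None]                      # no attention to the future
    local = (idx[:, None] - idx[None, :]) < window_size        # only the recent window
    return causal & local

def attention(q, k, v, mask):
    """Scaled dot-product attention with disallowed positions masked out."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                              # (seq, seq) similarities
    scores = np.where(mask, scores, -np.inf)                   # block positions outside the window
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

seq_len, d_model, window = 8, 16, 3
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(seq_len, d_model)) for _ in range(3))
out = attention(q, k, v, sliding_window_mask(seq_len, window))
print(out.shape)  # (8, 16): each token mixes only its last `window` positions
```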