BeClaude
Research · 2026-04-17

SparseBalance: Load-Balanced Long Context Training with Dynamic Sparse Attention

Source: Arxiv CS.AI

arXiv:2604.13847v1 Announce Type: cross

Abstract: While sparse attention mitigates the computational bottleneck of long-context LLM training, its distributed training process exhibits extreme heterogeneity in both (1) sequence length and (2) sparsity sensitivity, leading to a severe...

Tags: arxivpapers