BeClaude
Research · 2026-04-17

SparseBalance: Load-Balanced Long Context Training with Dynamic Sparse Attention

Source: Arxiv CS.AI

arXiv:2604.13847v1 Announce Type: cross

Abstract: While sparse attention mitigates the computational bottleneck of long-context LLM training, its distributed training process exhibits extreme heterogeneity in both (1) sequence length and (2) sparsity sensitivity, leading to a severe...

Tags: arxivpapers