Research, 2026-05-07
Finite-Size Gradient Transport in Large Language Model Pretraining: From Cascade Size to Intensive Transport Efficiency
Source: arXiv cs.AI
arXiv:2605.02968v1 (Announce Type: cross)
Abstract: We introduce a finite-size gradient-transport framework for real language-model training, based on five observables $(D, z, \beta, \delta, v_{\mathrm{rel}})$ that separate cascade size, duration, absolute transport, and intensive transport efficiency...