BeClaude
Research2026-04-28

The Spectral Lifecycle of Transformer Training: Transient Compression Waves, Persistent Spectral Gradients, and the Q/K--V Asymmetry

Source: Arxiv CS.AI

arXiv:2604.22778v1 Announce Type: cross Abstract: We present the first systematic study of weight matrix singular value spectra \emph{during} transformer pretraining, tracking full SVD decompositions of every weight matrix at 25-step intervals across three model scales (30M--285M parameters). We...

arxivpapers