Research2026-05-14
Spectral Flattening Is All Muon Needs: How Orthogonalization Controls Learning Rate and Convergence
Source: Arxiv CS.AI
arXiv:2605.13079v1 Announce Type: cross Abstract: Muon orthogonalizes the momentum buffer before each update, replacing its singular values with ones via Newton-Schulz iterations. This simple change lets Muon tolerate far larger learning rates and converge faster than other optimizers, but why? We...
arxivpapers