Research 2026-05-14

Spectral Flattening Is All Muon Needs: How Orthogonalization Controls Learning Rate and Convergence

arXiv:2605.13079v1 (announce type: cross)

Abstract: Muon orthogonalizes the momentum buffer before each update, replacing its singular values with ones via Newton-Schulz iterations. This simple change lets Muon tolerate far larger learning rates and converge faster than other optimizers, but why? We...
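The orthogonalization step described in the abstract can be illustrated with a minimal sketch. This is not the paper's or Muon's implementation: it assumes the classical cubic Newton-Schulz iteration X ← 1.5·X − 0.5·X·Xᵀ·X, whereas the coefficients and step count Muon actually uses are not given in this excerpt. The function name `newton_schulz_orthogonalize`, the helper functions, the step count, and the example matrix are all hypothetical, chosen only for illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(g, steps=20):
    """Approximate the orthogonal polar factor U V^T of a full-rank matrix g.

    Assumption: classical cubic Newton-Schulz iteration, not Muon's tuned variant.
    """
    # Scale by the Frobenius norm so every singular value lies in (0, 1],
    # which is inside the iteration's convergence region.
    fro = math.sqrt(sum(v * v for row in g for v in row))
    x = [[v / fro for v in row] for row in g]
    for _ in range(steps):
        # Each step maps every singular value s to 1.5*s - 0.5*s**3,
        # which pushes s toward 1 while leaving singular vectors unchanged.
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(r1, r2)] for r1, r2 in zip(x, xxtx)]
    return x


# Hypothetical 2x2 "momentum buffer" with unequal singular values.
momentum = [[3.0, 1.0], [0.0, 2.0]]
update = newton_schulz_orthogonalize(momentum)

# If all singular values are (approximately) one, update @ update^T is the identity.
gram = matmul(update, transpose(update))
```

Checking that `gram` is close to the identity matrix is one way to confirm the singular values have been flattened to one, since the singular values of `update` are the square roots of the eigenvalues of `gram`.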

Read Original Article on arXiv cs.AI
