Research | 2026-05-06

The Norm-Separation Delay Law of Grokking: A First-Principles Theory of Delayed Generalization

Source: arXiv cs.AI

arXiv:2603.13331v2 (announce type: replace)

Abstract: Grokking, the sudden generalisation that appears long after a model has perfectly memorised its training data, has been widely observed but lacks a quantitative theory explaining the length of the delay. We show that grokking is a norm-driven...

Tags: arxiv, papers