Policy · 2026-04-17
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
Source: arXiv cs.AI
arXiv:2604.13016v2 (replace-cross)

Abstract: On-policy distillation (OPD) has become a core technique in the post-training of large language models, yet its training dynamics remain poorly understood. This paper provides a systematic investigation of OPD dynamics and mechanisms. We...
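For context on the technique the abstract names: on-policy distillation is commonly formulated as training the student on its own samples rather than on a fixed dataset. The sketch below is a standard framing from the literature, not necessarily the exact objective this paper studies. With prompts x drawn from a distribution D, responses y sampled from the student policy \pi_\theta itself, and a teacher \pi_T scoring each token position, the objective is typically a per-token divergence:

\[
\mathcal{L}_{\mathrm{OPD}}(\theta)
  = \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}
    \left[ \sum_{t=1}^{|y|}
      D\!\left(
        \pi_\theta(\cdot \mid x, y_{<t}),\;
        \pi_T(\cdot \mid x, y_{<t})
      \right)
    \right]
\]

Here D is usually a token-level KL divergence; whether it is taken in the forward or reverse direction varies across formulations. Sampling y from the student rather than from teacher-generated or fixed data is what makes the procedure "on-policy", and is widely held to reduce the train-inference distribution mismatch of standard teacher-forced distillation.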