2026-05-12

Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why

Source: Arxiv CS.AI

arXiv:2605.10889v1 Announce Type: cross

Abstract: On-policy distillation offers dense, per-token supervision for training reasoning models; however, it remains unclear under which conditions this signal is beneficial and under which it is detrimental. Which teacher model should be used, and in the...
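To make the abstract's phrase "dense, per-token supervision" concrete, here is a minimal sketch of one common form of on-policy distillation loss: the reverse KL divergence between the student's and teacher's next-token distributions, computed at every position of a student-sampled sequence. This is an illustration under assumptions, not the paper's method; the toy logits and function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one vocabulary row."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def per_token_reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) at each token position.

    Each row of logits is the model's next-token distribution at one
    position of a sequence sampled from the student (hence "on-policy").
    The result is one loss value per token, i.e. dense supervision.
    """
    losses = []
    for s_row, t_row in zip(student_logits, teacher_logits):
        p_s = softmax(s_row)
        p_t = softmax(t_row)
        kl = sum(ps * (math.log(ps) - math.log(pt))
                 for ps, pt in zip(p_s, p_t) if ps > 0)
        losses.append(kl)
    return losses

# Toy example: a 3-token sequence over a 4-word vocabulary (made-up logits).
student = [[2.0, 0.5, 0.1, -1.0], [0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
teacher = [[2.0, 0.5, 0.1, -1.0], [0.0, 2.0, 0.0, 0.0], [3.0, 0.0, 0.0, 0.0]]
losses = per_token_reverse_kl(student, teacher)
```

The first position has identical student and teacher logits, so its loss is ~0; positions where the distributions diverge contribute a positive per-token penalty, which is what distinguishes this dense signal from a single sequence-level reward.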
