Policy | 2026-05-12
Cornerstones or Stumbling Blocks? Deciphering the Rock Tokens in On-Policy Distillation
Source: arXiv cs.AI
arXiv:2605.09253v1
Announce Type: cross
Abstract: While recent work in Reinforcement Learning with Verifiable Rewards (RLVR) has shown that a small subset of critical tokens disproportionately drives reasoning gains, an analogous token-level understanding of On-Policy Distillation (OPD) remains...
arxiv, papers