Policy 2026-05-06

MAD-OPD: Breaking the Ceiling in On-Policy Distillation via Multi-Agent Debate

arXiv:2605.01347v1 (announce type: cross)

Abstract: On-policy distillation (OPD) trains a student on its own trajectories under token-level teacher supervision, but existing methods are capped by a single-teacher capability ceiling: when the teacher errs, the student inherits the error. OPD also...

Read Original Article on Arxiv CS.AI

arxiv, papers, agents