Research 2026-06-30
CRAFT: Counterfactual Credit Assignment from Free Sibling Rollouts for Self-Distilled Agentic Reinforcement Learning
Originally published by arXiv CS.AI
arXiv:2606.29476v1 Announce Type: cross Abstract: Self-distilled agentic reinforcement learning augments trajectory-level reward with a token-level distillation loss, using as its teacher the same policy conditioned on privileged context. The prevailing recipe gates this loss by a single scalar,...
arxiv papers agents rl