Research 2026-06-30

CRAFT: Counterfactual Credit Assignment from Free Sibling Rollouts for Self-Distilled Agentic Reinforcement Learning

Originally published by Arxiv CS.AI

arXiv:2606.29476v1 (Announce Type: cross)

Abstract: Self-distilled agentic reinforcement learning augments trajectory-level reward with a token-level distillation loss, using as its teacher the same policy conditioned on privileged context. The prevailing recipe gates this loss by a single scalar,...

arxiv, papers, agents, rl