Research 2026-05-12
DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
Source: arXiv cs.AI
arXiv:2605.08441v1 Announce Type: cross
Abstract: Reinforcement learning with verifiable rewards (RLVR) generates hundreds of thousands of tokens per training step, with rollout generation dominating the computational cost. The overall token budget can be controlled along two main dimensions: (i)...
arxiv papers rl