BeClaude
Research, 2026-04-23

Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning

Source: arXiv cs.AI

arXiv:2504.13818v5

Abstract: Reinforcement learning with verifiable rewards (RLVR) has emerged as the leading approach for enhancing reasoning capabilities in large language models. However, it faces a fundamental compute and memory asymmetry: rollout generation is...

arxiv, papers, rl