BeClaude
Research 2026-05-11

Miner: Mining Intrinsic Mastery for Data-Efficient RL in Large Reasoning Models

Source: arXiv cs.AI

arXiv:2601.04731v2 (announce type: replace)

Abstract: Current critic-free RL methods for large reasoning models are severely inefficient when training on positive homogeneous prompts (prompts for which all rollouts are correct): those rollouts are wasted because their advantage estimates are zero. We introduce...
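To make the zero-advantage problem concrete, here is a minimal sketch. It assumes a GRPO-style group-relative advantage (each rollout's reward minus the group mean, divided by the group standard deviation); the abstract does not name a specific estimator, so the function name `group_relative_advantages`, the `eps` parameter, and the 0/1 rewards are all illustrative assumptions, not the paper's method.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Mean-centred, std-normalised advantages for one prompt's rollouts.

    Assumption: a GRPO-style critic-free estimator. The abstract only says
    "critic-free RL methods", so this is an illustration, not the paper's code.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when every reward in the group is identical
    return [(r - mu) / (sigma + eps) for r in rewards]


# A prompt with mixed outcomes: correct rollouts get positive advantage,
# incorrect ones get negative advantage.
mixed = group_relative_advantages([1.0, 0.0, 1.0, 0.0])

# A positive homogeneous prompt: every rollout is correct, so every reward
# equals the group mean and every advantage is exactly zero.
all_correct = group_relative_advantages([1.0, 1.0, 1.0, 1.0])

print(mixed)
print(all_correct)
```

Under this assumed estimator, the all-correct group contributes no gradient signal at all, which is the wasted-rollout situation the abstract describes.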

arxiv, papers, reasoning