Research 2026-05-11

Miner: Mining Intrinsic Mastery for Data-Efficient RL in Large Reasoning Models

arXiv:2601.04731v2 (announce type: replace)

Abstract: Current critic-free RL methods for large reasoning models are severely inefficient when training on positive homogeneous prompts (prompts for which every rollout is correct): the advantage estimates are all zero, so those rollouts are wasted. We introduce...
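The zero-advantage problem described in the abstract can be illustrated with a small sketch. It assumes a GRPO-style group-normalized advantage, (reward - group mean) / (group std + eps), with binary correctness rewards; the abstract does not name a specific estimator, and the function name `group_advantages` and the `eps` value are hypothetical choices made for illustration only.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Group-normalized advantages for one prompt's rollouts.

    Assumption: a GRPO-style estimator, (r - mean) / (std + eps).
    This is an illustrative stand-in, not the paper's stated method.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the rollout group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Mixed outcomes: correct rollouts get positive advantage, wrong ones negative.
mixed = group_advantages([1.0, 0.0, 1.0, 0.0])

# Positive homogeneous prompt: every rollout is correct, so each reward
# equals the group mean and every advantage is exactly zero.
all_correct = group_advantages([1.0, 1.0, 1.0, 1.0])

print("mixed:", mixed)
print("all correct:", all_correct)
```

Under this assumption, a prompt whose rollouts are all correct contributes no gradient signal, even though the compute to generate those rollouts has already been spent.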

Read Original Article on Arxiv CS.AI

arxiv, papers, reasoning