Research · 2026-05-12
Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning
Source: Arxiv CS.AI
arXiv:2601.20829v2 (replace-cross)
Abstract: As Reinforcement Learning with Verifiable Rewards (RLVR) substantially improves the reasoning abilities of large language models (LLMs), a new bottleneck emerges: more training problems become saturated, that is, the LLM answers the...
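The truncated abstract defines a saturated problem as one the LLM now answers correctly across rollouts. Why that is a bottleneck for RLVR can be illustrated with a minimal sketch, assuming a GRPO-style group-relative baseline (the abstract does not name the specific RLVR algorithm): when every sampled rollout earns the same reward, the advantages are all zero, so a saturated problem contributes no gradient signal.

```python
def is_saturated(rollout_correct):
    """A problem is saturated when the model answers every sampled rollout correctly."""
    return all(rollout_correct)

def group_advantages(rewards):
    """Group-relative advantages (GRPO-style assumption): each rollout's
    reward minus the group mean. Uniform rewards yield all-zero advantages."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

# Saturated problem: all 8 rollouts correct -> no learning signal.
sat = [1.0] * 8
print(is_saturated(sat))        # True
print(group_advantages(sat))    # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

# Unsaturated problem: mixed outcomes still produce nonzero advantages.
unsat = [1.0, 0.0, 1.0, 0.0]
print(group_advantages(unsat))  # [0.5, -0.5, 0.5, -0.5]
```

This is only a sketch of the saturation phenomenon the abstract names; the paper's failure-prefix conditioning technique itself is not reproduced here, since the abstract is cut off before describing it.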
Tags: arxiv, papers, reasoning