Research 2026-05-01
Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards
Source: Arxiv CS.AI
arXiv:2603.09117v2 Announce Type: replace-cross Abstract: Reinforcement Learning from Verifiable Rewards (RLVR) significantly enhances the reasoning of large language models (LLMs) but suffers severely from calibration degeneration, where models become excessively over-confident in incorrect answers....
arxiv, papers, reasoning, rl