Research · 2026-05-11
Rubric-Grounded RL: Structured Judge Rewards for Generalizable Reasoning
Source: arXiv cs.AI
arXiv:2605.08061v1 (Announce Type: new)
Abstract: We argue that decomposing reward into weighted, verifiable criteria and using an LLM judge to score them provides a partial-credit optimization signal: instead of a binary outcome or a single holistic score, each response is graded along multiple...
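The partial-credit idea in the abstract can be sketched as a weighted average of per-criterion judge scores. This is a minimal illustration under stated assumptions: each criterion carries a non-negative weight, the LLM judge returns a score in [0, 1] per criterion, and the reward is normalized by total weight. `Criterion`, `rubric_reward`, and the example rubric are hypothetical names, the judge call is stubbed out with fixed scores, and the abstract excerpt does not give the paper's exact aggregation formula.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    # Hypothetical rubric item: a checkable description plus a non-negative weight.
    description: str
    weight: float


def rubric_reward(criteria: list[Criterion], scores: list[float]) -> float:
    """Combine per-criterion judge scores (each in [0, 1]) into one scalar reward.

    Assumption: the reward is the weight-normalized sum of scores. This is an
    illustrative choice, not necessarily the aggregation used in the paper.
    """
    if len(criteria) != len(scores):
        raise ValueError("need exactly one score per criterion")
    total_weight = sum(c.weight for c in criteria)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
    weighted = sum(c.weight * s for c, s in zip(criteria, scores))
    return weighted / total_weight


# Hypothetical rubric for a math-style answer.
rubric = [
    Criterion("final answer is correct", 3.0),
    Criterion("intermediate steps are valid", 2.0),
    Criterion("units are stated", 1.0),
]

# Stubbed judge output: correct answer, half the steps valid, no units.
judge_scores = [1.0, 0.5, 0.0]
print(rubric_reward(rubric, judge_scores))  # (3*1.0 + 2*0.5 + 1*0.0) / 6 = 4/6
```

Normalizing by total weight keeps the reward in [0, 1], so a response that satisfies only some criteria earns proportional credit rather than an all-or-nothing signal.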
arxiv · papers · reasoning