Research | 2026-05-12
Verifier-Free RL for LLMs via Intrinsic Gradient-Norm Reward
Source: Arxiv CS.AI
arXiv:2605.09920v1 | Announce Type: cross
Abstract: While Reinforcement Learning with Verifiable Rewards (RLVR) has recently emerged as a promising post-training paradigm for Large Language Models (LLMs), its dependence on gold labels or domain-specific verifiers limits its scalability to new...
arxiv, papers