Research 2026-05-08
Beyond Uniform Credit Assignment: Selective Eligibility Traces for RLVR
Source: arXiv cs.AI
arXiv:2605.05965v1 (cross-listed)
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has become a key approach for improving the reasoning abilities of large language models. However, widely used critic-free algorithms such as Group Relative Policy Optimization (GRPO) necessitate...
arxiv, papers