Research 2026-04-28
LearnAlign: Data Selection for LLM Reinforcement Learning with Improved Gradient Alignment
Source: arXiv cs.AI
arXiv:2506.11480v4 (announce type: replace-cross)
Abstract: Reinforcement learning with verifiable rewards (RLVR) has become a key technique for enhancing LLMs' reasoning abilities, yet its data inefficiency remains a major bottleneck. To address this critical yet challenging issue, we present a...
arxiv papers rl