Research 2026-04-20

Targeted Exploration via Unified Entropy Control for Reinforcement Learning

arXiv:2604.14646v2

Abstract: Recent advances in reinforcement learning (RL) have improved the reasoning capabilities of large language models (LLMs) and vision-language models (VLMs). However, the widely used Group Relative Policy Optimization (GRPO) consistently suffers from...

Read Original Article on Arxiv CS.AI

arxiv, papers, rl