Research 2026-05-08
Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration
Source: arXiv cs.AI
arXiv:2605.05566v1 (Announce Type: new)
Abstract: Reinforcement learning with verifiable rewards, particularly Group Relative Policy Optimization (GRPO), has significantly advanced the reasoning capabilities of Large Language Models (LLMs). However, in complex tasks, GRPO frequently suffers from the...
arxiv, papers, reasoning, prompting