Policy · 2026-05-12
PrAg-PO: Prompt Augmented Policy Optimization for Robust and Diverse Mathematical Reasoning
Source: arXiv cs.AI
arXiv:2602.03190v3 (announce type: replace-cross)
Abstract: Reinforcement learning algorithms such as group-relative policy optimization (GRPO) have shown strong potential for improving the mathematical reasoning capabilities of large language models. While a growing body of work seeks to improve...
arxiv, papers, reasoning, rag, prompting