Policy 2026-05-07
DGPO: Distribution Guided Policy Optimization for Fine Grained Credit Assignment
Source: arXiv cs.AI
arXiv:2605.03327v1. Announce Type: cross. Abstract: Reinforcement learning is crucial for aligning large language models to perform complex reasoning tasks. However, current algorithms such as Group Relative Policy Optimization (GRPO) suffer from coarse-grained, sequence-level credit assignment, which...
arxiv papers