Research 2026-04-30
A Survey of Process Reward Models: From Outcome Signals to Process Supervisions for Large Language Models
Source: Arxiv CS.AI
arXiv:2510.08049v3 Announce Type: replace-cross Abstract: Although Large Language Models (LLMs) exhibit advanced reasoning ability, conventional alignment remains largely dominated by outcome reward models (ORMs) that judge only final answers. Process Reward Models (PRMs) address this gap by...