Research 2026-05-06
GR-Ben: A General Reasoning Benchmark for Evaluating Process Reward Models
Source: arXiv cs.AI
arXiv:2605.01203v1
Announce Type: new
Abstract: Currently, process reward models (PRMs) have exhibited remarkable potential for test-time scaling. Since large language models (LLMs) regularly generate flawed intermediate reasoning steps when tackling a broad spectrum of reasoning and...
arxiv, papers, reasoning, benchmark