Research · 2026-05-11
Distributional Process Reward Models: Calibrated Prediction of Future Rewards via Conditional Optimal Transport
Source: arXiv cs.AI
arXiv:2605.06785v1 (Announce Type: cross)
Abstract: Inference-time scaling methods rely on Process Reward Models (PRMs), which are often poorly calibrated and overestimate success probabilities. We propose, to our knowledge, the first use of conditional optimal transport for calibrating PRMs,...
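The abstract's claim that PRMs are poorly calibrated and overestimate success probabilities can be made concrete with a standard calibration check. The sketch below is not the paper's conditional optimal transport method. It assumes a PRM emits one scalar success probability per partial solution and that each solution's final correctness (1 or 0) is known. It then computes binned expected calibration error (ECE) and the mean overestimate. The function name `calibration_gap` and the equal-width binning are hypothetical, illustrative choices.

```python
from typing import Sequence, Tuple


def calibration_gap(
    pred_probs: Sequence[float],
    outcomes: Sequence[int],
    n_bins: int = 10,
) -> Tuple[float, float]:
    """Hypothetical helper: return (ECE, mean overestimate) for PRM success probabilities.

    pred_probs: predicted probability that each partial solution ends in success.
    outcomes:   observed final result for the same solution (1 = success, 0 = failure).
    """
    if not pred_probs or len(pred_probs) != len(outcomes):
        raise ValueError("pred_probs and outcomes must be non-empty and equal length")

    # Assign each prediction to one of n_bins equal-width bins on [0, 1].
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(pred_probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    n = len(pred_probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        confidence = sum(p for p, _ in b) / len(b)  # mean predicted probability in bin
        accuracy = sum(y for _, y in b) / len(b)    # empirical success rate in bin
        ece += (len(b) / n) * abs(confidence - accuracy)

    # Positive value means the PRM predicts success more often than it occurs.
    mean_overestimate = sum(pred_probs) / n - sum(outcomes) / n
    return ece, mean_overestimate


if __name__ == "__main__":
    # Toy data: confident predictions, but only half the solutions succeed.
    ece, over = calibration_gap([0.9, 0.9, 0.8, 0.8], [1, 0, 1, 0])
    print(f"ECE={ece:.2f}, mean overestimate={over:.2f}")
```

On the toy data both numbers come out around 0.35, which is the kind of overconfidence the abstract describes. A calibration method, whether the paper's optimal-transport approach or a simpler post-hoc fix, aims to drive both values toward zero on held-out data.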