BeClaude
Research · 2026-05-08

Optimal Transport for LLM Reward Modeling from Noisy Preference

Source: Arxiv CS.AI

arXiv:2605.06036v1 Announce Type: cross Abstract: Reward models are fundamental to Reinforcement Learning from Human Feedback (RLHF), yet real-world datasets are inevitably corrupted by noisy preferences. Conventional training objectives tend to overfit these errors, while existing denoising...
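The overfitting the abstract describes can be made concrete with the standard pairwise objective used in reward-model training (a minimal sketch; the Bradley-Terry loss below is the conventional baseline the abstract refers to, not the paper's proposed optimal-transport method, and the reward values are illustrative):

```python
import numpy as np

def bt_loss(r_chosen, r_rejected):
    # Bradley-Terry pairwise loss used in conventional reward-model training:
    # -log sigmoid(r_chosen - r_rejected)
    return -np.log(1.0 / (1.0 + np.exp(-(r_chosen - r_rejected))))

# Clean pair: the truly better response carries the "chosen" label.
clean = bt_loss(r_chosen=2.0, r_rejected=0.5)

# Noisy pair: the annotator flipped the labels, so minimizing this loss
# pushes the reward model to rank the worse response higher -- the
# overfitting of label errors that the abstract describes.
noisy = bt_loss(r_chosen=0.5, r_rejected=2.0)

print(f"clean-pair loss: {clean:.3f}")
print(f"noisy-pair loss: {noisy:.3f}")
```

The flipped pair produces a much larger loss, so gradient descent spends its effort fitting the mislabeled example, which is why noise-robust objectives are needed.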

Tags: arxivpapers