BeClaude
Research | 2026-04-30

Uncertainty-Aware Reward Discounting for Mitigating Reward Hacking

Source: arXiv cs.AI

arXiv:2604.26360v1 (announce type: cross)

Abstract: Reinforcement learning (RL) systems typically optimize scalar reward functions that assume precise and reliable evaluation of outcomes. However, real-world objectives, especially those derived from human preferences, are often uncertain,...

Tags: arxiv, papers