Research | 2026-05-14
Temper and Tilt Lead to SLOP: Reward Hacking Mitigation with Inference-Time Alignment
Source: arXiv cs.AI
arXiv:2605.13537v1 (announce type: cross)
Abstract: Inference-time alignment techniques offer a lightweight alternative or complement to costly reinforcement learning, while enabling continual adaptation as alignment objectives and reward targets evolve. Existing theoretical analyses justify these...
Tags: arxiv, papers