Research, 2026-05-14

Temper and Tilt Lead to SLOP: Reward Hacking Mitigation with Inference-Time Alignment

arXiv:2605.13537v1 (announce type: cross)

Abstract: Inference-time alignment techniques offer a lightweight alternative or complement to costly reinforcement learning, while enabling continual adaptation as alignment objectives and reward targets evolve. Existing theoretical analyses justify these...

Source: arXiv cs.AI
