BeClaude
Research · 2026-04-28

When Policies Cannot Be Retrained: A Unified Closed-Form View of Post-Training Steering in Offline Reinforcement Learning

Source: arXiv cs.AI

arXiv:2604.22873v1 (announce type: cross)

Abstract: Offline reinforcement learning (RL) can learn effective policies from fixed datasets, but deployment objectives may change after training, and in many applications the trained actor cannot be retrained because of data, cost, or governance...

arxiv, papers, rl