Research | 2026-04-28

When Policies Cannot Be Retrained: A Unified Closed-Form View of Post-Training Steering in Offline Reinforcement Learning

arXiv:2604.22873v1 (announce type: cross)

Abstract: Offline reinforcement learning (RL) can learn effective policies from fixed datasets, but deployment objectives may change after training, and in many applications the trained actor cannot be retrained because of data, cost, or governance...

Read Original Article on Arxiv CS.AI

arxiv, papers, rl