BeClaude
Policy | 2026-05-11

Beyond State-Wise Mirror Descent: Offline Policy Optimization with Parametric Policies

Source: arXiv cs.AI

arXiv:2602.23811v4 (Announce Type: replace-cross)

Abstract: We investigate the theoretical aspects of offline reinforcement learning (RL) under general function approximation. While prior works (e.g., Xie et al., 2021) have established the theoretical foundations of learning a good policy from...
