Research · 2026-05-14
VIPO: Value Function Inconsistency Penalized Offline Reinforcement Learning
Source: arXiv cs.AI
arXiv:2504.11944v3. Abstract: Offline reinforcement learning (RL) learns effective policies from pre-collected datasets, offering a practical solution for applications where online interactions are risky or costly. Model-based approaches are particularly advantageous for...
arxiv, papers, rl