BeClaude
Research 2026-05-01

EXPO: Stable Reinforcement Learning with Expressive Policies

Source: Arxiv CS.AI

arXiv:2507.07986v3 Announce Type: replace-cross Abstract: We study the problem of training and fine-tuning expressive policies with online reinforcement learning (RL) given an offline dataset. Training expressive policy classes with online RL presents a unique challenge of stable value maximization....

arxiv papers rl