Research · 2026-05-01
EXPO: Stable Reinforcement Learning with Expressive Policies
Source: arXiv cs.AI
arXiv:2507.07986v3 (Announce Type: replace-cross)
Abstract: We study the problem of training and fine-tuning expressive policies with online reinforcement learning (RL) given an offline dataset. Training expressive policy classes with online RL presents a unique challenge of stable value maximization....
arxiv, papers, rl