Policy · 2026-05-12
Fast Rates for Offline Contextual Bandits with Forward-KL Regularization under Single-Policy Concentrability
Source: arXiv cs.AI
arXiv:2605.09214v1 Announce Type: cross
Abstract: \emph{Kullback-Leibler} (KL) regularization is ubiquitous in reinforcement learning algorithms, in the form of \emph{reverse} or \emph{forward} KL. Recent studies have demonstrated $\epsilon^{-1}$-type fast rates for decision making under reverse KL...
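As background for the two regularizers the abstract names (this sketch is illustrative, not taken from the paper): for a learned policy $\pi$ and a reference policy $\pi_{\mathrm{ref}}$ over a discrete action set, reverse-KL regularization penalizes $\mathrm{KL}(\pi \,\|\, \pi_{\mathrm{ref}})$, while forward-KL regularization penalizes $\mathrm{KL}(\pi_{\mathrm{ref}} \,\|\, \pi)$. The policies below are hypothetical, chosen only to show that the two directions give different penalties:

```python
import numpy as np

def kl(p, q):
    # KL(p || q) for discrete distributions with full support on the same set
    return float(np.sum(p * np.log(p / q)))

# Hypothetical 3-action policies (not from the paper)
pi = np.array([0.7, 0.2, 0.1])      # learned policy
pi_ref = np.array([0.4, 0.4, 0.2])  # reference (e.g., behavior) policy

reverse_kl = kl(pi, pi_ref)   # KL(pi || pi_ref): mode-seeking penalty
forward_kl = kl(pi_ref, pi)   # KL(pi_ref || pi): mass-covering penalty

print(reverse_kl, forward_kl)
```

Because KL divergence is asymmetric, the two values differ in general, which is why the choice of direction changes the algorithm's behavior and, per the abstract, its attainable rates.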