BeClaude Research · 2026-05-06

On the Optimal Sample Complexity of Offline Multi-Armed Bandits with KL Regularization

Source: Arxiv CS.AI

arXiv:2605.02141v1 | Announce Type: cross

Abstract: Kullback-Leibler (KL) regularization is widely used in offline decision-making and offers several benefits, motivating recent work on the sample complexity of offline learning with respect to KL-regularized performance metrics. Nevertheless, the...
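The abstract is truncated, so the paper's exact setting is not recoverable here. As context, the standard KL-regularized objective for a multi-armed bandit scores a policy by its expected reward minus a KL penalty against a reference policy, and admits a closed-form maximizer proportional to `ref(a) * exp(r(a) / beta)`. The sketch below illustrates that textbook objective; all function names and the specific parameters are hypothetical and not taken from the paper.

```python
import numpy as np

def kl_regularized_value(pi, ref, rewards, beta):
    """Expected reward under pi minus beta * KL(pi || ref).

    pi, ref: probability vectors over arms; rewards: per-arm mean rewards;
    beta: regularization strength (hypothetical names, not from the paper).
    """
    pi = np.asarray(pi, dtype=float)
    ref = np.asarray(ref, dtype=float)
    kl = np.sum(pi * np.log(pi / ref))  # KL divergence KL(pi || ref)
    return float(pi @ np.asarray(rewards, dtype=float) - beta * kl)

def kl_optimal_policy(ref, rewards, beta):
    """Closed-form maximizer: pi*(a) proportional to ref(a) * exp(r(a)/beta)."""
    logits = np.log(np.asarray(ref, dtype=float)) \
             + np.asarray(rewards, dtype=float) / beta
    w = np.exp(logits - logits.max())  # subtract max for numerical stability
    return w / w.sum()
```

In the offline setting studied by this line of work, the mean rewards are unknown and must be estimated from a fixed dataset; the sample-complexity question is how many offline samples suffice for the estimated policy's KL-regularized value to be near-optimal.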

arxivpapers