Research 2026-05-05
Last-Iterate Convergence of General Parameterized Policies in Constrained MDPs
Source: arXiv cs.AI
arXiv:2408.11513v2
Abstract: This paper focuses on learning a Constrained Markov Decision Process (CMDP) via general parameterized policies. We propose a Primal-Dual based Regularized Accelerated Natural Policy Gradient (PDR-ANPG) algorithm that uses entropy and...