Policy 2026-05-12
Policy Gradient Methods for Non-Markovian Reinforcement Learning
Source: arXiv cs.AI
arXiv:2605.10816v1 Announce Type: cross
Abstract: We study policy gradient methods for reinforcement learning in non-Markovian decision processes (NMDPs), where observations and rewards depend on the entire interaction history. To handle this dependence, the agent maintains an internal state that...
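To make the setting concrete, here is a minimal sketch of a policy gradient update for an agent that conditions its policy on an internal state rather than on the current observation. It is not the paper's algorithm: the abstract is truncated before it describes how the internal state is built, so everything here is an assumption made for illustration. That includes the toy task (a cue shown only at the first step and rewarded at every step), the latch rule in `update_internal_state`, the tabular softmax policy, and the plain REINFORCE update with a running-average baseline.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def update_internal_state(state, observation):
    """Hypothetical memory rule: latch the first non-blank observation."""
    if state is None and observation is not None:
        return observation
    return state


def run_episode(theta, rng, horizon=3):
    """Roll out one episode; the cue is visible only at t = 0."""
    cue = rng.randint(0, 1)
    state = None
    steps = []  # (internal state, action) pairs
    total_reward = 0.0
    for t in range(horizon):
        observation = cue if t == 0 else None
        state = update_internal_state(state, observation)
        probs = softmax(theta[state])
        action = 0 if rng.random() < probs[0] else 1
        steps.append((state, action))
        # Reward depends on the cue from t = 0, not on the current observation.
        total_reward += 1.0 if action == cue else 0.0
    return steps, total_reward


def train(episodes=3000, learning_rate=0.1, seed=0):
    """REINFORCE with a running-average baseline on the toy task."""
    rng = random.Random(seed)
    theta = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # action logits per internal state
    baseline = 0.0
    for _ in range(episodes):
        steps, total_reward = run_episode(theta, rng)
        advantage = total_reward - baseline
        # Accumulate gradients first so every step uses the same parameters.
        grads = {0: [0.0, 0.0], 1: [0.0, 0.0]}
        for state, action in steps:
            probs = softmax(theta[state])
            for k in range(2):
                indicator = 1.0 if k == action else 0.0
                # Gradient of log softmax: one-hot(action) minus probabilities.
                grads[state][k] += advantage * (indicator - probs[k])
        for state in theta:
            for k in range(2):
                theta[state][k] += learning_rate * grads[state][k]
        baseline += 0.05 * (total_reward - baseline)
    return theta


if __name__ == "__main__":
    trained = train()
    for cue in (0, 1):
        print(cue, softmax(trained[cue]))
```

After the first step the observation is blank, so a policy that looked only at the current observation could not tell the two cues apart. The latched internal state is what lets the policy gradient learn the rewarded action.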
arxiv papers rl