BeClaude
Policy2026-05-12

Policy Gradient Methods for Non-Markovian Reinforcement Learning

Source: Arxiv CS.AI

arXiv:2605.10816v1 Announce Type: cross Abstract: We study policy gradient methods for reinforcement learning in non-Markovian decision processes (NMDPs), where observations and rewards depend on the entire interaction history. To handle this dependence, the agent maintains an internal state that...

arxivpapersrl