BeClaude
Policy 2026-05-07

Descent-Guided Policy Gradient for Scalable Cooperative Multi-Agent Learning

Source: Arxiv CS.AI

arXiv:2602.20078v3. Abstract: Scaling cooperative multi-agent reinforcement learning (MARL) is fundamentally limited by cross-agent noise. When agents share a common reward, each agent's learning signal is computed from a shared return that depends on all agents, so the...
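The cross-agent noise described in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's method: it assumes independent Bernoulli policies, a shared return equal to the sum of all agents' actions, and a constant mean-return baseline, and the function name `shared_return_gradient_variance` is hypothetical. It estimates the variance of one agent's score-function (REINFORCE-style) gradient as the number of agents grows.

```python
import numpy as np


def shared_return_gradient_variance(n_agents, n_samples=20000, p=0.5, seed=0):
    """Empirical variance of agent 0's score-function gradient under a shared return.

    Toy setup (all assumptions, not taken from the paper):
    - every agent picks a binary action, choosing 1 with probability p;
    - the shared return is the sum of all agents' actions;
    - the baseline is the expected return, n_agents * p.
    """
    rng = np.random.default_rng(seed)
    # Sample one joint action per row: shape (n_samples, n_agents).
    actions = (rng.random((n_samples, n_agents)) < p).astype(float)
    # Shared return: depends on every agent, not only on agent 0.
    shared_return = actions.sum(axis=1)
    baseline = n_agents * p
    # Score of a Bernoulli(p) policy with respect to p, for agent 0's action:
    # d/dp log pi(a) = a / p - (1 - a) / (1 - p).
    a0 = actions[:, 0]
    score = a0 / p - (1.0 - a0) / (1.0 - p)
    # Per-sample gradient estimate for agent 0.
    grad_samples = score * (shared_return - baseline)
    return grad_samples.var()


variances = {n: shared_return_gradient_variance(n) for n in (1, 4, 16, 64)}
for n, v in variances.items():
    print(f"agents={n:3d}  variance of agent 0's gradient estimate: {v:.2f}")
```

In this toy setting agent 0's own contribution to the return stays fixed, so any growth in the estimator's variance comes from the other agents' actions entering the shared return.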

arxiv, papers, agents