Research | 2026-05-12

DSGBench: A Diverse Strategic Game Benchmark for Evaluating LLM-based Agents in Complex Decision-Making Environments

arXiv:2503.06047v2

Abstract: Large language model (LLM)-based agents are increasingly applied to complex strategic environments that demand long-horizon reasoning, multi-agent interaction, and decision-making under uncertainty. However, common existing benchmarks either...

Read the original article on arXiv cs.AI

Tags: arxiv, papers, agents, benchmark