Research 2026-04-24
AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts
Source: arXiv cs.AI
arXiv:2601.11044v4
Announce Type: replace
Abstract: Large Language Models (LLMs) based autonomous agents demonstrate multifaceted capabilities to contribute substantially to economic production. However, existing benchmarks remain focused on single agentic capability, failing to capture...
arxiv, papers, agents, benchmark