Research 2026-04-20

The Amazing Agent Race: Strong Tool Users, Weak Navigators

arXiv:2604.10261v2. Abstract: Existing tool-use benchmarks for LLM agents are overwhelmingly linear: our analysis of six benchmarks shows 55 to 100% of instances are simple chains of 2 to 5 steps. We introduce The Amazing Agent Race (AAR), a benchmark featuring directed...

Read the original article on arXiv cs.AI

arxiv, papers, agents