Research · 2026-04-20

GTA-2: Benchmarking General Tool Agents from Atomic Tool-Use to Open-Ended Workflows

arXiv:2604.15715v1 (Announce Type: cross)

Abstract: The development of general-purpose agents requires a shift from executing simple instructions to completing complex, real-world productivity workflows. However, current tool-use benchmarks remain misaligned with real-world requirements, relying on...

Source: arXiv cs.AI

Tags: arxiv, papers, agents, benchmark