Research 2026-05-01
HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help?
Source: arXiv cs.AI
arXiv:2604.09408v3 (Announce Type: replace)
Abstract: Frontier coding agents solve complex tasks when given complete context but collapse when specifications are incomplete or ambiguous. The bottleneck is not raw capability, but judgment: knowing when to act autonomously and when to ask for help....
Tags: arxiv, papers, agents, benchmark