Research | 2026-05-12
Ambig-DS: A Benchmark for Task-Framing Ambiguity in Data-Science Agents
Source: arXiv cs.AI
arXiv:2605.09698v1 | Announce Type: new
Abstract: As data-science agents shift from co-pilots to auto-pilots, silent misframing becomes a critical failure mode. Agents quietly commit to plausible but unintended task framings, producing clean, executable artifacts that hide their incorrect assessment...
arxiv, papers, agents, benchmark